Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dealing with highly non-homogeneous systems #74

Open
lmiq opened this issue Oct 3, 2022 · 6 comments
Open

dealing with highly non-homogeneous systems #74

lmiq opened this issue Oct 3, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@lmiq
Copy link
Member

lmiq commented Oct 3, 2022

When systems are highly non-homogeneous, the number of cells can explode to be much greater than the number of particles:

using StaticArrays
using CellListMap

function points_in_sphere(r,N)
    p = SVector{3,Float64}[]
    for ip in 1:N
        θ = π*rand()
        ϕ = 2π*rand()
        x = r * sin(ϕ) * cos(θ)
        y = r * sin(ϕ) * sin(θ) 
        z = r * cos(ϕ)
        push!(p, SVector(x,y,z))
    end
    return p
end
julia> p = points_in_sphere(500, 100);

julia> box = Box(limits(p), 0.02)
Box{NonPeriodicCell, 3}
  unit cell matrix = [ 990.57, 0.0, 0.0; 0.0, 985.31, 0.0; 0.0, 0.0, 999.88 ]
  cutoff = 0.02
  number of computing cells on each dimension = [49530, 49267, 49995]
  computing cell sizes = [0.02, 0.02, 0.02] (lcell: 1)
  Total number of cells = 121997524527450

This makes it prohibitive to work with this kind of system. Can we find a way to not allocate empty cells at all? The full cell allocation would need to be lazy, since the construction of the box does not involve the actual coordinates of the system.

@lmiq lmiq added the enhancement New feature or request label Oct 3, 2022
@BioTurboNick
Copy link

I've been working with a custom cell list (though not implemented as a linked list) that uses a SparseArray to store indexes into a vector of cells, and only occupied cells are created.

In case that's useful as an inspiration.

@lmiq
Copy link
Member Author

lmiq commented Oct 3, 2022

Any inspiration is welcome 😁

@lmiq
Copy link
Member Author

lmiq commented Oct 3, 2022

For reference. The array that depends critically on the number of cells is this one:

" Auxiliary array that contains the indexes in list of the cells with particles, real or images. "

    " Auxiliary array that contains the indexes in list of the cells with particles, real or images. "
    cell_indices::Vector{Int} = zeros(Int,number_of_cells)

It is a linked list array indicating which are the indexes of the cells that contain particles.

The problem of removing this array (and storing, for instance, a simple lists of the indexes of cells with particles), is that we need to make the association in the reverse direction. That is, we need to quickly retrieve the data of a cell with particles just from the (linearized) three-dimensional indexes of the cell, to be able to run over vicinal cells in the inner loops.

An alternative (maybe a good one, actually), is to store the indexes of the vicinal cells as a 26 (in 3D) or 8 (in 2D) array, for each cell containing particles. Each cell struct would then carry an additional field, but the storage requirement would be at most 27*N if each cell contains a single particle.

@lmiq
Copy link
Member Author

lmiq commented Oct 7, 2022

New idea (cannot test now):

Define a new type of array to store cell_indices, with a buffer size equal to the number of particles. Define getindex such that it returns 0 if the cell is not defined, and the indices of the cell if the cell was set, as occurs with the current cell_indices array. If the number of cells is less than the number of particles, the cells can be indexed as usual. If the number of cells is greater than the number of particles, fill the array in order, to access the indexes with searshortedfirst afterwards.

A proper implementation of that may allow just substituting the type of array cell_indices is without modifying anything in the remaining code. The overhead associated with using searchsortedfirst may not be negligible, but it is not that bad, and probably pays off relative to the current approach, when the number of cells is much greater than the number of particles.

@BioTurboNick
Copy link

BioTurboNick commented Oct 7, 2022

What you're describing is similar conceptually to how the SparseMatrixCSC method I described works. (Internally it selects the column and does searchsortedfirst over it to get an element.)

The big problem I had with SparseMatrixCSC was that adding individual items is very slow, so I had to pre-allocate the I, J, and V arrays.

@lmiq
Copy link
Member Author

lmiq commented Oct 7, 2022

Yes, effectively. I think that can work, except that I need a bit more customization as I will need the array type to carry some information about the number of elements effectively used in the array vs. the number of elements of the buffer, to avoid having to resize the arrays too often in the case of list updating. But something on those lines can work, yes, your comment was important.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants