dealing with highly non-homogeneous systems #74

lmiq · 2022-10-03T12:38:19Z

When systems are highly non-homogeneous, the number of cells can explode to be much greater than the number of particles:

using StaticArrays
using CellListMap

function points_in_sphere(r,N)
    p = SVector{3,Float64}[]
    for ip in 1:N
        θ = π*rand()
        ϕ = 2π*rand()
        x = r * sin(ϕ) * cos(θ)
        y = r * sin(ϕ) * sin(θ) 
        z = r * cos(ϕ)
        push!(p, SVector(x,y,z))
    end
    return p
end

julia> p = points_in_sphere(500, 100);

julia> box = Box(limits(p), 0.02)
Box{NonPeriodicCell, 3}
  unit cell matrix = [ 990.57, 0.0, 0.0; 0.0, 985.31, 0.0; 0.0, 0.0, 999.88 ]
  cutoff = 0.02
  number of computing cells on each dimension = [49530, 49267, 49995]
  computing cell sizes = [0.02, 0.02, 0.02] (lcell: 1)
  Total number of cells = 121997524527450

This makes it prohibitive to work with this kind of system. Can we find a way to not allocate empty cells at all? The full cell allocation would need to be lazy, since the construction of the box does not involve the actual coordinates of the system.

BioTurboNick · 2022-10-03T16:02:42Z

I've been working with a custom cell list (though not implemented as a linked list) that uses a SparseArray to store indexes into a vector of cells, and only occupied cells are created.

In case that's useful as an inspiration.

lmiq · 2022-10-03T16:04:45Z

Any inspiration is welcome 😁

lmiq · 2022-10-03T19:03:59Z

For reference. The array that depends critically on the number of cells is this one:

CellListMap.jl/src/CellLists.jl

Line 149 in f68d284

    
           " Auxiliary array that contains the indexes in list of the cells with particles, real or images. "

    " Auxiliary array that contains the indexes in list of the cells with particles, real or images. "
    cell_indices::Vector{Int} = zeros(Int,number_of_cells)

It is a linked list array indicating which are the indexes of the cells that contain particles.

The problem of removing this array (and storing, for instance, a simple lists of the indexes of cells with particles), is that we need to make the association in the reverse direction. That is, we need to quickly retrieve the data of a cell with particles just from the (linearized) three-dimensional indexes of the cell, to be able to run over vicinal cells in the inner loops.

An alternative (maybe a good one, actually), is to store the indexes of the vicinal cells as a 26 (in 3D) or 8 (in 2D) array, for each cell containing particles. Each cell struct would then carry an additional field, but the storage requirement would be at most 27*N if each cell contains a single particle.

lmiq · 2022-10-07T13:21:49Z

New idea (cannot test now):

Define a new type of array to store cell_indices, with a buffer size equal to the number of particles. Define getindex such that it returns 0 if the cell is not defined, and the indices of the cell if the cell was set, as occurs with the current cell_indices array. If the number of cells is less than the number of particles, the cells can be indexed as usual. If the number of cells is greater than the number of particles, fill the array in order, to access the indexes with searshortedfirst afterwards.

A proper implementation of that may allow just substituting the type of array cell_indices is without modifying anything in the remaining code. The overhead associated with using searchsortedfirst may not be negligible, but it is not that bad, and probably pays off relative to the current approach, when the number of cells is much greater than the number of particles.

BioTurboNick · 2022-10-07T13:52:49Z

What you're describing is similar conceptually to how the SparseMatrixCSC method I described works. (Internally it selects the column and does searchsortedfirst over it to get an element.)

The big problem I had with SparseMatrixCSC was that adding individual items is very slow, so I had to pre-allocate the I, J, and V arrays.

lmiq · 2022-10-07T14:06:51Z

Yes, effectively. I think that can work, except that I need a bit more customization as I will need the array type to carry some information about the number of elements effectively used in the array vs. the number of elements of the buffer, to avoid having to resize the arrays too often in the case of list updating. But something on those lines can work, yes, your comment was important.

lmiq added the enhancement New feature or request label Oct 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dealing with highly non-homogeneous systems #74

dealing with highly non-homogeneous systems #74

lmiq commented Oct 3, 2022 •

edited

Loading

BioTurboNick commented Oct 3, 2022

lmiq commented Oct 3, 2022

lmiq commented Oct 3, 2022

lmiq commented Oct 7, 2022

BioTurboNick commented Oct 7, 2022 •

edited

Loading

lmiq commented Oct 7, 2022

dealing with highly non-homogeneous systems #74

dealing with highly non-homogeneous systems #74

Comments

lmiq commented Oct 3, 2022 • edited Loading

BioTurboNick commented Oct 3, 2022

lmiq commented Oct 3, 2022

lmiq commented Oct 3, 2022

lmiq commented Oct 7, 2022

BioTurboNick commented Oct 7, 2022 • edited Loading

lmiq commented Oct 7, 2022

lmiq commented Oct 3, 2022 •

edited

Loading

BioTurboNick commented Oct 7, 2022 •

edited

Loading