Add GPU support #15

Open
maltekuehl opened this issue Dec 10, 2024 · 3 comments

Comments

@maltekuehl
Collaborator

Currently, no GPU acceleration is available, which limits scalability to large datasets. cuml, cupy and cupyx provide functionality that should allow GPU support to be added.

@niklasmueboe
Collaborator

niklasmueboe commented Dec 10, 2024

That's a great idea.

There is a corresponding function for scipy.sparse.linalg.eigsh in cupyx (cupyx.scipy.sparse.linalg.eigsh), which offers almost all of the features we need.

Unfortunately, the corresponding function for scipy.linalg.eigh is missing. CuPy does provide cupy.linalg.eigh, which corresponds to numpy.linalg.eigh and is somewhat less feature-complete, but I hope it will be possible to work around this.
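As a sketch of what such a workaround could look like, assuming the features we would need from scipy.linalg.eigh are the generalized problem (its b argument) and subset_by_index (an assumption; the exact gaps may differ), both can be emulated on top of xp.linalg.eigh, where xp is numpy or cupy. The function name eigh_compat is hypothetical, and this has only been reasoned about with NumPy, not run against CuPy.

```python
import numpy as np


def eigh_compat(a, b=None, subset_by_index=None, xp=np):
    """Hypothetical stand-in for part of scipy.linalg.eigh built only on xp.linalg.

    xp is the array module (numpy, or cupy on the GPU). Only the generalized
    problem (b) and subset_by_index are emulated; other scipy options are not.
    """
    if b is None:
        w, v = xp.linalg.eigh(a)
    else:
        # Reduce a @ v = w * (b @ v) to a standard problem via b = L @ L.T:
        # solve (L^-1 a L^-T) y = w y, then map back with v = L^-T y,
        # which also gives v.T @ b @ v = I, matching scipy's normalization.
        l_inv = xp.linalg.inv(xp.linalg.cholesky(b))
        c = l_inv @ a @ l_inv.T
        c = (c + c.T) / 2  # symmetrize to remove round-off asymmetry
        w, y = xp.linalg.eigh(c)
        v = l_inv.T @ y
    if subset_by_index is not None:
        lo, hi = subset_by_index
        # Eigenvalues come back in ascending order, so the inclusive
        # index range [lo, hi] is a plain slice.
        w, v = w[lo : hi + 1], v[:, lo : hi + 1]
    return w, v
```

Note that subset_by_index here still computes all eigenpairs before slicing, so it only restores the interface, not scipy's potential speedup; triangular solves would also be numerically preferable to the explicit inverse.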

@maltekuehl
Collaborator Author

maltekuehl commented Dec 13, 2024

I have also mapped what is currently used to what is available for use on the GPU:

| SciPy/NumPy | CuPy |
|---|---|
| scipy.sparse.csc_array | cupy.array (csc_array is not used sparsely in the code) |
| scipy.sparse.csc_matrix | cupyx.scipy.sparse.csc_matrix |
| scipy.sparse.csr_array | cupy.array (csr_array is not used sparsely in the code) |
| scipy.sparse.csr_matrix | cupyx.scipy.sparse.csr_matrix |
| scipy.sparse.eye | cupyx.scipy.sparse.eye |
| scipy.sparse.issparse | cupyx.scipy.sparse.issparse |
| scipy.sparse.linalg.eigsh | cupyx.scipy.sparse.linalg.eigsh |
| scipy.linalg.eigh | cupy.linalg.eigh |
| np.ndarray | cupy.ndarray |
| np.number | - (needed? Only used for typing) |
| np.float64 | cupy.float64 |
| np.sort | cupy.sort |
| np.abs | cupy.abs |
| np.flip | cupy.flip |
| np.flipud | cupy.flipud |
| np.sum | cupy.sum |
| np.mean | cupy.mean |

For the implementation, the open question is whether we want to always import the CPU packages and additionally import the GPU libraries (if they are available and the use_gpu flag is not False), or whether the SciPy packages should also be imported only for type checking and when the CPU will actually be used.
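One way to get the second option is to encode a subset of the mapping table above as data and import modules only on first use. This is a minimal sketch; BACKENDS and resolve are hypothetical names, and scipy.linalg.eigh is left out because cupy.linalg.eigh is not a drop-in replacement.

```python
import importlib

# CPU/GPU counterparts from the mapping table (subset), as "module:attribute".
# Nothing is imported until an entry is first resolved.
BACKENDS = {
    "csr_matrix": ("scipy.sparse:csr_matrix", "cupyx.scipy.sparse:csr_matrix"),
    "csc_matrix": ("scipy.sparse:csc_matrix", "cupyx.scipy.sparse:csc_matrix"),
    "eye": ("scipy.sparse:eye", "cupyx.scipy.sparse:eye"),
    "issparse": ("scipy.sparse:issparse", "cupyx.scipy.sparse:issparse"),
    "eigsh": ("scipy.sparse.linalg:eigsh", "cupyx.scipy.sparse.linalg:eigsh"),
    "sort": ("numpy:sort", "cupy:sort"),
    "flip": ("numpy:flip", "cupy:flip"),
    "mean": ("numpy:mean", "cupy:mean"),
}

_cache = {}


def resolve(name, use_gpu=False):
    """Return the CPU or GPU implementation of `name`, importing lazily."""
    key = (name, use_gpu)
    if key not in _cache:
        target = BACKENDS[name][1 if use_gpu else 0]
        module_name, attr = target.split(":")
        module = importlib.import_module(module_name)
        _cache[key] = getattr(module, attr)
    return _cache[key]
```

With this, a CPU-only run never imports cupy or cupyx, and a GPU run only imports SciPy if some code path still asks for a CPU entry.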

For CuPy, we can likely use a self.xp attribute that we set to either np or cp in __init__. However, this will likely require a check like if self.xp.__name__ == "cupy": array = array.get() in some places to keep the methods' output usable with CPU packages. This may add a small overhead for GPU use, but it is probably the better choice because it gives all users consistent CPU output.
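A minimal sketch of the self.xp pattern, assuming a hypothetical Model class with a hypothetical top_values method (not the project's actual API). CuPy is only imported when use_gpu is True, and results are copied back to NumPy before being returned.

```python
import numpy as np


class Model:  # hypothetical stand-in for the project's class
    def __init__(self, use_gpu=False):
        self.xp = np
        if use_gpu:
            import cupy as cp  # raises ImportError if CuPy is not installed

            self.xp = cp

    def _to_cpu(self, array):
        # CuPy arrays expose .get() to copy device memory into a NumPy array;
        # NumPy arrays are returned unchanged.
        if self.xp.__name__ == "cupy":
            return array.get()
        return array

    def top_values(self, x, k):
        # Backend-agnostic computation: self.xp is either numpy or cupy.
        x = self.xp.asarray(x)
        values = self.xp.flip(self.xp.sort(self.xp.abs(x)))[:k]
        return self._to_cpu(values)
```

For example, Model().top_values([3.0, -5.0, 1.0], 2) returns a NumPy array [5.0, 3.0] regardless of the backend, so downstream CPU code does not need to know where the computation ran.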

We could also perform the preprocessing on the GPU using cuml.preprocessing.normalize.

For users interested in GPU usage, the RAPIDS installation guide should be referenced in the documentation and perhaps even in an error message. I would suggest raising an ImportError when use_gpu is explicitly set to True, and we could consider raising a warning when use_gpu is None (but not when it is False).
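The suggested flag semantics could be sketched as follows, assuming a hypothetical resolve_use_gpu helper. RAPIDS_HINT is a placeholder string to be replaced with the actual pointer to the RAPIDS installation guide, and find_spec only checks that CuPy is installed, not that a usable GPU is present.

```python
import importlib.util
import warnings

# Placeholder: replace with the actual reference to the RAPIDS install guide.
RAPIDS_HINT = "See the RAPIDS installation guide to enable GPU support."


def resolve_use_gpu(use_gpu=None):
    """Decide whether to run on the GPU.

    True  -> GPU required; raise ImportError if CuPy is missing.
    None  -> use the GPU if CuPy is installed, otherwise warn and use the CPU.
    False -> always use the CPU, silently.
    """
    if use_gpu is False:
        return False
    available = importlib.util.find_spec("cupy") is not None
    if use_gpu is True and not available:
        raise ImportError("use_gpu=True but CuPy is not installed. " + RAPIDS_HINT)
    if use_gpu is None and not available:
        warnings.warn("CuPy not found; falling back to CPU. " + RAPIDS_HINT)
        return False
    return True
```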

Please let me know how I can help further with this implementation.

Best,
Malte

@niklasmueboe
Collaborator

niklasmueboe commented Dec 17, 2024

FYI, this is the issue tracking the missing functionality (i.e., features of scipy.linalg.eigh that are not available in CuPy):
cupy/cupy#7901
