CuAlgo is a Python library benefiting from GPU-accelerated computing, featuring a collection of fundamental algorithms implemented with CUDA. Currently, it includes the Floyd-Warshall algorithm for graph analysis, showcasing the potential of GPU acceleration.
- Floyd-Warshall algorithm
- Significant Speedup: Experience substantial performance gains with CuAlgo's GPU-accelerated algorithms compared to their CPU approaches.
- User-Friendly Python Interface: CuAlgo provides convenient interface for Python users. It is compatible with NumPy, allowing for easy data interchange with existing scientific computing workflows. Ensuring that python developers can leverage GPU acceleration without delving into CUDA programming details.
- Cross-Platform Compatibility: Developed with CMake, CuAlgo supports cross-platform development, enabling seamless compilation on various operating systems.
Explore the Floyd-Warshall implementation using different datasets of sizes N=40, N=1000, and N=2000. This section presents a comprehensive analysis of the efficiency improvements achieved through GPU acceleration.
- CPU Version: The algorithm is executed on the CPU without GPU acceleration.
- CPU (12 threads) Version: Runs on the CPU with 12 threads using OpenMP.
- GPU (Unoptimized) Version: Initial GPU implementation with similar parallelism as the next GPU (Optimized) Version.
- GPU (Optimized) Version: GPU implementation with optimizations, including loop/block unrolling, dynamic parallelism, and coalesced memory access, fully leveraging GPU resources efficiently.
The charts illustrate the speedup achieved by CuAlgo's GPU-accelerated algorithms over CPU-based implementations. Notably, the optimized GPU version outperforms both the unoptimized GPU and CPU versions when N grows large, emphasizing the impact of optimization on algorithm efficiency.
CPU | AMD Ryzen 9 5900X 12-Core Processor |
GPU | NVIDIA GeForce RTX 3060 Ti - 8GB |
RAM | 32GB DDR4 3600 Mhz |
CUDA Toolkit Version | 12.2 |
GPU Driver Version | 537.13 |
(For linux, need GCC compiler with C++ support1, and GNU make)
- Latest NVIDIA GPU driver
- Python 3.7+ with pip available
- Latest CUDA toolkit installed with nvcc compiler. (download here)
NOTE: [Recommended] You can skip 2 and 3. by using conda, see Installation below
conda install cuda -c nvidia
python -m pip install --upgrade pip setuptools
pip install cualgo
- Install NVIDIA latest GPU driver by yourself
python -m pip install --upgrade pip setuptools && pip install cualgo
Support data type of Numpy
.
from cualgo import graph as cg
import numpy as np
graph = np.array([
[0 , 7 , np.inf, 8],
[np.inf, 0 , 5 , np.inf],
[np.inf, np.inf, 0 , 2],
[np.inf, np.inf, np.inf, 0]
], dtype=np.float64)
print(cg.floydwarshall(graph))
# [[0.0, 7.0, 12.0, 8.0], [inf, 0.0, 5.0, 7.0], [inf, inf, 0.0, 2.0], [inf, inf, inf, 0.0]]
Or just simply pass 2D list
in python
from cualgo import graph as cg
INF = 9999
graph = [
[0 , 7 , INF, 8],
[INF, 0 , 5 , INF],
[INF, INF, 0 , 2],
[INF, INF, INF, 0]
]
print(cg.floydwarshall(graph))
# [[0, 7, 12, 8], [9999, 0, 5, 7], [9999, 9999, 0, 2], [9999, 9999, 9999, 0]]
Footnotes
-
GCC works more compatible with CUDA's compiler than clang ↩