Skip to content

A cross-platform Pytnon library for fundamental algorithm with GPU-accelerated computing

License

Notifications You must be signed in to change notification settings

anderson101866/cualgo

Repository files navigation

CuAlgo

CuAlgo is a Python library benefiting from GPU-accelerated computing, featuring a collection of fundamental algorithms implemented with CUDA. Currently, it includes the Floyd-Warshall algorithm for graph analysis, showcasing the potential of GPU acceleration.

PyPI package PyPI - Version Python Versions CuAlgo build

Key Features

Graph Algorithms:

  • Floyd-Warshall algorithm

Why CuAlgo?

  • Significant Speedup: Experience substantial performance gains with CuAlgo's GPU-accelerated algorithms compared to their CPU approaches.
  • User-Friendly Python Interface: CuAlgo provides convenient interface for Python users. It is compatible with NumPy, allowing for easy data interchange with existing scientific computing workflows. Ensuring that python developers can leverage GPU acceleration without delving into CUDA programming details.
  • Cross-Platform Compatibility: Developed with CMake, CuAlgo supports cross-platform development, enabling seamless compilation on various operating systems.

Performance Evaluation

Explore the Floyd-Warshall implementation using different datasets of sizes N=40, N=1000, and N=2000. This section presents a comprehensive analysis of the efficiency improvements achieved through GPU acceleration.

Methodology

  • CPU Version: The algorithm is executed on the CPU without GPU acceleration.
  • CPU (12 threads) Version: Runs on the CPU with 12 threads using OpenMP.
  • GPU (Unoptimized) Version: Initial GPU implementation with similar parallelism as the next GPU (Optimized) Version.
  • GPU (Optimized) Version: GPU implementation with optimizations, including loop/block unrolling, dynamic parallelism, and coalesced memory access, fully leveraging GPU resources efficiently.

The charts illustrate the speedup achieved by CuAlgo's GPU-accelerated algorithms over CPU-based implementations. Notably, the optimized GPU version outperforms both the unoptimized GPU and CPU versions when N grows large, emphasizing the impact of optimization on algorithm efficiency.

Hardware and Software Information:

CPU AMD Ryzen 9 5900X 12-Core Processor
GPU NVIDIA GeForce RTX 3060 Ti - 8GB
RAM 32GB DDR4 3600 Mhz
CUDA Toolkit Version 12.2
GPU Driver Version 537.13

Prerequisites

(For linux, need GCC compiler with C++ support1, and GNU make)

  1. Latest NVIDIA GPU driver
  2. Python 3.7+ with pip available
  3. Latest CUDA toolkit installed with nvcc compiler. (download here)

NOTE: [Recommended] You can skip 2 and 3. by using conda, see Installation below

Installation

Linux / Windows [Recommended]:

conda install cuda -c nvidia
python -m pip install --upgrade pip setuptools
pip install cualgo

Windows (without conda):

  1. Install NVIDIA latest GPU driver by yourself
  2. python -m pip install --upgrade pip setuptools && pip install cualgo

Sample Code

Support data type of Numpy.

from cualgo import graph as cg
import numpy as np
graph = np.array([
    [0     , 7     , np.inf, 8],
    [np.inf, 0     , 5     , np.inf],
    [np.inf, np.inf, 0     , 2],
    [np.inf, np.inf, np.inf, 0]
], dtype=np.float64)
print(cg.floydwarshall(graph))
# [[0.0, 7.0, 12.0, 8.0], [inf, 0.0, 5.0, 7.0], [inf, inf, 0.0, 2.0], [inf, inf, inf, 0.0]]

Or just simply pass 2D list in python

from cualgo import graph as cg
INF = 9999
graph = [
    [0  , 7  , INF, 8],
    [INF, 0  , 5  , INF],
    [INF, INF, 0  , 2],
    [INF, INF, INF, 0]
]
print(cg.floydwarshall(graph))
# [[0, 7, 12, 8], [9999, 0, 5, 7], [9999, 9999, 0, 2], [9999, 9999, 9999, 0]]

Footnotes

  1. GCC works more compatible with CUDA's compiler than clang