CUDA implementation of the Blocked Floyd-Warshall All pairs shortest path graph algorithm based on article: "A Multi-Stage CUDA Kernel for Floyd-Warshall" (Ben Lund, Justin W. Smith)
Hardware (MSI GP72 7RE):
|
Environment:
|
|V| | |E| | fw.cpp | fw-cuda.cu | Speedup | blocked-fw-cuda.cu | Speedup |
---|---|---|---|---|---|---|
1000 | 150000 | 0.724s | 0.176s | 4.215x | 0.117s | 6.188x |
2000 | 400000 | 5.895s | 0.642s | 9.182x | 0.213s | 27.67x |
5000 | 500000 | 84.91s | 8.406s | 10.10x | 2.301s | 36.90x |
7500 | 1125000 | 279.0s | 28.31s | 9.855x | 7.550s | 36.95x |
10000 | 2000000 | 665.9s | 63.81s | 10.43x | 17.28s | 38.53x |
12500 | 3125000 | 1269s | 125.5s | 10.11x | 34.03s | 37.27x |
- Install make/nvcc
- Update makefile with path to nvcc compiler
- Run command: make all
- Install Python 3.6
- Go to project directory
- Run command: python3 -m unittest discover -s test -v
Author: Mateusz Bojanowski