cuSolver may prominently enhance the efficiency of LCAO module #715
haxushu
started this conversation in
Show and tell
Replies: 1 comment
The report is impressive, thanks @haxushu.
Background
According to our profiling of the ABACUS LCAO module, solving the generalized eigenvalue problem dominates the cost when the structure is large.
For example, for 512 Si atoms (a 4x4x4 supercell) run with 8 processes, the overhead of ELPA is as follows:
Consequently, to speed up the solve in the ABACUS LCAO module, a more efficient eigensolver would be beneficial.
Describe the solution you'd like
cuSolver may be the best option. This conclusion is based on our report Eigensolver Benchmark, in which the eigenvector APIs of ELPA and cuSolver are benchmarked against several criteria. Focusing on the GPU-accelerated case, the time (in seconds) needed to solve for some or all of the eigenvectors was recorded with 1 process (1 OMP thread), nblk=32 and one V100 GPU.
*: 18 s when tuning nblk=512
^: 28 s when tuning nblk=128
As shown above, even when only some of the eigenvectors are needed, cuSolver, which computes all of them by default, performs much better than ELPA.
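To make "computes all by default" concrete, here is a minimal CPU sketch in NumPy of the compute-everything-then-slice pattern for the problem H c = e S c. It is not the cuSolver call itself: `solve_generalized` and `lowest_eigenpairs` are hypothetical names, the matrices are random test data, and the Cholesky reduction is only the textbook way to turn a generalized symmetric-definite problem into a standard one.

```python
import numpy as np


def solve_generalized(H, S):
    """Return all eigenpairs of H c = e S c (H symmetric, S symmetric positive definite).

    CPU stand-in for a dense "all eigenvectors" solver:
    factor S = L L^T, solve the standard problem for L^-1 H L^-T, back-transform.
    """
    L = np.linalg.cholesky(S)
    L_inv = np.linalg.inv(L)
    A = L_inv @ H @ L_inv.T
    A = 0.5 * (A + A.T)              # remove round-off asymmetry
    eigvals, Y = np.linalg.eigh(A)   # eigenvalues in ascending order
    C = L_inv.T @ Y                  # eigenvectors of the original problem
    return eigvals, C


def lowest_eigenpairs(H, S, n_bands):
    """Solve for everything, then keep only the lowest n_bands eigenpairs."""
    eigvals, C = solve_generalized(H, S)
    return eigvals[:n_bands], C[:, :n_bands]


# Small random test problem (illustrative data only).
rng = np.random.default_rng(0)
n, n_bands = 8, 3
M = rng.standard_normal((n, n))
H = 0.5 * (M + M.T)                  # symmetric "Hamiltonian"
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)          # symmetric positive-definite "overlap"

e, C = lowest_eigenpairs(H, S, n_bands)
residual = np.linalg.norm(H @ C - (S @ C) * e)
```

The retained eigenvectors come out S-orthonormal (C^T S C = I). The cost of the discarded columns is what a partial solver avoids, so the benchmark above suggests the GPU can more than make up for it at this problem size.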
Additional context
We (i.e. ByteDance) plan to divide the cuSolver implementation into three steps:
Step 1: Support single-GPU acceleration. (We are now at this step.)
Step 2: Support single-node multi-GPU acceleration.
Step 3: Support multi-node multi-GPU acceleration.
For Step 1, one possible strategy is for a single GPU to first gather the local data from all processes to form the full H and S matrices. After the cuSolver API cusolverDnDsygvd is called, the result would then be scattered back to the processes. Steps 2 and 3 depend on the cuSolverMG APIs, which have not yet matured, so they would start at an appropriate time.
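The Step 1 data flow (gather, solve on one device, scatter) can be sketched on the CPU as follows. This is an illustration under stated assumptions, not the planned implementation: the per-process pieces are modeled as contiguous row blocks rather than whatever distributed layout ABACUS actually uses, `gather_rows`, `scatter_rows` and `solve_on_root` are hypothetical helpers, and a NumPy solve stands in for the call to cusolverDnDsygvd.

```python
import numpy as np


def gather_rows(local_blocks):
    """Assemble the full matrix from per-process row blocks (stand-in for an MPI gather)."""
    return np.vstack(local_blocks)


def scatter_rows(full, n_procs):
    """Split a full matrix into per-process row blocks (stand-in for an MPI scatter)."""
    return np.array_split(full, n_procs, axis=0)


def solve_on_root(H, S):
    """Stand-in for the single-GPU solve of H c = e S c; returns all eigenpairs."""
    L = np.linalg.cholesky(S)
    L_inv = np.linalg.inv(L)
    A = L_inv @ H @ L_inv.T
    eigvals, Y = np.linalg.eigh(0.5 * (A + A.T))
    return eigvals, L_inv.T @ Y


# Illustrative setup: 4 "processes", each owning a row block of H and S.
rng = np.random.default_rng(1)
n, n_procs = 10, 4
M = rng.standard_normal((n, n))
H = 0.5 * (M + M.T)
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)
H_local = scatter_rows(H, n_procs)   # what each process would hold
S_local = scatter_rows(S, n_procs)

# 1) Gather the full matrices on the root process.
H_full = gather_rows(H_local)
S_full = gather_rows(S_local)

# 2) Solve the dense problem in one place.
eigvals, C = solve_on_root(H_full, S_full)

# 3) Scatter the eigenvectors back; eigenvalues are small enough to broadcast.
C_local = scatter_rows(C, n_procs)
```

The sketch makes the trade-off visible: the full n x n matrices must fit on, and travel to, a single device, which is why Steps 2 and 3 look toward multi-GPU solvers.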