The reproduced performance is not exactly the same as in the paper #2

wei-mei · 2023-08-30T06:51:08Z

Hello, I am reading your paper accepted at OSDI, "MGG: Accelerating Graph Neural Networks with Fine-grained Intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms".
I am using the git repository you provided, but I cannot reproduce the performance shown in the paper, for example the comparison with DGL on 8×A100 for GCN (Fig. 7a):

dataset	speed up
Reddit_beg_pos	0.598862
enwiki-2013_beg_pos	0.980894
t-2004_beg_pos	2.319232
paper100M_beg_pos	3.729139
ogbn-products_beg_pos	2.551465
ogbn-proteins_beg_pos	0.655375
com-Orkut_beg_pos	5.647636

Tested on 8× A100 SXM4 (80 GB); point-to-point NVLink bandwidth is 600 GB/s.

Which configurations in your repository should I adjust to achieve the performance shown in the paper?

YukeWang96 · 2023-08-30T20:51:30Z

Thanks for your interest.

As mentioned in the evaluation section of our paper ("Platforms & Tools" paragraph), the major evaluation platform is 8×A100 GPUs (40 GB), and we use an AWS P4dn.24xlarge instance for evaluation.
On 8×A100 (80 GB), the GPU global memory bandwidth (2,039 GB/s) differs from that of the A100 (40 GB) (1,555 GB/s), so we believe additional parameter tuning will be needed on the A100-80GB to achieve better performance. Other factors, such as the type and number of CPU cores on DGX-A100-80GB versus DGX-A100-40GB, would also affect the performance of DGL, since it relies on zero-copy access with CPU involvement to fetch remote data on the host.
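
To put the bandwidth gap in perspective, here is a minimal arithmetic sketch using only the two figures quoted above. The variable and function names are illustrative and are not part of the MGG repository, and the ratio says nothing by itself about how the tuning parameters should change.

```python
# Peak global-memory bandwidth figures quoted in the reply above (GB/s).
A100_40GB_BW = 1555
A100_80GB_BW = 2039


def bandwidth_ratio(new_bw, old_bw):
    """Return how many times larger new_bw is than old_bw."""
    return new_bw / old_bw


ratio = bandwidth_ratio(A100_80GB_BW, A100_40GB_BW)
print(f"A100-80GB / A100-40GB memory bandwidth: {ratio:.2f}x")
```

The 80 GB part has roughly 31% more global memory bandwidth, which shifts the balance between local memory access and remote communication that any tuning for the 40 GB platform was based on.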
