Notes on Performance
Aditya Atluri edited this page Apr 11, 2018 · 5 revisions
These notes collect data from experiments aimed at understanding how to reach peak peer-to-peer (p2p) bandwidth.
The machine under test has 5.26 GBps of uni-directional bandwidth and 10.128 GBps of bi-directional bandwidth.
When a 16 kB buffer is transferred from one GPU to another, copying with dwordx4 accesses gave 1.59 GBps (measured over 128 kernel launches with 1024 work items), whereas copying with dword accesses gave 1.45 GBps, so the wider access is about 10% faster at this size.
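To make the dword versus dwordx4 comparison concrete, the sketch below counts how many move instructions each work item would issue for the 16 kB buffer. It assumes that 16 kB means 16 * 1024 bytes, that a dword is 4 bytes and a dwordx4 is 16 bytes, and that the buffer is split evenly across the 1024 work items. The helper name `moves_per_work_item` is hypothetical and is not taken from the original experiments.

```python
DWORD_BYTES = 4      # one dword move transfers 4 bytes
DWORDX4_BYTES = 16   # one dwordx4 move transfers 4 dwords = 16 bytes


def moves_per_work_item(buffer_bytes, work_items, bytes_per_move):
    """Return how many moves the busiest work item issues (hypothetical helper).

    Assumes the buffer is divided as evenly as possible across work items.
    """
    total_moves = -(-buffer_bytes // bytes_per_move)  # ceiling division
    return -(-total_moves // work_items)              # ceiling division


buffer_bytes = 16 * 1024  # assumption: 16 kB means 16 KiB
print(moves_per_work_item(buffer_bytes, 1024, DWORDX4_BYTES))  # 1
print(moves_per_work_item(buffer_bytes, 1024, DWORD_BYTES))    # 4
```

Under these assumptions, each work item issues a single dwordx4 move or four dword moves to cover the same buffer.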
The smallest buffer size that reaches peak bandwidth with the copy kernel is 4 MB. The kernel should be launched with 1024 work items, each doing a dwordx4 mov.
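The launch configuration above can be modelled on the CPU to see how 1024 work items cover a 4 MB buffer. This is a plain Python emulation of the indexing only, not GPU code and not the original kernel. It assumes that 4 MB means 4 * 1024 * 1024 bytes and that work item `tid` handles the 16-byte chunks `tid`, `tid + 1024`, `tid + 2048`, and so on; the notes do not say which access pattern the real kernel uses. The function name `emulate_copy` is hypothetical.

```python
VECTOR_BYTES = 16   # one dwordx4 move = 4 dwords x 4 bytes
WORK_ITEMS = 1024   # launch size quoted in the notes


def emulate_copy(src, work_items=WORK_ITEMS):
    """Copy src in 16-byte chunks using a strided work-item loop.

    Returns (dst, moves), where moves is the largest number of
    dwordx4-sized moves issued by any single work item.
    Assumption: strided chunk assignment; the real kernel may differ.
    """
    if len(src) % VECTOR_BYTES:
        raise ValueError("buffer size must be a multiple of 16 bytes")
    n_chunks = len(src) // VECTOR_BYTES
    dst = bytearray(len(src))
    max_moves = 0
    for tid in range(work_items):
        moves = 0
        # Work item tid handles chunks tid, tid + work_items, ...
        for chunk in range(tid, n_chunks, work_items):
            off = chunk * VECTOR_BYTES
            dst[off:off + VECTOR_BYTES] = src[off:off + VECTOR_BYTES]
            moves += 1
        max_moves = max(max_moves, moves)
    return dst, max_moves


src = bytes(i % 251 for i in range(4 * 1024 * 1024))  # assumption: 4 MB = 4 MiB
dst, moves = emulate_copy(src)
print(dst == src, moves)  # True 256
```

With these assumptions the buffer splits into 262,144 chunks of 16 bytes, so each of the 1024 work items loops 256 times.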
Writing to a peer GPU is 2x faster than reading from a peer GPU.