Notes on Performance

Aditya Atluri edited this page Feb 20, 2018 · 5 revisions

Experiment 1

When a 16 kB buffer is transferred from one GPU to another, copying it as dwordx4 gave 1.59 GBps (over 128 kernel launches with 1024 work items), whereas copying it as dword gave 1.45 GBps.
