Note that we currently allocate the packing buffers dynamically inside the framework. The overhead seems acceptable, but it may be better to preallocate the buffers in an initialization function such as hmlp_init().
Preallocation is valuable because we may need several other temporary buffers:
GSKS and GSKNN:
when k > KC, we need to store the rank-KC update.
GKMM:
when k > KC and TV != TC, we need to store the rank-KC update in type TV.
GKRM:
when k > KC and TV != TC, we need to store the rank-KC update in type TV.
GKRM on GPU:
we need an m-by-n/4 buffer in type TC to perform global reduction.
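The conditions above can be written down as small size helpers that an initialization routine could query before allocating. This is only a sketch: the function names are hypothetical, the rank-KC temporary is assumed to cover a full m-by-n output block, and n/4 is assumed to round up; none of this is confirmed by the framework itself.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper for GSKS / GSKNN: a temporary is needed whenever k > KC.
// Assumption: the rank-KC update is stored as a full m-by-n block.
constexpr std::size_t gsks_temp_elems(std::size_t m, std::size_t n,
                                      std::size_t k, std::size_t kc) {
  return k > kc ? m * n : 0;
}

// Hypothetical helper for GKMM / GKRM: a temporary of type TV is needed only
// when k > KC and the intermediate type TV differs from the output type TC.
template <typename TV, typename TC>
constexpr std::size_t gkmm_temp_elems(std::size_t m, std::size_t n,
                                      std::size_t k, std::size_t kc) {
  return (k > kc && !std::is_same<TV, TC>::value) ? m * n : 0;
}

// Hypothetical helper for GKRM on GPU: an m-by-n/4 buffer of type TC for the
// global reduction. Assumption: n/4 is rounded up when n is not a multiple of 4.
constexpr std::size_t gkrm_gpu_reduction_elems(std::size_t m, std::size_t n) {
  return m * ((n + 3) / 4);
}
```

Multiplying each element count by sizeof(TV) or sizeof(TC) gives the number of bytes to reserve up front.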
Several references (thanks to Jianyu for providing the links):
http://stackoverflow.com/questions/29410064/aligned-dynamic-array-and-smart-pointer
http://www.codeproject.com/Articles/392576/Ideas-from-a-smart-pointer-Support-data-alignment
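In the spirit of the two links above, a preallocated buffer could be held in a smart pointer with an aligned allocation and a matching deleter. The sketch below is one possible shape, not the framework's implementation: the names AlignedFree, AlignedBuffer, make_aligned_buffer and is_aligned are hypothetical, the 64-byte default alignment is an assumption, and posix_memalign assumes a POSIX system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Deleter that releases memory obtained from posix_memalign.
struct AlignedFree {
  void operator()(void* p) const noexcept { free(p); }
};

// Owning handle to an aligned array of T. T is assumed to be a plain
// arithmetic type (float, double, ...): no constructors or destructors run.
template <typename T>
using AlignedBuffer = std::unique_ptr<T[], AlignedFree>;

// Allocate `count` elements of T aligned to `alignment` bytes.
// `alignment` must be a power of two and a multiple of sizeof(void*).
template <typename T>
AlignedBuffer<T> make_aligned_buffer(std::size_t count,
                                     std::size_t alignment = 64) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  assert(alignment % sizeof(void*) == 0);
  std::size_t bytes = count * sizeof(T);
  if (bytes == 0) bytes = alignment;  // avoid a zero-sized request
  void* p = nullptr;
  if (posix_memalign(&p, alignment, bytes) != 0 || p == nullptr) {
    throw std::bad_alloc();
  }
  return AlignedBuffer<T>(static_cast<T*>(p));
}

// True if `p` is aligned to `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

An initialization routine such as the proposed hmlp_init() could call make_aligned_buffer once per workspace and keep the handles alive for the lifetime of the library, so the compute kernels never allocate.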