The documentation for GpuKernel_sched states that it may return LS*GS > N and your code should be able to handle that (which is fine), but I found that it's actually returning values where LS*GS < N.
Is this intended? As a specific example, calling GpuKernel_sched() with N=273280 on a Titan X (target_g=768, target_l=512) returns LS=352 and GS=768, so LS*GS = 270336.
It looks like the function tries to make sure LS*GS >= N here:
First, for large N it is impossible to guarantee that LS*GS will always be greater than or equal to N, because of hardware/CUDA limits. So in some cases it won't be, and your code should be able to handle that too (or detect it and raise an error).
Maybe we can modify the function so that it satisfies LS*GS >= N in more cases. I'll let @abergeron check that in more detail. But it will probably wait until next week: he should be back today, and a lot of things have accumulated.
Yes, that isn't explicitly mentioned and the documentation could be improved on that, but your code should handle cases where LS*GS doesn't cover the whole N by looping.
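For illustration, here is a minimal host-side sketch of the "handle it by looping" idea, using the numbers from the report. It is plain C that simulates the launch on the CPU, not real kernel code, and `simulate_launch` and `covers_exactly_once` are hypothetical helper names, not part of libgpuarray. Each simulated thread starts at its global id and steps by the total thread count LS*GS, so all N elements get visited whether LS*GS is smaller or larger than N.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel launch: every simulated thread starts
 * at its global id and advances by the total thread count (ls * gs) until
 * it passes n. This is the stride loop a real kernel body would contain. */
static void simulate_launch(size_t n, size_t ls, size_t gs, unsigned char *hits)
{
    assert(ls > 0 && gs > 0);
    size_t stride = ls * gs;
    for (size_t gid = 0; gid < stride; gid++) {      /* one pass per thread */
        for (size_t i = gid; i < n; i += stride) {   /* loop inside the kernel */
            hits[i]++;
        }
    }
}

/* Returns 1 if every index in [0, n) was visited exactly once, else 0. */
static int covers_exactly_once(size_t n, size_t ls, size_t gs)
{
    unsigned char *hits = calloc(n ? n : 1, 1);
    if (hits == NULL) {
        return 0;
    }
    simulate_launch(n, ls, gs, hits);
    int ok = 1;
    for (size_t i = 0; i < n; i++) {
        if (hits[i] != 1) {
            ok = 0;
            break;
        }
    }
    free(hits);
    return ok;
}
```

With N=273280, LS=352, GS=768, the first 2944 simulated threads each handle two elements and the rest handle one. When LS*GS > N, the threads whose global id is at or beyond N simply do nothing, so the same loop covers both cases.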
I'm leaving this open to remind me to update the doc.
The code referenced in the report is libgpuarray/src/gpuarray_kernel.c, line 80 at commit 5db51f9: it looks like the function tries to make sure LS*GS >= N there, but the code doesn't do that in this case.