GpuKernel_sched will return LS/GS values that do not satisfy LS*GS >= N #438

HapeMask · 2017-05-18T04:35:31Z

The documentation for GpuKernel_sched states that it may return LS*GS > N and your code should be able to handle that (which is fine), but I found that it's actually returning values where LS*GS < N.

Is this intended? For a specific example, calling GpuKernel_sched() w/N=273280 on a Titan X (target_g=768, target_l=512) returns LS=352 GS=768, LS*GS = 270336.

It looks like the function tries to make sure LS*GS >= N here:

libgpuarray/src/gpuarray_kernel.c

Line 80 in 5db51f9

*ls = ((n / min_l) / *gs) * min_l;

but the code doesn't do that in this case.
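To make the arithmetic concrete, here is a minimal sketch that isolates the quoted expression in a standalone helper. The name `floor_ls` is hypothetical, and `min_l = 32` (a warp-sized minimum local size) is an assumption that is not stated in the report, though it does reproduce the reported LS=352 when `gs` is taken to be the reported 768.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: same expression as the quoted line, with the
 * inputs passed explicitly. Both divisions are integer divisions, so
 * each one rounds down. */
size_t floor_ls(size_t n, size_t min_l, size_t gs) {
  assert(min_l > 0 && gs > 0);
  return ((n / min_l) / gs) * min_l;
}
```

With n=273280, min_l=32 and gs=768 this gives 273280/32 = 8540, then 8540/768 = 11 (rounded down from about 11.12), then 11*32 = 352. The product 352*768 = 270336 is 2944 short of N, so under these assumptions the shortfall comes from the divisions rounding down.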


nouiz · 2017-05-18T12:39:30Z

First, it is impossible for LS*GS to be greater than or equal to N in all cases: for big N, hardware/CUDA limitations get in the way. So in some cases it will be smaller than N, and your code should be able to handle that too (or detect it and raise an error).

Maybe we can modify that function to do better and cover N in more cases. I'll let @abergeron check that in more detail. But it will probably wait until next week, as he should be back today and there is a bunch of stuff that has accumulated.

abergeron · 2017-05-18T18:27:43Z

Yes, that isn't explicitly mentioned, and the documentation could be clearer about it, but your code should handle cases where LS*GS doesn't cover the whole N by looping.
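One common way to do that looping is a strided loop, where each thread starts at its global id and advances by the total number of launched threads. The sketch below simulates this on the host in plain C rather than in a real kernel; `strided_worker` and `covers_exactly_once` are hypothetical names, and nothing here is libgpuarray API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Work done by one simulated thread: visit every index it owns,
 * stepping by the total number of launched threads. */
void strided_worker(size_t tid, size_t total, size_t n,
                    unsigned char *hits) {
  for (size_t i = tid; i < n; i += total)
    hits[i]++;
}

/* Returns 1 if a launch of gs groups of ls threads touches each of the
 * n elements exactly once, 0 otherwise (or on allocation failure). */
int covers_exactly_once(size_t n, size_t gs, size_t ls) {
  size_t total = gs * ls;
  assert(total > 0);
  unsigned char *hits = calloc(n ? n : 1, 1);
  if (hits == NULL)
    return 0;
  /* Run every simulated thread in turn. */
  for (size_t tid = 0; tid < total; tid++)
    strided_worker(tid, total, n, hits);
  int ok = 1;
  for (size_t i = 0; i < n; i++) {
    if (hits[i] != 1) {
      ok = 0;
      break;
    }
  }
  free(hits);
  return ok;
}
```

Calling `covers_exactly_once(273280, 768, 352)` checks the case from the report: the first 2944 simulated threads each handle a second element, and the same loop also works unchanged when LS*GS exceeds N, because the `i < n` bound doubles as the usual out-of-range guard.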

I'm leaving this open to remind me to update the doc.

HapeMask · 2017-05-18T19:46:29Z

Ah I see, that makes sense. Thanks for clearing things up.

wonghang mentioned this issue Feb 8, 2020

Fix for tril, triu when kernel_sched returns LS, GS such that LS*GS < N #589

Open
