
GpuKernel_sched will return LS/GS values that do not satisfy LS*GS >= N #438

Open
HapeMask opened this issue May 18, 2017 · 3 comments

@HapeMask

The documentation for GpuKernel_sched states that it may return LS*GS > N and your code should be able to handle that (which is fine), but I found that it's actually returning values where LS*GS < N.

Is this intended? For a specific example, calling GpuKernel_sched() with N=273280 on a Titan X (target_g=768, target_l=512) returns LS=352 and GS=768, so LS*GS = 270336.

It looks like the function tries to make sure LS*GS >= N here:

*ls = ((n / min_l) / *gs) * min_l;

but the code doesn't do that in this case.

@nouiz
Member

nouiz commented May 18, 2017

First, for large N it is impossible to guarantee that LS*GS will always be greater than or equal to N, due to hardware/CUDA limitations. So in some cases it won't be, and your code should be able to handle that too (or detect it and raise an error).

Maybe we can modify that function to cover N in more cases. I'll let @abergeron look at this in more detail, but it will probably wait until next week: he should be back today and a lot of work has accumulated.

@abergeron
Member

Yes, that isn't explicitly mentioned, and the documentation could be improved there, but your code should handle cases where LS*GS doesn't cover all of N by looping.

I'm leaving this open to remind me to update the doc.

@HapeMask
Author

Ah I see, that makes sense. Thanks for clearing things up.
