RDNA2+ cards seem to be underutilized, because of dual compute unit design? #186
Comments
7900xtx reports:
so it reports 48 CUs while it actually has 96. |
Please note that the 48 CU figure is only used here as a log message. The important part is |
On 7900xtx you can get more performance by setting
I think that for RDNA2 cards, where we have a high RAM-to-CU ratio, we're packing too much RAM. In comparison, the NVIDIA 4090 allows us to use 6 GiB of RAM (max_mem_alloc_size), while the 7900xtx allows 21 GiB. |
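To make the RAM-to-CU ratio concrete, here is a rough back-of-the-envelope sketch using the figures quoted in this thread (21 GiB and 48/96 CUs for the 7900xtx, 6 GiB for the 4090). The helper name `mem_per_cu_mib` is made up for illustration, and the 128 compute units for the 4090 are my assumption, not something stated here.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def mem_per_cu_mib(max_alloc_bytes: int, compute_units: int) -> float:
    """Return how many MiB of the allocation fall on each compute unit."""
    return max_alloc_bytes / compute_units / MIB


# Figures quoted in the thread.
rx7900xtx_alloc = 21 * GIB  # max_mem_alloc_size on the 7900xtx
rtx4090_alloc = 6 * GIB     # max_mem_alloc_size on the 4090

reported = mem_per_cu_mib(rx7900xtx_alloc, 48)  # CU count OpenCL reports
physical = mem_per_cu_mib(rx7900xtx_alloc, 96)  # CU count the card really has

# ASSUMPTION: 128 compute units for the 4090 (not given in this thread).
nvidia = mem_per_cu_mib(rtx4090_alloc, 128)

print(f"7900xtx, 48 reported CUs: {reported:.0f} MiB per CU")
print(f"7900xtx, 96 physical CUs: {physical:.0f} MiB per CU")
print(f"4090, 128 assumed CUs:    {nvidia:.0f} MiB per CU")
```

Even with the physical CU count, the 7900xtx ends up with several times more memory per CU than the 4090 under these assumptions, which is consistent with the "packing too much RAM" suspicion.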
That's because right now we are using one Line 373 in 168ee31
If that's true (it has to be tested on real HW), maybe we can create a round-robin list of Line 409 in 168ee31
len(devices) parts?
|
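As a minimal sketch of the round-robin idea above: the code below splits a list of work items into `len(devices)` parts by dealing them out in turn. The function `round_robin_split`, the device names, and the integer work items are all hypothetical placeholders; the snippets referenced above are not reproduced here, so this is not the project's actual code.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def round_robin_split(items: Sequence[T], parts: int) -> List[List[T]]:
    """Deal items out to `parts` buckets in turn: item i goes to bucket i % parts."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    return [list(items[i::parts]) for i in range(parts)]


# Hypothetical device handles and work items, for illustration only.
devices = ["gpu0-a", "gpu0-b"]
work = list(range(7))

assignment: Dict[str, List[int]] = dict(
    zip(devices, round_robin_split(work, len(devices)))
)
print(assignment)
```

Dealing items out in turn keeps the parts within one item of each other in size, whatever the number of devices.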
This could be solved on a higher level by allowing parallel initialization on multiple GPUs (for example if the user has 2 cards attached to the PC). The code could distribute initialization tasks to all available devices in parallel. |
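A rough sketch of what such higher-level distribution could look like, assuming each device can be driven from its own host thread. `init_on_device` is a stand-in for the real per-device initialization call, and the device names and task counts are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def init_on_device(device: str, tasks: range) -> Tuple[str, int]:
    """Placeholder for real per-device work; just reports how many tasks it got."""
    return device, len(tasks)


def parallel_init(devices: Sequence[str], total_tasks: int) -> List[Tuple[str, int]]:
    """Split task indices into contiguous chunks and run one chunk per device."""
    n = len(devices)
    chunk = -(-total_tasks // n)  # ceiling division
    ranges = [
        range(i * chunk, min((i + 1) * chunk, total_tasks)) for i in range(n)
    ]
    # One worker thread per device, so all devices are busy at the same time.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(init_on_device, devices, ranges))


result = parallel_init(["gpu0", "gpu1"], 10)
print(result)
```

Note that this only helps when the devices really are independent; as the next comment points out, it does not address a single card being exposed as one device with half its CUs.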
It's not entirely like that. OpenCL on Windows returns one device with just half of the CUs, as stated above. Splitting the work in two while keeping the rest of the code as is yields even worse results. |
From my understanding of the RDNA2 architecture, each RDNA2 (and newer) card will report only half of its CUs to OpenCL.
(from the whitepaper)
As a result, the GPUs seem to be underutilized.