WIP: EDGE3D update #478

pguthrey · 2024-09-09T20:32:06Z

Summary

This PR is a refactoring and a bugfix.
It does the following:
- TODO: Explores performance improvements to EDGE3D
- TODO: Addresses the OpenMP target issues (see Fix EDGE3D kernel #406)

rhornung67 · 2024-10-10T17:53:38Z

@pguthrey is this ready for review?

pguthrey · 2024-10-13T17:21:32Z

> @pguthrey is this ready for review?

Not just yet. I need to experiment more with different implementations and will come back to this later.
Also, there are additional apps that are broken with respect to OpenMP target, so I think I will pull that work out into another branch and work with others to fix those examples (unless OpenMP target is no longer supported).

pguthrey · 2024-11-27T02:37:48Z

Here are the results of these changes. Good improvement for CUDA. Impossibly good improvement for HIP. I checked that the results are the same as the previous algorithm... but I might look more into what is going on with HIP.

| CUDA | Base_Seq | Lambda_Seq | RAJA_Seq | Base_OpenMP | Lambda_OpenMP | RAJA_OpenMP | Base_HIP | Lambda_HIP | RAJA_HIP |
|---|---|---|---|---|---|---|---|---|---|
| | default | default | default | default | default | default | block_256 | block_256 | block_256 |
| current impl | 4.42E+01 | 4.45E+01 | 4.44E+01 | 8.98E-01 | 9.08E-01 | 9.04E-01 | 1.53E-01 | 1.53E-01 | 1.54E-01 |
| new impl | 3.05E+01 | 3.06E+01 | 3.06E+01 | 6.08E-01 | 6.11E-01 | 6.11E-01 | 7.06E-02 | 7.02E-02 | 7.04E-02 |
| speedup current/new | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 2.2 | 2.2 | 2.2 |
| speedup openmp/kernel | | | | 1.0 | 1.0 | 1.0 | 8.6 | 8.7 | 8.6 |

| HIP | Base_Seq | Lambda_Seq | RAJA_Seq | Base_OpenMP | Lambda_OpenMP | RAJA_OpenMP | Base_HIP | Lambda_HIP | RAJA_HIP |
|---|---|---|---|---|---|---|---|---|---|
| | default | default | default | default | default | default | block_256 | block_256 | block_256 |
| current impl | 1.74E+01 | 1.74E+01 | 1.74E+01 | 4.36E-01 | 3.79E-01 | 4.44E-01 | 1.34E-01 | 9.64E-02 | 9.73E-02 |
| new impl | 1.56E+01 | 1.54E+01 | 1.54E+01 | 3.51E-01 | 3.70E-01 | 4.01E-01 | 1.03E-03 | 9.90E-05 | 9.10E-05 |
| speedup current/new | 1.1 | 1.1 | 1.1 | 1.2 | 1.0 | 1.1 | 130.8 | 974.0 | 1069.0 |
| speedup new openmp/kernel | | | | 1.0 | 0.9 | 0.9 | 341.7 | 3541.4 | 3852.7 |

MrBurmark · 2024-11-27T16:30:39Z

Perhaps there is still register spilling with CUDA, or something similar, that is making a dramatic difference. We'll have to look at the instructions to see what happened.

pguthrey · 2024-11-27T19:35:45Z

> Perhaps there is still register spilling with CUDA, or something similar, that is making a dramatic difference. We'll have to look at the instructions to see what happened.

That makes some sense. If I add together the storage needed by the symmetric matrix and the vectors, I get:

12*13/2 + 3*12 + 3*12 = 150 > 128
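To make the count above easier to check, here is a minimal C++ sketch of the arithmetic, assuming the 12x12 matrix is symmetric and stored as a packed upper triangle, and that the two `3*12` terms are two groups of three length-12 vectors. The names `packed_size`, `packed_index`, `values_per_thread` and `packed_index_covers_all` are hypothetical and are not taken from the EDGE3D kernel. The comment does not say what the 128 limit refers to, so the sketch treats it only as a number to compare against.

```cpp
#include <cassert>
#include <cstddef>

// Entries needed for an n x n symmetric matrix stored as a packed
// upper triangle (diagonal included): n*(n+1)/2.
constexpr std::size_t packed_size(std::size_t n)
{
  return n * (n + 1) / 2;
}

// Row-major packed upper-triangle index of element (i, j).
// Row i starts at i*(2n - i + 1)/2; (j, i) maps to the same slot as (i, j).
constexpr std::size_t packed_index(std::size_t n, std::size_t i, std::size_t j)
{
  return (i <= j) ? i * (2 * n - i + 1) / 2 + (j - i)
                  : packed_index(n, j, i);
}

// Hypothetical per-thread value count: one packed symmetric matrix plus
// two groups of `dims` vectors of length n (the two 3*12 terms above).
constexpr std::size_t values_per_thread(std::size_t n, std::size_t dims)
{
  return packed_size(n) + dims * n + dims * n;
}

// The arithmetic from the comment: 12*13/2 + 3*12 + 3*12 = 150 > 128.
static_assert(packed_size(12) == 78, "packed 12x12 symmetric matrix");
static_assert(values_per_thread(12, 3) == 150, "78 + 36 + 36");
static_assert(values_per_thread(12, 3) > 128, "exceeds the quoted 128");

// Checks that the packed indices enumerate 0..packed_size(n)-1 in order
// and that (i, j) and (j, i) share a slot.
inline bool packed_index_covers_all(std::size_t n)
{
  std::size_t expected = 0;
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = i; j < n; ++j) {
      if (packed_index(n, i, j) != expected) return false;
      if (packed_index(n, j, i) != expected) return false;
      ++expected;
    }
  }
  return expected == packed_size(n);
}
```

For comparison, storing the full 12x12 matrix instead of the packed triangle would need 144 + 36 + 36 = 216 values under the same assumptions, so symmetric storage reduces the count but does not bring it under 128.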

pguthrey force-pushed the feature/guthrey1/edge_3d_update branch 3 times, most recently from 32723a7 to 702238f Compare September 11, 2024 22:29

rhornung67 requested review from MrBurmark, artv3 and rhornung67 October 10, 2024 17:56

pguthrey removed request for MrBurmark, artv3 and rhornung67 October 13, 2024 17:17

pguthrey marked this pull request as draft October 13, 2024 17:18

use symmetry for storage on innermost loop (but not outermost)

c9b19a6

pguthrey force-pushed the feature/guthrey1/edge_3d_update branch from 050da7e to c9b19a6 Compare November 23, 2024 03:46

pguthrey added 2 commits November 22, 2024 19:46

comments

3298c9a

call symmetric impl

5e593d5
