Summary:
Many Nalu-Wind regression tests fail with memory access faults, in some cases reported as HSA memory aperture violations, on several AMD GPU architectures (MI300X and MI250).
MI300X failures:
I have been building on a non-Cray system with MI300X GPUs because builds are faster there. I have built successfully with:
- rocm/6.2.1
- openmpi/5.0.3-ucc1.3.x-ucx1.16.x-rocm6.2.0
Exawind-manager
- Repository: git@github.com:PaulMullowney/exawind-manager.git
- Branch: amd-debug
- SHA: c4b7dd11

nalu-wind
- Repository: git@github.com:PaulMullowney/nalu-wind.git
- Branch: amd-debug
- SHA: 9242f8b
Debug builds
- MI300X: `nalu-wind@master+rocm+tioga amdgpu_target=gfx942 build_type=Debug ^trilinos build_type=Debug`
- MI250: `nalu-wind@master+rocm+tioga amdgpu_target=gfx90a build_type=Debug ^trilinos build_type=Debug`
Tolerances: `--abs-tol 1e-08 --rel-tol 1e-06`
| Test | MI300X | MI250 |
|---|---|---|
| fsiTurbineSurrogate | Failed (4) | Failed (1) |
| airfoilRANSEdgeNGPHypre.rst | Passed | Passed |
| ablNeutralNGPHypre | Passed | Failed (2) |
| ablNeutralNGPHypreSegregated | Passed | Failed (2) |
| multiElemCylinder | Failed (3) | Failed (3) |
| VOFZalDisk | Failed (4) | Failed (4) |
| airfoilSST_Gamma_Trans | Passed | Passed |
| oversetRotCylNGPHypre | Passed | Failed (4) |
| convTaylorVortex | Failed (5) | Failed (3) |
| unitTestGPU | Passed | Passed |
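As a rough illustration of what `--abs-tol 1e-08 --rel-tol 1e-06` plausibly mean when a run's norms are compared against gold values, here is a minimal Python sketch. It assumes, hypothetically, that each `.norm` line holds whitespace-separated numbers and that a value passes if it lies within either the absolute tolerance or the relative tolerance scaled by the gold value; the actual Nalu-Wind regression harness may apply the tolerances differently. The names `values_match` and `compare_norms` are illustrative only, not part of Nalu-Wind.

```python
# Hypothetical sketch: the real Nalu-Wind regression harness may apply
# --abs-tol / --rel-tol differently. Here a value passes if it is within
# either the absolute tolerance or the relative tolerance of the gold value.
ABS_TOL = 1e-08
REL_TOL = 1e-06


def values_match(test: float, gold: float,
                 abs_tol: float = ABS_TOL, rel_tol: float = REL_TOL) -> bool:
    """Return True if test is within abs_tol, or within rel_tol * |gold|, of gold."""
    diff = abs(test - gold)
    return diff <= abs_tol or diff <= rel_tol * abs(gold)


def compare_norms(test_lines, gold_lines,
                  abs_tol: float = ABS_TOL, rel_tol: float = REL_TOL):
    """Compare two norm files given as lists of lines.

    Assumes (hypothetically) that every line holds whitespace-separated
    numbers and that both files have the same shape. Returns a list of
    (line_index, column_index, test_value, gold_value) for each mismatch.
    """
    if len(test_lines) != len(gold_lines):
        raise ValueError("norm files have different numbers of lines")
    mismatches = []
    for i, (t_line, g_line) in enumerate(zip(test_lines, gold_lines)):
        t_vals = [float(x) for x in t_line.split()]
        g_vals = [float(x) for x in g_line.split()]
        if len(t_vals) != len(g_vals):
            raise ValueError(f"line {i}: column counts differ")
        for j, (t, g) in enumerate(zip(t_vals, g_vals)):
            if not values_match(t, g, abs_tol, rel_tol):
                mismatches.append((i, j, t, g))
    return mismatches


if __name__ == "__main__":
    # Made-up example values, not taken from any Nalu-Wind run.
    gold = ["1 0.5 2.0e-03", "2 0.25 1.0e-03"]
    test = ["1 0.5 2.0e-03", "2 0.2500001 1.1e-03"]
    for line, col, t, g in compare_norms(test, gold):
        print(f"line {line}, column {col}: test={t} gold={g}")
```

In this made-up example, `0.2500001` passes on the relative tolerance even though it misses the absolute one, while `1.1e-03` against `1.0e-03` fails both, so only that entry is reported.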
Release builds
- MI300X: `nalu-wind@master+rocm+tioga amdgpu_target=gfx942 build_type=Release ^trilinos build_type=Release`
- MI250: `nalu-wind@master+rocm+tioga amdgpu_target=gfx90a build_type=Release ^trilinos build_type=Release`
| Test | MI300X | MI250 |
|---|---|---|
| fsiTurbineSurrogate | Failed (4) | Failed (4) |
| airfoilRANSEdgeNGPHypre.rst | Passed | Passed |
| ablNeutralNGPHypre | Failed (3) | Failed (3) |
| ablNeutralNGPHypreSegregated | Failed (3) | Failed (3) |
| multiElemCylinder | Failed (3) | Failed (3) |
| VOFZalDisk | Failed (4) | Failed (4) |
| airfoilSST_Gamma_Trans | Passed | Passed |
| oversetRotCylNGPHypre | Failed (4) | Failed (4) |
| convTaylorVortex | Failed (3) | Failed (3) |
| unitTestGPU | Passed | Passed |
The number in parentheses after "Failed" refers to the failure pattern below, counting in the order listed.

| # | Failure pattern | Last Nalu output |
|---|---|---|
| 1 | Memory access fault | `Time Step Count: 1 Current Time: 0.00455075`<br>`dtN: 0.00455075 dtNm1: 0.00455075 gammas: 1 -1 0` |
| 2 | Memory access fault + `HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION` | `Realm::populate_variables_form_input() candidate input time: 0 for Realm: fluidRealm` |
| 3 | Memory access fault | `Parallel consistency noted in master/slave pairings:` |
| 4 | Memory access fault (+ `HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION`) | `Realm::create_output_mesh() End` |
| 5 | Runs to completion but does not generate a .norm file | (no error) |
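When triaging many runs, a small helper can match a captured log against these failure patterns. This is a hypothetical Python sketch, not an existing Nalu-Wind or ROCm tool: it assumes the ROCm fault messages and the Nalu output end up in the same captured text, numbers the patterns 1 to 5 in the order listed, and matches on substrings copied from the last-output lines above. The name `classify_failure` and the `norm_file_exists` parameter are illustrative.

```python
from typing import Optional

# Substrings that mark a GPU fault in the captured output (copied from the issue).
FAULT_MARKERS = ("Memory access fault", "HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION")

# (pattern number, substring expected in the last Nalu output line).
# Numbering assumes patterns 1-4 are numbered in the order they are listed.
LAST_OUTPUT_MARKERS = (
    (1, "dtN:"),
    (2, "Realm::populate_variables_form_input()"),
    (3, "Parallel consistency noted in master/slave pairings:"),
    (4, "Realm::create_output_mesh() End"),
)


def classify_failure(log_text: str, norm_file_exists: bool) -> Optional[int]:
    """Return the assumed failure-pattern number (1-5), or None if no match.

    Hypothetical helper: pattern 5 is a run without any fault message
    that still did not write its .norm file.
    """
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    faulted = any(m in ln for ln in lines for m in FAULT_MARKERS)
    if not faulted:
        # No fault reported: only a missing .norm file counts as a failure.
        return None if norm_file_exists else 5
    # Drop the fault messages themselves and inspect the last Nalu line.
    nalu_lines = [ln for ln in lines if not any(m in ln for m in FAULT_MARKERS)]
    if not nalu_lines:
        return None
    last = nalu_lines[-1]
    for pattern, marker in LAST_OUTPUT_MARKERS:
        if marker in last:
            return pattern
    return None


if __name__ == "__main__":
    # Made-up log snippet shaped like pattern 4.
    log = "Realm::create_output_mesh() End\nMemory access fault\n"
    print(classify_failure(log, norm_file_exists=False))
```

Matching on the last non-fault line, rather than on the fault message alone, is what separates patterns 1, 3, and 4, since all three report the same "Memory access fault" text.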