Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skip ba_global_free when BA is already destroyed #939

Closed

Conversation

bratpiorka
Copy link
Contributor

Skip ba_global_free when BA is already destroyed.

This could happen when the globally allocated object with a destructor that calls any UMF free/destroy function is called after UMF tear down and base alloc destroy. Then, any dereferencing of previously allocated (and mmaped) by base alloc pointer would cause a crash (as the pointer is already unmapped).

@bratpiorka bratpiorka requested a review from a team as a code owner November 27, 2024 09:00
@bratpiorka bratpiorka requested a review from ldorau November 27, 2024 09:01
@bratpiorka bratpiorka force-pushed the rrudnick_fix_ba_global_destroy branch from 6aad171 to db92ea0 Compare November 27, 2024 09:19
@vinser52
Copy link
Contributor

Do we have reproducer for such scenario? Should we add it as a test?

@@ -17,6 +19,7 @@ extern "C" {
void *umf_ba_global_alloc(size_t size);
void umf_ba_global_free(void *ptr);
void umf_ba_destroy_global(void);
bool umf_ba_global_is_destroyed();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where it is used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now nowhere but it would be needed when Disjoint Pool would use base alloc (in C version) in umfDisjointPoolParamsDestroy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just wondering if it is the right way to fix it. See my PR #940, will the same approach for disjoint_pool (calling libumfInit in the right place) resolve your issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this helps - thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should we close this PR?

@bratpiorka bratpiorka force-pushed the rrudnick_fix_ba_global_destroy branch from db92ea0 to 70136eb Compare November 27, 2024 09:26
@bratpiorka bratpiorka force-pushed the rrudnick_fix_ba_global_destroy branch from 70136eb to 2aad2ba Compare November 27, 2024 09:43
@bratpiorka
Copy link
Contributor Author

Do we have reproducer for such scenario? Should we add it as a test?

this bug also occurs here: https://github.com/oneapi-src/unified-memory-framework/actions/runs/12041220655/job/33590212027

In the CUDA provider test (after params refactoring) we are calling umfCUDAMemoryProviderParamsDestroy in the destructor of global objects: cuParams_device_memory, cuParams_shared_memory and cuParams_host_memory. umfCUDAMemoryProviderParamsDestroy calls BA free which dereferences already unmapped pointer

@vinser52
Copy link
Contributor

Do we have reproducer for such scenario? Should we add it as a test?

this bug also occurs here: https://github.com/oneapi-src/unified-memory-framework/actions/runs/12041220655/job/33590212027

In the CUDA provider test (after params refactoring) we are calling umfCUDAMemoryProviderParamsDestroy in the destructor of global objects: cuParams_device_memory, cuParams_shared_memory and cuParams_host_memory. umfCUDAMemoryProviderParamsDestroy calls BA free which dereferences already unmapped pointer

Oh, I see. I saw the similar issue with L0 provider params when working on it and fixed it. But the corresponding CUDA PR was already merged and I forgot to apply the same fix for CUDA params. Here is the PR #940 that should fix this failure.

The issue is that libumfInit() is not called, because of that destructors order is incorrect.

@bratpiorka bratpiorka closed this Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants