Memory Usage and Possible Leak #43

Open
wlruys opened this issue Jun 15, 2019 · 8 comments

wlruys commented Jun 15, 2019

Hi,

We've noticed that when performing repeated calls to Evaluate, memory usage continues to grow. This is limiting our ability to use GOFMM with an eigensolver & clustering: after ~50-100 iterations it uses 20+ GB (sometimes over 60 GB, depending on the dataset size).
Not sure if this is related to issue #37.

In https://github.com/dialecticDolt/hmlp/tree/pythondevel we tried adding destructors to Data (in case the vector and ReadWrite base destructors were not being invoked properly), but this didn't change the behavior we were seeing.
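
For reference, a minimal sketch of what that destructor experiment can look like. This is illustrative only; the actual hmlp Data and ReadWrite definitions differ, and the stand-in ReadWrite class below is hypothetical.

#include <vector>

class ReadWrite { };  // stand-in for hmlp's ReadWrite base, for illustration only

template<typename T>
class Data : public ReadWrite, public std::vector<T>
{
  public:
    using std::vector<T>::vector;
    // Explicit destructor: clear the contents and release the capacity.
    // (The std::vector base destructor would free the buffer regardless;
    // this only makes the deallocation explicit.)
    ~Data()
    {
      this->clear();
      this->shrink_to_fit();
    }
};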

This can be seen with a simple test in example/distributed_fast_matvec_solver.cpp:

// DistData on stack
for(int rep = 0; rep < 50; rep++){
    DistData<RIDS, STAR, T> u1 = mpigofmm::Evaluate( tree1, w1 );
}

// With explicit deallocation
DistData<RIDS, STAR, T>* u2;
for(int rep = 0; rep < 50; rep++){
    u2 = mpigofmm::Evaluate_Pointer( tree1, w1 );
    delete u2;
}

Here mpigofmm::Evaluate_Pointer( tree1, w1 ) is a version of Evaluate that allocates the potentials by calling new and returns a pointer to them.
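
For clarity, a sketch of what such a wrapper can look like (not necessarily the exact implementation in the fork); it assumes mpigofmm::Evaluate returns the potentials by value, as in the snippet above, and that DistData is move- or copy-constructible:

template<typename TREE, typename T>
DistData<RIDS, STAR, T>* Evaluate_Pointer( TREE &tree, DistData<RIDS, STAR, T> &w )
{
  // Allocate the potentials on the heap and hand ownership to the caller,
  // who is then responsible for calling delete.
  return new DistData<RIDS, STAR, T>( mpigofmm::Evaluate( tree, w ) );
}

Returning std::unique_ptr<DistData<RIDS, STAR, T>> instead would make the ownership transfer explicit and remove the need for a manual delete.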

Running valgrind over this kind of example shows lost memory, but not of this magnitude. The largest 'definitely lost' blocks are near the xgemm tasks and S2S tasks.

Could you take a look at what the cause might be?

ChenhanYu (Owner) commented

Indeed, this looks like a potential memory leak. I will try to reproduce the problem and find the source. Does this potential leak block your progress? In other words, how urgent is this issue?

ChenhanYu pushed a commit that referenced this issue Jul 4, 2019
ChenhanYu self-assigned this Jul 5, 2019
ChenhanYu (Owner) commented

I have tried to fix the memory leaking problem. The issue is due to a potential race condition in the S2S and L2L tasks; as a result, the memory was not destroyed correctly. Could you give it a try and see whether the leaking problem goes away or becomes less severe? Many thanks.
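
As a generic illustration of that failure mode (not hmlp code): if several tasks decrement a plain, non-atomic counter and the last one is expected to free a shared buffer, a lost decrement means the buffer is never released.

struct SharedPotentials
{
  double *buffer = nullptr;
  int pending = 0;  // should be std::atomic<int>; a plain int races
};

void FinishTask( SharedPotentials &shared )
{
  // Racy read-modify-write: two tasks can interleave here, a decrement is
  // lost, pending never reaches zero, and the delete below never runs.
  if ( --shared.pending == 0 )
  {
    delete [] shared.buffer;
    shared.buffer = nullptr;
  }
}

Making pending a std::atomic<int> (or guarding it with a mutex) removes the race.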

wlruys commented Aug 30, 2019

Sorry for the two-month delay; I ended up working on something else during the summer.
On my end, the leaking problem on repeated calls to Evaluate looks about the same in the current develop branch as it was before.

ChenhanYu (Owner) commented

Could you provide an example that reproduces this on the develop branch? It will be easier for me to look into the problem. Many thanks.

wlruys commented Sep 12, 2019

I've pulled the current develop branch and added an example/memory_test script here:
https://github.com/dialecticDolt/hmlp/tree/develop
(It is not a good kernel setup for compression, but it shows the behavior well.)
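
This is not the actual example/memory_test from that branch, but a minimal sketch of the kind of instrumentation such a test can use: read the resident set size from /proc/self/status (Linux) after each call to Evaluate.

#include <fstream>
#include <iostream>
#include <string>

// Return the current resident set size in kB, or -1 if it cannot be read.
static long CurrentRssKb()
{
  std::ifstream status( "/proc/self/status" );
  std::string line;
  while ( std::getline( status, line ) )
  {
    if ( line.rfind( "VmRSS:", 0 ) == 0 )
      return std::stol( line.substr( 6 ) );
  }
  return -1;
}

// Inside the benchmark, after compression:
// for ( int rep = 0; rep < 50; rep++ )
// {
//   auto u = mpigofmm::Evaluate( tree1, w1 );
//   std::cout << "rep " << rep << " RSS = " << CurrentRssKb() << " kB\n";
// }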

The memory profile that this produces on my workstation is given below.
The blue line shows RAM usage in MB. The plateau at 15 seconds is when it switches to the other test function (and spends some time running nearest neighbors).
We can see that the memory is retained even after the results go out of scope.

[Attached plot: memtest_out (RAM usage in MB over the run)]

ChenhanYu (Owner) commented

Thanks, I will take a look.

ChenhanYu (Owner) commented

OK I am able to reproduce the problem. I will spend some time over the weekend to see if there is an easy fix. If not, I will provide an ETA on fundamentally improving memory management. Thank you for filing this bug.

ChenhanYu commented Sep 18, 2019

I have fixed the problem in the develop branch, at least to the extent that I can no longer reproduce it with memory_test. The problem resulted from creating nested parallel GEMM tasks; disabling this feature resolves the memory leak. I will restore support for this feature once it is fully fixed. I will also add the memory_test you provided to the examples. Thank you for your contribution.
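
Conceptually (this is not the actual hmlp change, only a sketch of what disabling nested parallel GEMM tasks means): a GEMM task falls back to executing sequentially instead of spawning child tasks for the runtime to create and reclaim.

// Conceptual sketch only; the name and structure are illustrative.
constexpr bool ALLOW_NESTED_GEMM_TASKS = false;

template<typename T>
void GemmTask( /* operands */ )
{
  if ( ALLOW_NESTED_GEMM_TASKS )
  {
    // Partition the GEMM and submit child tasks to the scheduler
    // (the code path associated with the leak described above).
  }
  else
  {
    // Execute the whole GEMM sequentially inside this task.
  }
}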
