Memory Usage and Possible Leak #43

Open
wlruys opened this issue Jun 15, 2019 · 8 comments

wlruys commented Jun 15, 2019

Hi,

We've noticed that when performing repeated calls to Evaluate, memory usage continues to grow. This is limiting our ability to use GOFMM with an eigensolver & clustering: after ~50-100 iterations it uses 20+ GB (sometimes over 60 GB, depending on the dataset size).
Not sure if this is related to issue #37.

In https://github.com/dialecticDolt/hmlp/tree/pythondevel we tried adding destructors to Data (in case the vector and ReadWrite base destructors were not being invoked properly), but this didn't change the behavior we were seeing.
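
For reference, a minimal sketch of what that destructor experiment can look like. This is illustrative only; the actual hmlp Data and ReadWrite definitions differ, and the stand-in ReadWrite class below is hypothetical.

#include <vector>

class ReadWrite { };  // stand-in for hmlp's ReadWrite base, for illustration only

template<typename T>
class Data : public ReadWrite, public std::vector<T>
{
  public:
    using std::vector<T>::vector;
    // Explicit destructor: clear the contents and release the capacity.
    // (The std::vector base destructor would free the buffer regardless;
    // this only makes the deallocation explicit.)
    ~Data()
    {
      this->clear();
      this->shrink_to_fit();
    }
};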

This can be seen with a simple test in example/distributed_fast_matvec_solver.cpp:

// DistData on stack
for(int rep = 0; rep < 50; rep++){
    DistData<RIDS, STAR, T> u1 = mpigofmm::Evaluate( tree1, w1 );
}

// With explicit deallocation
DistData<RIDS, STAR, T>* u2;
for(int rep = 0; rep < 50; rep++){
    u2 = mpigofmm::Evaluate_Pointer( tree1, w1 );
    delete u2;
}

Here mpigofmm::Evaluate_Pointer( tree1, w1 ) is a version of Evaluate that allocates the potentials by calling new and returns a pointer to them.
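
For clarity, a sketch of what such a wrapper can look like (not necessarily the exact implementation in the fork); it assumes mpigofmm::Evaluate returns the potentials by value, as in the snippet above, and that DistData is move- or copy-constructible:

template<typename TREE, typename T>
DistData<RIDS, STAR, T>* Evaluate_Pointer( TREE &tree, DistData<RIDS, STAR, T> &w )
{
  // Allocate the potentials on the heap and hand ownership to the caller,
  // who is then responsible for calling delete.
  return new DistData<RIDS, STAR, T>( mpigofmm::Evaluate( tree, w ) );
}

Returning std::unique_ptr<DistData<RIDS, STAR, T>> instead would make the ownership transfer explicit and remove the need for a manual delete.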

Running valgrind over this kind of example shows lost memory, but not of this magnitude. The largest 'definitely lost' blocks are near the xgemm tasks and S2S tasks.

Could you take a look at what the cause might be?

ChenhanYu (Owner) commented

Indeed, this looks like a potential memory leak. I will try to reproduce the problem and find the source. Does this potential leak block your progress? In other words, how urgent is this issue?

ChenhanYu pushed a commit that referenced this issue Jul 4, 2019
ChenhanYu self-assigned this Jul 5, 2019
ChenhanYu (Owner) commented

I have tried to fix the memory leaking problem. The issue is due to a potential race condition in the S2S and L2L tasks; as a result, the memory was not destroyed correctly. Could you give it a try and see whether the leaking problem goes away or becomes less severe? Many thanks.
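
As a generic illustration of that failure mode (not hmlp code): if several tasks decrement a plain, non-atomic counter and the last one is expected to free a shared buffer, a lost decrement means the buffer is never released.

struct SharedPotentials
{
  double *buffer = nullptr;
  int pending = 0;  // should be std::atomic<int>; a plain int races
};

void FinishTask( SharedPotentials &shared )
{
  // Racy read-modify-write: two tasks can interleave here, a decrement is
  // lost, pending never reaches zero, and the delete below never runs.
  if ( --shared.pending == 0 )
  {
    delete [] shared.buffer;
    shared.buffer = nullptr;
  }
}

Making pending a std::atomic<int> (or guarding it with a mutex) removes the race.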

wlruys commented Aug 30, 2019

Sorry for the two-month delay; I ended up working on something else during the summer.
On my end, the leaking problem on repeated calls to Evaluate looks about the same in the current develop branch as it was before.

ChenhanYu (Owner) commented

Could you provide an example that reproduces this on the develop branch? It will be easier for me to look into the problem. Many thanks.

wlruys commented Sep 12, 2019

I've pulled the current develop branch and added an example/memory_test script here:
https://github.com/dialecticDolt/hmlp/tree/develop
(It is not a good kernel setup for compression, but it shows the behavior well.)
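
This is not the actual example/memory_test from that branch, but a minimal sketch of the kind of instrumentation such a test can use: read the resident set size from /proc/self/status (Linux) after each call to Evaluate.

#include <fstream>
#include <iostream>
#include <string>

// Return the current resident set size in kB, or -1 if it cannot be read.
static long CurrentRssKb()
{
  std::ifstream status( "/proc/self/status" );
  std::string line;
  while ( std::getline( status, line ) )
  {
    if ( line.rfind( "VmRSS:", 0 ) == 0 )
      return std::stol( line.substr( 6 ) );
  }
  return -1;
}

// Inside the benchmark, after compression:
// for ( int rep = 0; rep < 50; rep++ )
// {
//   auto u = mpigofmm::Evaluate( tree1, w1 );
//   std::cout << "rep " << rep << " RSS = " << CurrentRssKb() << " kB\n";
// }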

The memory profile that this produces on my workstation is given below.
The blue line shows RAM usage in MB. The plateau at 15 seconds is when it switches to the other test function (and spends some time running nearest neighbors).
We can see that the memory is retained even after the results go out of scope.

[Attached plot: memtest_out (RAM usage in MB over the run)]

ChenhanYu (Owner) commented

Thanks, I will take a look.

ChenhanYu (Owner) commented

OK I am able to reproduce the problem. I will spend some time over the weekend to see if there is an easy fix. If not, I will provide an ETA on fundamentally improving memory management. Thank you for filing this bug.

ChenhanYu commented Sep 18, 2019

I have fixed the problem in the develop branch, at least to the extent that I can no longer reproduce it with memory_test. The problem resulted from creating nested parallel GEMM tasks; disabling this feature resolves the memory leak. I will restore support for this feature once it is fully fixed. I will also add the memory_test you provided to the examples. Thank you for your contribution.
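
Conceptually (this is not the actual hmlp change, only a sketch of what disabling nested parallel GEMM tasks means): a GEMM task falls back to executing sequentially instead of spawning child tasks for the runtime to create and reclaim.

// Conceptual sketch only; the name and structure are illustrative.
constexpr bool ALLOW_NESTED_GEMM_TASKS = false;

template<typename T>
void GemmTask( /* operands */ )
{
  if ( ALLOW_NESTED_GEMM_TASKS )
  {
    // Partition the GEMM and submit child tasks to the scheduler
    // (the code path associated with the leak described above).
  }
  else
  {
    // Execute the whole GEMM sequentially inside this task.
  }
}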
