-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory leak in base_class.py #1908
Conversation
Loading the data return value is not necessary since it is unused. Loading the data causes a memory leak through the ep_info_buffer variable. I found this while loading a PPO learner from storage on a multi-GPU system since the ep_info_buffer is loaded to the memory location it was on while it was saved to disk, instead of the target loading location, and is then not cleaned up.
Do you have a minimal example to reproduce/track this behavior? also, how big is the leak? |
Hi, I managed to get this working example: import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
import torch
# Parallel environments
train=True
vec_env = make_vec_env("CartPole-v1", n_envs=2)
if train:
model = PPO("MlpPolicy", vec_env, verbose=1, device="cuda:2")
model.learn(total_timesteps=1)
model.ep_info_buffer.extend([torch.ones(10000,device="cuda:2")])
model.save("ppo_cartpole")
del model
model = PPO("MlpPolicy", vec_env, verbose=1, device="cuda:3")
import torch
model.set_parameters("ppo_cartpole.zip", device="cuda:3")
torch.cuda.empty_cache()
import time
print("watch gpustat now")
time.sleep(10)
print("memory on ep info device:", torch.cuda.memory_allocated("cuda:2"))# prints 0 Adjust the cuda devices according to your machine. Run once with train=True, then you can turn it off to make sure GPU usage comes from that. Running Anyways, setting |
Here is the output of
|
Is there anything I should still do for merger? |
From your side, nothing for now ;) What is missing is from my side. I need to take some time to reproduce the problem and check that nothing else would be impacted. |
I took the time to look at your example closely but I don't understand Btw, PPO is usually faster when everything is on CPU (when not using a CNN).
I'm also not sure to understand this. |
Oh... I am using an internal framework built on stable baselines. I do not entirely understand what it is doing but apparently we have a use case for putting torch variables in this buffer. To reproduce the issue, I did this as well.
Thanks for the tip!
Exactly! I do not want to touch the gpu that was used for training when doing inference. From my understanding, there is two issues here:
While it is arguable that 2. is only a problem because of 1., loading the data is superfluous since the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks =)
* Fix memory leak in base_class.py Loading the data return value is not necessary since it is unused. Loading the data causes a memory leak through the ep_info_buffer variable. I found this while loading a PPO learner from storage on a multi-GPU system since the ep_info_buffer is loaded to the memory location it was on while it was saved to disk, instead of the target loading location, and is then not cleaned up. * Update changelog.rst * Update changelog --------- Co-authored-by: Antonin RAFFIN <[email protected]>
Description
Loading the data return value is not necessary since it is unused. Loading the data causes a memory leak through the ep_info_buffer variable. I found this while loading a PPO learner from storage on a multi-GPU system since the ep_info_buffer is loaded to the memory location it was on while it was saved to disk, instead of the target loading location, and is then not cleaned up.
Motivation and Context
Types of changes
Checklist
make format
(required)make check-codestyle
andmake lint
(required)make pytest
andmake type
both pass. (required)make doc
(required)Note: You can run most of the checks using
make commit-checks
.Note: we are using a maximum length of 127 characters per line