Hello! purejaxRL's PPO and Stoix's PPO are very similar; the main difference is that, as you said, computation is divided over all of your devices. This means that parameters such as the number of parallel environments are divided by the number of devices, so it's possible your PPO hyperparameters are making it unstable. May I ask what hyperparameters you are using? In my experiments with continuous PPO on the Brax environments, I found the Stoix PPO to be very stable and high performing given suitable hyperparameters.
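For intuition, here is a rough sketch (not Stoix's actual configuration code; `total_num_envs` and `rollout_length` are just illustrative names and values) of how an Anakin-style, device-parallel setup shards environments and why that shrinks the per-device batch:

```python
# Illustrative sketch only -- not Stoix's actual config code.
# `total_num_envs` and `rollout_length` are assumed example values.
import jax

total_num_envs = 1024             # value you would tune for a single-GPU purejaxRL run
rollout_length = 128
num_devices = jax.device_count()  # e.g. 3 GPUs

# In a device-parallel (Anakin-style) run the environments are sharded across
# devices, so each device only steps a fraction of them.
envs_per_device = total_num_envs // num_devices

# The per-device PPO rollout batch is correspondingly smaller.
per_device_batch = envs_per_device * rollout_length
single_device_batch = total_num_envs * rollout_length

print(f"devices={num_devices}, envs per device={envs_per_device}")
print(f"per-device batch={per_device_batch} vs single-device batch={single_device_batch}")

# Even if gradients are averaged across devices (e.g. with jax.lax.pmean), the
# minibatch size, number of epochs, and learning rate that worked for a 1-GPU
# purejaxRL run may need retuning once the environments are split over 3 devices.
```

In other words, a config copied directly from a 1-GPU purejaxRL run may behave differently over 3 devices unless the environment count and minibatch settings are adjusted accordingly.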
-
Hello, I am currently working on a research project in reinforcement learning and am very interested in stoix.
I am running a custom environment based on gymnax with PPO-Continuous on anakin using 3 GPUs, but the performance is unstable compared to purejaxrl (with 1 GPU).
Is there any baseline code available, similar to the examples provided in the repository, that would help compare anakin and purejaxrl?
I'd like to start by testing both on environments such as CartPole to ensure proper comparison.
Thank you in advance.