Log success rate for on policy algorithms #1870
Thanks for the PR, overall LGTM, please go ahead with the tests =)
(We would probably also need a PR ready for SB3 contrib to update MaskablePPO and other PPO derivatives if needed.)
Hi, I added a test with a dummy environment that returns There is just one thing I'm not sure I understand: instead of getting a success_rate of 0.3, 0.5 and 0.8 on the three logging iterations (according to the dummy success list I manually created), I get 0.3333333333333333, 0.5555555555555556 and 0.7777777777777778. I don't know if I did something wrong or if it is linked to how the success_rate is computed. Do you know why this happens, @araffin?
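For what it's worth, the three reported values coincide with 3/9, 5/9 and 7/9, which would be consistent with the mean being taken over nine recorded episodes rather than ten. This is only an observation about the numbers, not a confirmed cause. A quick check in plain Python (nothing here comes from the PR's test itself):

```python
from fractions import Fraction

# Values reported on the three logging iterations.
reported = [0.3333333333333333, 0.5555555555555556, 0.7777777777777778]

# Recover the closest simple fraction for each value (denominator capped at 100).
recovered = [Fraction(x).limit_denominator(100) for x in reported]

# Each value is k/9 for some integer k, i.e. a mean over nine 0/1 outcomes.
for value, frac in zip(reported, recovered):
    print(value, "is approximately", frac)
```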
LGTM, thanks =)
Ok, cool! Indeed, the test looks cleaner now (:
Sure, I'll try that. I only need to implement and test the feature for
Yes, although it might not be needed, as they derive from the on-policy base class.
* Add success rate in monitor for on policy algorithms
* Update changelog
* make commit-checks refactoring
* Assert buffers are not none in _dump_logs
* Automatic refactoring of the type hinting
* Add success_rate logging test for on policy algorithms
* Update changelog
* Reformat
* Fix tests and update changelog

---------

Co-authored-by: Antonin Raffin <[email protected]>
Hi, I changed the on_policy_algorithm.py file to enable showing rollout/success_rate on the monitor for OnPolicyAlgorithm.
Description
I added dones as an argument to self._update_info_buffer so that the buffer is actually updated (before, it couldn't save info['is_success'] because dones was set to None).
Then I added these lines to the ones that display training info on the monitor (as in off_policy_algorithm.py):
I also refactored the code to put the whole block that writes logs into a _dump_logs function (also in the spirit of off_policy_algorithm.py).
Is the PR ok for you? If yes, I will implement the associated test.
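To illustrate why passing dones matters, here is a minimal standalone sketch, not the PR's actual diff. The SuccessRateTracker class, its window_size parameter and its method names are hypothetical stand-ins; the sketch only assumes that a success flag is stored when an episode ends and that the logged rate is the mean of the stored flags.

```python
from collections import deque
from statistics import mean


class SuccessRateTracker:
    """Hypothetical stand-in for the buffer logic (not the library's class)."""

    def __init__(self, window_size=100):
        # Keep only the most recent episode outcomes.
        self.ep_success_buffer = deque(maxlen=window_size)

    def update_info_buffer(self, infos, dones=None):
        # Assumption: without dones, no episode is treated as finished.
        if dones is None:
            dones = [False] * len(infos)
        for info, done in zip(infos, dones):
            is_success = info.get("is_success")
            # Record the flag only when the episode has ended.
            if is_success is not None and done:
                self.ep_success_buffer.append(is_success)

    def success_rate(self):
        # Nothing to log until at least one episode outcome is stored.
        if len(self.ep_success_buffer) == 0:
            return None
        return mean(float(s) for s in self.ep_success_buffer)


tracker = SuccessRateTracker()
infos = [{"is_success": True}, {"is_success": False}]

tracker.update_info_buffer(infos)  # dones omitted: nothing is recorded
rate_without_dones = tracker.success_rate()

tracker.update_info_buffer(infos, dones=[True, True])  # both episodes ended
rate_with_dones = tracker.success_rate()
print(rate_without_dones, rate_with_dones)
```

In this sketch the first call leaves the buffer empty, which mirrors the behaviour described above when dones is left at None; only the second call produces a value that could be logged.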
Motivation and Context
closes #1867
Types of changes
Checklist
* make format (required)
* make check-codestyle and make lint (required)
* make pytest and make type both pass. (required)
* make doc (required)

Note: You can run most of the checks using make commit-checks.
Note: we are using a maximum length of 127 characters per line.