
Log success rate for on policy algorithms #1870

Conversation


@corentinlger corentinlger commented Mar 17, 2024

Hi, I changed the on_policy_algorithm.py file to enable showing rollout/success_rate on the monitor for OnPolicyAlgorithm.

Description

I added dones as an argument to self._update_info_buffer so the buffer is effectively updated (before, it couldn't save info['is_success'] because dones was set to None).
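For context, here is a simplified stand-in for the buffer-update logic (not SB3's actual method, just a sketch of the behavior): a success flag is only recorded when the corresponding done flag is set, which is why passing dones=None silently dropped every entry.

```python
from collections import deque

import numpy as np

# Sketch of the buffer: SB3 uses a bounded deque of recent episode outcomes.
ep_success_buffer = deque(maxlen=100)


def update_info_buffer(infos, dones=None):
    """Record `is_success` flags, but only for episodes that just ended."""
    if dones is None:
        dones = np.array([False] * len(infos))
    for info, done in zip(infos, dones):
        maybe_is_success = info.get("is_success")
        if maybe_is_success is not None and done:
            ep_success_buffer.append(maybe_is_success)


# With dones=None, the success flag is dropped even at episode end:
update_info_buffer([{"is_success": True}])
print(len(ep_success_buffer))  # 0

# Passing the dones array lets the buffer fill up:
update_info_buffer([{"is_success": True}], dones=np.array([True]))
print(len(ep_success_buffer))  # 1
```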

Then I added these lines to the ones that display training infos on the monitor (as in off_policy_algorithm.py):

if len(self.ep_success_buffer) > 0:
    self.logger.record("rollout/success_rate", safe_mean(self.ep_success_buffer))

I also refactored the code to put this whole log-writing block in a _dump_logs function (also in the spirit of off_policy_algorithm.py).
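A rough sketch of the resulting helper, mirroring the structure of off_policy_algorithm.py (names simplified; this is an illustrative stand-in, not SB3's actual class, which also records timing and other statistics):

```python
import numpy as np


def safe_mean(arr):
    """Return NaN for an empty buffer instead of warning (as SB3's safe_mean does)."""
    return np.nan if len(arr) == 0 else float(np.mean(arr))


class OnPolicyLoggingSketch:
    """Illustrative stand-in for the refactored on-policy logging."""

    def __init__(self):
        self.ep_info_buffer = []     # episode dicts with "r" (return) and "l" (length)
        self.ep_success_buffer = []  # booleans collected from info["is_success"]
        self.records = {}

    def record(self, key, value):
        self.records[key] = value

    def _dump_logs(self, iteration):
        self.record("time/iterations", iteration)
        if len(self.ep_info_buffer) > 0:
            self.record("rollout/ep_rew_mean", safe_mean([ep["r"] for ep in self.ep_info_buffer]))
            self.record("rollout/ep_len_mean", safe_mean([ep["l"] for ep in self.ep_info_buffer]))
        # The lines added in this PR: only log a success rate once at least
        # one episode has reported an "is_success" flag.
        if len(self.ep_success_buffer) > 0:
            self.record("rollout/success_rate", safe_mean(self.ep_success_buffer))


logger = OnPolicyLoggingSketch()
logger.ep_info_buffer = [{"r": 1.0, "l": 10}, {"r": 0.0, "l": 20}]
logger.ep_success_buffer = [True, False, True, True]
logger._dump_logs(iteration=1)
print(logger.records["rollout/success_rate"])  # 0.75
```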

Is the PR OK for you? If so, I will implement the associated test.

Motivation and Context

closes #1867

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have opened an associated PR on the SB3-Contrib repository (if necessary)
  • I have opened an associated PR on the RL-Zoo3 repository (if necessary)
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

@araffin araffin changed the title Corentinlger/success rate on policy algorithms Log success rate for on policy algorithms Mar 18, 2024

@araffin araffin left a comment


Thanks for the PR, overall LGTM, please go ahead with the tests =)
(and we would probably need a PR ready for SB3 contrib to update MaskablePPO and other PPO derivatives, if needed)


corentinlger commented Mar 20, 2024

Hi, I added a test with a dummy environment that returns True or False for the success info, according to a list of dummy successes (i.e. a list of shape (n_logs, n_ep_per_log) that determines whether episode j of logging iteration i will be a success). I put it in tests/test_logger.py; it seems to work, but I don't know if it is what you expected.

There is just one thing I'm not sure I understand: instead of getting 0.3, 0.5 and 0.8 success_rate on the three logging iterations (according to the dummy success list I manually created), I get 0.3333333333333333, 0.5555555555555556 and 0.7777777777777778. I don't know if I did something wrong or if it is linked to how the success_rate is computed. Do you know why this happens, @araffin?
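For what it's worth, the reported values themselves hint at an off-by-one: each observed rate is a fraction over 9 rather than the expected 10 episodes, which can be checked without SB3 at all (a quick diagnostic sketch, not part of the PR):

```python
from fractions import Fraction

# Success rates observed in the test, versus the expected 0.3 / 0.5 / 0.8:
observed = [0.3333333333333333, 0.5555555555555556, 0.7777777777777778]

for value in observed:
    # Recover the simplest nearby fraction: all three are ninths
    # (1/3 = 3/9, 5/9, 7/9), consistent with one episode per logging
    # iteration being miscounted somewhere in the test setup.
    print(Fraction(value).limit_denominator(100))
```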

@corentinlger corentinlger requested a review from araffin March 20, 2024 10:14

araffin commented Mar 22, 2024

I don't know if I did something wrong or if it is linked to how the success_rate is computed. Do you know why this happens, @araffin?

I think this was an off-by-one error in the tests; I fixed it and cleaned up the test in 2415952

Could you do a similar PR for SB3 contrib?


@araffin araffin left a comment


LGTM, thanks =)

@araffin araffin merged commit 071226d into DLR-RM:master Mar 22, 2024
4 checks passed
@corentinlger

I think this was an off-by-one error in the tests; I fixed it and cleaned up the test in 2415952

OK, cool! Indeed, the test looks cleaner now (:

Could you do a similar PR for SB3 contrib?

Sure, I'll try that. I only need to implement and test the feature for MaskablePPO and RecurrentPPO, right?


araffin commented Mar 22, 2024

Sure, I'll try that. I only need to implement and test the feature for MaskablePPO and RecurrentPPO, right?

Yes, although it might not be needed, as they derive from the on-policy base class.

friedeggs pushed a commit to friedeggs/stable-baselines3 that referenced this pull request Jul 22, 2024
* Add success rate in monitor for on policy algorithms

* Update changelog

* make commit-checks refactoring

* Assert buffers are not none in _dump_logs

* Automatic refactoring of the type hinting

* Add success_rate logging test for on policy algorithms

* Update changelog

* Reformat

* Fix tests and update changelog

---------

Co-authored-by: Antonin Raffin <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Bug]: rollout/success_rate does not show for Monitor + OnPolicyAlgorithm