Log success rate for on policy algorithms #1870

corentinlger · 2024-03-17T22:36:44Z

Hi, I changed the on_policy_algorithm.py file to enable showing rollout/success_rate on the monitor for OnPolicyAlgorithm

Description

I added dones as an argument to self._update_info_buffer so that the buffer is actually updated (before, it couldn't save info['is_success'] because dones was set to None).
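To illustrate why the missing dones argument mattered, here is a minimal standalone sketch. It assumes the buffer only stores info['is_success'] for environments whose done flag is set; update_success_buffer is a hypothetical helper written for this example, not the actual SB3 method.

```python
from collections import deque
from typing import Any, Optional, Sequence


def update_success_buffer(
    success_buffer: deque,
    infos: Sequence[dict[str, Any]],
    dones: Optional[Sequence[bool]] = None,
) -> None:
    """Store info['is_success'] only for environments whose episode just ended."""
    if dones is None:
        # Assumption: without done flags, no episode is treated as finished.
        dones = [False] * len(infos)
    for info, done in zip(infos, dones):
        is_success = info.get("is_success")
        if is_success is not None and done:
            success_buffer.append(is_success)


infos = [{"is_success": True}, {"is_success": False}, {}]

# dones omitted: nothing is stored, so no success rate could be logged.
without_dones = deque(maxlen=100)
update_success_buffer(without_dones, infos)

# dones passed: the outcomes of the two finished episodes are stored.
with_dones = deque(maxlen=100)
update_success_buffer(with_dones, infos, dones=[True, True, False])
```

With dones left at None the buffer stays empty, which matches the behaviour described above.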

Then I added these lines to the ones that display training info on the monitor (as in off_policy_algorithm.py):

if len(self.ep_success_buffer) > 0:
    self.logger.record("rollout/success_rate", safe_mean(self.ep_success_buffer))

I also refactored the code to put this whole block writing logs in a _dump_logs function (also in the spirit of off_policy_algorithm.py)
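As a rough sketch of the logging step described above (not the actual _dump_logs implementation), the guard can be reproduced without SB3. Here safe_mean is a simplified stand-in for the SB3 utility of the same name, and dump_logs and the record callback are hypothetical names used only for this example.

```python
from collections import deque


def safe_mean(values) -> float:
    """Simplified stand-in: NaN for an empty buffer, otherwise the mean."""
    return float("nan") if len(values) == 0 else sum(values) / len(values)


def dump_logs(ep_success_buffer, record) -> None:
    """Record the success rate only if at least one episode outcome is stored."""
    if len(ep_success_buffer) > 0:
        record("rollout/success_rate", safe_mean(ep_success_buffer))


logged = {}

# Empty buffer: the key is not written at all.
dump_logs(deque(), logged.__setitem__)
keys_after_empty = list(logged)

# Three successes out of four finished episodes.
dump_logs(deque([True, False, True, True], maxlen=100), logged.__setitem__)
```

The length check means environments that never report is_success simply do not get a rollout/success_rate entry.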

Is the PR ok for you ? If yes I will implement the associated test.

Motivation and Context

closes #1867

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

stable_baselines3/common/on_policy_algorithm.py

araffin

Thanks for the PR, overall LGTM, please go ahead with the tests =)
(and we would probably need to have a PR ready for SB3 contrib to update MaskablePPO and other PPO derivatives if needed)

corentinlger · 2024-03-20T10:13:52Z

Hi, I added a test with a dummy environment that returns True or False for the success info according to a list of dummy successes (i.e., a list of size (n_logs, n_ep_per_log) that determines whether episode j of logging iteration i is going to be a success or not). I put it in tests/test_logger.py. It seems to work, but I don't know if it is what you expected.
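A rough sketch of the idea, assuming fixed-length episodes and a Gymnasium-style step() return value. DummySuccessEnv and rollout_success_rates are hypothetical names for this illustration; the class does not subclass gym.Env and this is not the test that was added to tests/test_logger.py.

```python
class DummySuccessEnv:
    """Env-like class that reports a scripted is_success at the end of each episode."""

    def __init__(self, dummy_successes, ep_steps=2):
        # Flatten (n_logs, n_ep_per_log) into one scripted outcome per episode.
        self.outcomes = [s for row in dummy_successes for s in row]
        self.ep_steps = ep_steps
        self.episode = 0
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0, {}

    def step(self, action):
        self.steps += 1
        terminated = self.steps >= self.ep_steps
        info = {}
        if terminated:
            info["is_success"] = self.outcomes[self.episode % len(self.outcomes)]
            self.episode += 1
        # (observation, reward, terminated, truncated, info)
        return 0, 0.0, terminated, False, info


def rollout_success_rates(dummy_successes):
    """Run n_ep_per_log episodes per logging iteration and average the outcomes."""
    env = DummySuccessEnv(dummy_successes)
    rates = []
    for row in dummy_successes:
        successes = []
        while len(successes) < len(row):
            env.reset()
            done = False
            while not done:
                _, _, done, _, info = env.step(action=0)
            successes.append(info["is_success"])
        rates.append(sum(successes) / len(successes))
    return rates


dummy_successes = [
    [True] * 3 + [False] * 7,
    [True] * 5 + [False] * 5,
    [True] * 8 + [False] * 2,
]
rates = rollout_success_rates(dummy_successes)
```

When every scripted episode is counted, the three iterations give 0.3, 0.5 and 0.8.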

There is just one thing I'm not sure I understand: instead of getting a success_rate of 0.3, 0.5 and 0.8 on the three logging iterations (according to the dummy success list I manually created), I get 0.3333333333333333, 0.5555555555555556 and 0.7777777777777778. I don't know if I did something wrong or if it is linked to how the success_rate is computed. Do you know why this happens @araffin ?

araffin · 2024-03-22T10:57:38Z

I don't know if I did something wrong or if it is linked to how the success_rate is computed. Do you know why this happens @araffin ?

I think this was an off-by-one error in the tests. I fixed it and cleaned up the test in 2415952
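For reference, the reported values are exactly 3/9, 5/9 and 7/9, which would be consistent with nine episodes being counted per logging iteration instead of ten. The check below only verifies that arithmetic; the thread does not say which episode was miscounted.

```python
from fractions import Fraction

observed = [0.3333333333333333, 0.5555555555555556, 0.7777777777777778]

# Closest fraction with a denominator of at most 10 for each reported value.
as_fractions = [Fraction(x).limit_denominator(10) for x in observed]

# Expressed as a count out of nine episodes.
out_of_nine = [f * 9 for f in as_fractions]

print(as_fractions)  # [Fraction(1, 3), Fraction(5, 9), Fraction(7, 9)]
print(out_of_nine)   # [Fraction(3, 1), Fraction(5, 1), Fraction(7, 1)]
```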

Could you do a similar PR for SB3 contrib?

araffin

LGTM, thanks =)

corentinlger · 2024-03-22T16:12:03Z

I think this was an off-by-one error in the tests. I fixed it and cleaned up the test in 2415952

Ok cool ! Indeed the test looks cleaner now (:

Could you do a similar PR for SB3 contrib?

Sure, I'll try that. I only need to implement and test the feature for MaskablePPO and RecurrentPPO right ?

araffin · 2024-03-22T17:41:34Z

Sure, I'll try that. I only need to implement and test the feature for MaskablePPO and RecurrentPPO right ?

yes, although it might not be needed as they derive from the on policy base class.

* Add success rate in monitor for on policy algorithms
* Update changelog
* make commit-checks refactoring
* Assert buffers are not none in _dump_logs
* Automatic refactoring of the type hinting
* Add success_rate logging test for on policy algorithms
* Update changelog
* Reformat
* Fix tests and update changelog

---------

Co-authored-by: Antonin Raffin <[email protected]>

corentinlger and others added 4 commits March 17, 2024 20:02

Add success rate in monitor for on policy algorithms

327144f

Merge branch 'DLR-RM:master' into corentinlger/success_rate_on_policy_algorithms

fc903cd

Update changelog

a261ef6

make commit-checks refactoring

8eaf7d6

araffin changed the title ~~Corentinlger/success rate on policy algorithms~~ Log success rate for on policy algorithms Mar 18, 2024

araffin reviewed Mar 18, 2024

stable_baselines3/common/on_policy_algorithm.py

araffin reviewed Mar 18, 2024

corentinlger added 3 commits March 20, 2024 10:55

Assert buffers are not none in _dump_logs

ea46aaf

Automatic refactoring of the type hinting

e1910d4

Add success_rate logging test for on policy algorithms

3e5e276

corentinlger requested a review from araffin March 20, 2024 10:14

corentinlger and others added 3 commits March 20, 2024 11:43

Update changelog

1a904ab

Reformat

5a191df

Fix tests and update changelog

2415952

araffin approved these changes Mar 22, 2024

araffin merged commit 071226d into DLR-RM:master Mar 22, 2024
4 checks passed

araffin mentioned this pull request Mar 31, 2024

Log success rate for PPO variants Stable-Baselines-Team/stable-baselines3-contrib#235

Merged

15 tasks
