Release Bug fixes, better image support and last release before v1.0 · DLR-RM/stable-baselines3

Breaking Changes:

evaluate_policy now returns rewards/episode lengths from a Monitor wrapper if one is present,
which makes it possible to return the unnormalized reward, for instance with Atari games.
Renamed common.vec_env.is_wrapped to common.vec_env.is_vecenv_wrapped to avoid confusion
with the new is_wrapped() helper
Renamed _get_data() to _get_constructor_parameters() for policies (this affects independent saving/loading of policies)
replay_buffer in collect_rollout is no longer optional
Removed n_episodes_rollout and merged it with train_freq, which now accepts a tuple (frequency, unit):

  # SB3 < 0.11.0
  # model = SAC("MlpPolicy", env, n_episodes_rollout=1, train_freq=-1)
  # SB3 >= 0.11.0:
  model = SAC("MlpPolicy", env, train_freq=(1, "episode"))
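
As a rough illustration of the new train_freq convention, the sketch below normalizes either accepted form into a (frequency, unit) pair. The helper name normalize_train_freq is hypothetical and not part of SB3, and treating a bare integer as a number of steps is an assumption based on the usual reading of train_freq.

```python
from typing import Tuple, Union

# Units named in the release note's tuple form (frequency, unit).
VALID_UNITS = ("step", "episode")


def normalize_train_freq(train_freq: Union[int, Tuple[int, str]]) -> Tuple[int, str]:
    """Hypothetical helper: turn an int or a (frequency, unit) tuple into a tuple."""
    if isinstance(train_freq, int):
        # Assumption: a bare integer is interpreted as a number of steps.
        train_freq = (train_freq, "step")
    if len(train_freq) != 2:
        raise ValueError(f"Expected (frequency, unit), got {train_freq!r}")
    frequency, unit = train_freq
    if unit not in VALID_UNITS:
        raise ValueError(f"Unit must be one of {VALID_UNITS}, got {unit!r}")
    if not isinstance(frequency, int) or frequency <= 0:
        raise ValueError(f"Frequency must be a positive integer, got {frequency!r}")
    return frequency, unit
```

With this sketch, the old train_freq=-1 style is rejected, while 4 and (1, "episode") are both accepted.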

New Features:

Add support for VecFrameStack to stack on first or last observation dimension, along with
automatic check for image spaces.
VecFrameStack now has a channels_order argument to tell if observations should be stacked
on the first or last observation dimension (originally always stacked on last).
Added common.env_util.is_wrapped and common.env_util.unwrap_wrapper functions for checking/unwrapping
an environment for a specific wrapper.
Added env_is_wrapped() method for VecEnv to check if its environments are wrapped
with a given Gym wrapper.
Added monitor_kwargs parameter to make_vec_env and make_atari_env
Wrap the environments automatically with a Monitor wrapper when possible.
EvalCallback now logs the success rate when available (is_success must be present in the info dict)
Added new wrappers to log images and matplotlib figures to tensorboard. (@zampanteymedio)
Add support for text records to Logger. (@lorenz-h)
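
The wrapper checks added in this release amount to walking the chain of wrappers around an environment. Below is a dependency-free sketch of that idea. Wrapper, Monitor, TimeLimit, and BaseEnv are minimal stand-ins defined locally (hypothetical, not the Gym or SB3 classes), and the real helpers in common.env_util may differ in detail.

```python
class BaseEnv:
    """Stand-in for an unwrapped environment."""


class Wrapper:
    """Stand-in for a Gym-style wrapper that keeps the wrapped env in `.env`."""

    def __init__(self, env):
        self.env = env


class Monitor(Wrapper):
    pass


class TimeLimit(Wrapper):
    pass


def unwrap_wrapper(env, wrapper_class):
    """Return the first wrapper of type `wrapper_class` in the chain, or None."""
    current = env
    while isinstance(current, Wrapper):
        if isinstance(current, wrapper_class):
            return current
        current = current.env
    return None


def is_wrapped(env, wrapper_class):
    """Tell whether `env` is wrapped (at any depth) with `wrapper_class`."""
    return unwrap_wrapper(env, wrapper_class) is not None


env = TimeLimit(Monitor(BaseEnv()))
print(is_wrapped(env, Monitor))        # the Monitor sits one level below TimeLimit
print(is_wrapped(BaseEnv(), Monitor))  # a bare environment has no wrappers
```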

Bug Fixes:

Fixed bug where code added VecTranspose on channel-first image environments (thanks @qxcv)
Fixed DQN predict method when using single gym.Env with deterministic=False
Fixed incorrect argument order of explained_variance() in ppo.py and a2c.py (@thisray)
Fixed bug where full HerReplayBuffer leads to an index error. (@megan-klaiber)
Fixed bug where replay buffer could not be saved if it was too big (> 4 GB) for python<3.8 (thanks @hn2)
Added informative PPO construction error in edge-case scenario where n_steps * n_envs = 1 (size of rollout buffer),
which otherwise causes downstream breaking errors in training (@decodyng)
Fixed discrete observation space support when using multiple envs with A2C/PPO (thanks @ardabbour)
Fixed a bug for TD3 delayed update (the update was off-by-one and not delayed when train_freq=1)
Fixed numpy warning (replaced np.bool with bool)
Fixed a bug where VecNormalize was not normalizing the terminal observation
Fixed a bug where VecTranspose was not transposing the terminal observation
Fixed a bug where the terminal observation stored in the replay buffer was not the right one for off-policy algorithms
Fixed a bug where action_noise was not used when using HER (thanks @ShangqunYu)
Fixed a bug where train_freq was not properly converted when loading a saved model
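
The TD3 entry in this list concerns the delayed policy update. The sketch below shows one way such a bug can arise; it is an assumption for illustration, not a description of the actual SB3 code. If the delay is keyed on a loop index that restarts on every training call, it has no effect when each call runs a single gradient step. Both function names and the counting setup are hypothetical.

```python
def actor_updates_with_local_index(n_calls: int, gradient_steps: int, policy_delay: int) -> int:
    """Delay keyed on the loop index, which restarts at 0 on every training call."""
    actor_updates = 0
    for _ in range(n_calls):
        for gradient_step in range(gradient_steps):
            if gradient_step % policy_delay == 0:
                actor_updates += 1
    return actor_updates


def actor_updates_with_global_counter(n_calls: int, gradient_steps: int, policy_delay: int) -> int:
    """Delay keyed on a counter that persists across training calls."""
    actor_updates = 0
    n_updates = 0
    for _ in range(n_calls):
        for _ in range(gradient_steps):
            n_updates += 1
            if n_updates % policy_delay == 0:
                actor_updates += 1
    return actor_updates


# Ten training calls with one gradient step each and a delay of 2:
print(actor_updates_with_local_index(10, 1, 2))     # 10: the actor is updated on every call
print(actor_updates_with_global_counter(10, 1, 2))  # 5: the actor is updated every second step
```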

Others:

Add more issue templates
Add signatures to callable type annotations (@ernestum)
Improve error message in NatureCNN
Added checks for supported action spaces to improve clarity of error messages for the user
Renamed variables in the train() method of SAC, TD3 and DQN to match SB3-Contrib.
Updated docker base image to Ubuntu 18.04
Set tensorboard min version to 2.2.0 (earlier versions apparently do not work with PyTorch)
Added warning for PPO when n_steps * n_envs is not a multiple of batch_size (last mini-batch truncated) (@decodyng)
Removed some warnings in the tests
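
The PPO warning about batch_size comes down to simple arithmetic on the rollout buffer size n_steps * n_envs. The following sketch, with a hypothetical helper name, lists the mini-batch sizes one pass over the buffer would produce; it assumes the buffer is split into consecutive chunks of batch_size, with the remainder forming a shorter final chunk.

```python
from typing import List


def minibatch_sizes(n_steps: int, n_envs: int, batch_size: int) -> List[int]:
    """Hypothetical helper: sizes of the mini-batches for one pass over the rollout buffer."""
    rollout_size = n_steps * n_envs
    n_full, remainder = divmod(rollout_size, batch_size)
    sizes = [batch_size] * n_full
    if remainder:
        # The last mini-batch is truncated when batch_size does not divide the buffer.
        sizes.append(remainder)
    return sizes


print(minibatch_sizes(n_steps=100, n_envs=3, batch_size=64))  # [64, 64, 64, 64, 44]
```

Choosing n_steps, n_envs, and batch_size so that the remainder is zero avoids the truncated mini-batch.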

Documentation:

Updated algorithm table
Minor docstring improvements regarding rollout (@stheid)
Fix migration doc for A2C (epsilon parameter)
Fix clip_range docstring
Fix duplicated parameter in EvalCallback docstring (thanks @tfederico)
Added example of learning rate schedule
Added SUMO-RL as example project (@LucasAlegre)
Fix docstring of classes in atari_wrappers.py which were inside the constructor (@LucasAlegre)
Added SB3-Contrib page
Fix bug in the example code of DQN (@AptX395)
Add example on how to access the tensorboard summary writer directly. (@lorenz-h)
Updated migration guide
Updated custom policy doc (separate policy architecture recommended)
Added a note about OpenCV headless version
Corrected typo in the documentation (@mschweizer)
Provide the environment when loading the model in the examples (@lorepieri8)
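
For readers unfamiliar with the learning rate schedule mentioned in this list, here is a minimal sketch of a linear schedule. It assumes the schedule is a callable of the remaining training progress, which goes from 1 at the start of training to 0 at the end; the name linear_schedule is used for illustration only.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Build a schedule that decays linearly from `initial_value` to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining is assumed to go from 1.0 (start) down to 0.0 (end).
        return progress_remaining * initial_value

    return schedule


lr = linear_schedule(1e-3)
print(lr(1.0))  # learning rate at the start of training
print(lr(0.0))  # learning rate at the end of training
```

A callable like this could then be passed as the learning_rate argument of an algorithm in place of a constant value.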
