Bug fixes, better image support and last release before v1.0
Pre-release
Breaking Changes:
- evaluate_policy now returns rewards/episode lengths from a Monitor wrapper if one is present, which makes it possible to return the unnormalized reward, for instance in the case of Atari games.
- Renamed common.vec_env.is_wrapped to common.vec_env.is_vecenv_wrapped to avoid confusion with the new is_wrapped() helper
- Renamed _get_data() to _get_constructor_parameters() for policies (this affects independent saving/loading of policies)
- replay_buffer in collect_rollout is no longer optional
- Removed n_episodes_rollout and merged it with train_freq, which now accepts a tuple (frequency, unit):
# SB3 < 0.11.0
# model = SAC("MlpPolicy", env, n_episodes_rollout=1, train_freq=-1)
# SB3 >= 0.11.0:
model = SAC("MlpPolicy", env, train_freq=(1, "episode"))
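To make the relation between the integer form and the tuple form of train_freq more concrete, here is a minimal sketch in plain Python. The helper name normalize_train_freq is hypothetical and is not part of SB3. The sketch assumes that a bare integer counts environment steps and that the only valid units are "step" and "episode".

```python
from typing import Tuple, Union

VALID_UNITS = ("step", "episode")


def normalize_train_freq(train_freq: Union[int, Tuple[int, str]]) -> Tuple[int, str]:
    """Return train_freq as a (frequency, unit) tuple.

    Hypothetical helper for illustration only; it is not SB3 code.
    Assumption: a bare integer is interpreted as a number of steps.
    """
    if isinstance(train_freq, int):
        train_freq = (train_freq, "step")
    if len(train_freq) != 2:
        raise ValueError(f"Expected (frequency, unit), got {train_freq!r}")
    frequency, unit = train_freq
    if unit not in VALID_UNITS:
        raise ValueError(f"Unit must be one of {VALID_UNITS}, got {unit!r}")
    if not isinstance(frequency, int) or frequency <= 0:
        raise ValueError(f"Frequency must be a positive integer, got {frequency!r}")
    return frequency, unit
```

Under these assumptions, normalize_train_freq(4) gives (4, "step"), while (1, "episode") is returned unchanged and corresponds to the former n_episodes_rollout=1 setting.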
New Features:
- Add support for VecFrameStack to stack on first or last observation dimension, along with automatic check for image spaces.
- VecFrameStack now has a channels_order argument to tell if observations should be stacked on the first or last observation dimension (originally always stacked on last).
- Added common.env_util.is_wrapped and common.env_util.unwrap_wrapper functions for checking/unwrapping an environment for a specific wrapper.
- Added env_is_wrapped() method for VecEnv to check if its environments are wrapped with given Gym wrappers.
- Added monitor_kwargs parameter to make_vec_env and make_atari_env
- Wrap the environments automatically with a Monitor wrapper when possible.
- EvalCallback now logs the success rate when available (is_success must be present in the info dict)
- Added new wrappers to log images and matplotlib figures to tensorboard. (@zampanteymedio)
- Add support for text records to Logger. (@lorenz-h)
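The idea behind the is_wrapped and unwrap_wrapper helpers listed above can be sketched without any dependency. The classes below are stand-ins and not real Gym or SB3 classes. The sketch only assumes that a wrapper stores the environment it wraps in an env attribute, and it is not claimed to match the library's actual implementation.

```python
from typing import Optional, Type


class Env:
    """Stand-in for a base environment (not a real Gym class)."""


class Wrapper(Env):
    """Stand-in for a Gym-style wrapper that keeps the wrapped env in .env."""

    def __init__(self, env: Env) -> None:
        self.env = env


class Monitor(Wrapper):
    """Stand-in for a monitoring wrapper."""


class TimeLimit(Wrapper):
    """Stand-in for another wrapper."""


def unwrap_wrapper(env: Env, wrapper_class: Type[Wrapper]) -> Optional[Wrapper]:
    """Return the first wrapper of type wrapper_class in the chain, or None."""
    current = env
    while isinstance(current, Wrapper):
        if isinstance(current, wrapper_class):
            return current
        current = current.env  # go one level deeper
    return None


def is_wrapped(env: Env, wrapper_class: Type[Wrapper]) -> bool:
    """Check whether env is wrapped, at any depth, with wrapper_class."""
    return unwrap_wrapper(env, wrapper_class) is not None


base = Env()
env = TimeLimit(Monitor(base))  # Monitor sits below TimeLimit
```

Here is_wrapped(env, Monitor) is true even though Monitor is not the outermost wrapper, because the whole chain is inspected.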
Bug Fixes:
- Fixed bug where code added VecTranspose on channel-first image environments (thanks @qxcv)
- Fixed DQN predict method when using single gym.Env with deterministic=False
- Fixed incorrect argument order of explained_variance() in ppo.py and a2c.py (@thisray)
- Fixed bug where full HerReplayBuffer leads to an index error. (@megan-klaiber)
- Fixed bug where replay buffer could not be saved if it was too big (> 4 GB) for python<3.8 (thanks @hn2)
- Added informative PPO construction error in edge-case scenario where n_steps * n_envs = 1 (size of rollout buffer), which otherwise causes downstream breaking errors in training (@decodyng)
- Fixed discrete observation space support when using multiple envs with A2C/PPO (thanks @ardabbour)
- Fixed a bug for TD3 delayed update (the update was off-by-one and not delayed when train_freq=1)
- Fixed numpy warning (replaced np.bool with bool)
- Fixed a bug where VecNormalize was not normalizing the terminal observation
- Fixed a bug where VecTranspose was not transposing the terminal observation
- Fixed a bug where the terminal observation stored in the replay buffer was not the right one for off-policy algorithms
- Fixed a bug where action_noise was not used when using HER (thanks @ShangqunYu)
- Fixed a bug where train_freq was not properly converted when loading a saved model
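To see why the argument order of explained_variance() matters, here is a small standalone sketch based on the usual definition, 1 - Var[y_true - y_pred] / Var[y_true]. It uses only the Python standard library instead of numpy, and the argument order (y_pred, y_true) is an assumption of this sketch. The sample values are made up for illustration.

```python
import math
from statistics import pvariance
from typing import Sequence


def explained_variance(y_pred: Sequence[float], y_true: Sequence[float]) -> float:
    """Fraction of the variance of y_true that y_pred explains.

    Computes 1 - Var[y_true - y_pred] / Var[y_true] and returns NaN
    when y_true has zero variance.
    """
    var_true = pvariance([float(v) for v in y_true])
    if var_true == 0:
        return math.nan
    residuals = [float(t) - float(p) for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / var_true


returns = [0.0, 2.0, 4.0, 6.0]  # made-up targets (y_true)
values = [0.0, 1.0, 2.0, 3.0]   # made-up predictions (y_pred)

correct = explained_variance(values, returns)  # predictions first
swapped = explained_variance(returns, values)  # arguments in the wrong order
```

The two calls give different results because the denominator is the variance of whichever sequence is passed as y_true, so swapping the arguments silently changes the logged metric.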
Others:
- Add more issue templates
- Add signatures to callable type annotations (@ernestum)
- Improve error message in NatureCNN
- Added checks for supported action spaces to improve clarity of error messages for the user
- Renamed variables in the train() method of SAC, TD3 and DQN to match SB3-Contrib.
- Updated docker base image to Ubuntu 18.04
- Set tensorboard min version to 2.2.0 (earlier versions apparently do not work with PyTorch)
- Added warning for PPO when n_steps * n_envs is not a multiple of batch_size (last mini-batch truncated) (@decodyng)
- Removed some warnings in the tests
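The PPO warning about a truncated last mini-batch comes down to simple arithmetic, sketched below. The helper minibatch_sizes is hypothetical and not part of SB3. It assumes that the rollout buffer holds n_steps * n_envs samples and is cut into consecutive chunks of batch_size.

```python
from typing import List


def minibatch_sizes(n_steps: int, n_envs: int, batch_size: int) -> List[int]:
    """Sizes of the mini-batches obtained from one rollout buffer.

    Hypothetical helper for illustration; assumes the buffer of
    n_steps * n_envs samples is cut into consecutive chunks of batch_size.
    """
    buffer_size = n_steps * n_envs
    full_batches, remainder = divmod(buffer_size, batch_size)
    sizes = [batch_size] * full_batches
    if remainder > 0:
        sizes.append(remainder)  # truncated last mini-batch
    return sizes
```

With n_steps=100, n_envs=3 and batch_size=64, the buffer holds 300 samples, which gives four full mini-batches and a last one of only 44 samples. Choosing batch_size so that it divides n_steps * n_envs avoids the truncation.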
Documentation:
- Updated algorithm table
- Minor docstring improvements regarding rollout (@stheid)
- Fix migration doc for A2C (epsilon parameter)
- Fix clip_range docstring
- Fix duplicated parameter in EvalCallback docstring (thanks @tfederico)
- Added example of learning rate schedule
- Added SUMO-RL as example project (@LucasAlegre)
- Fix docstring of classes in atari_wrappers.py which were inside the constructor (@LucasAlegre)
- Added SB3-Contrib page
- Fix bug in the example code of DQN (@AptX395)
- Add example on how to access the tensorboard summary writer directly. (@lorenz-h)
- Updated migration guide
- Updated custom policy doc (separate policy architecture recommended)
- Added a note about OpenCV headless version
- Corrected typo in the documentation (@mschweizer)
- Provide the environment when loading the model in the examples (@lorepieri8)
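For readers unfamiliar with the learning rate schedule mentioned in the list above, a schedule is typically a function of the remaining training progress. The following is a minimal sketch of a linear schedule, and not necessarily the exact example from the documentation. It assumes that the progress value goes from 1 at the start of training to 0 at the end.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Build a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining is assumed to go from 1 (start) to 0 (end)
        return progress_remaining * initial_value

    return schedule


lr = linear_schedule(0.001)
```

Such a callable would then be given in place of a constant learning rate when creating the model.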