SB3 v1.6.1: Bug fix release
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Breaking Changes:
- Switched minimum tensorboard version to 2.9.1
New Features:
- Support logging hyperparameters to tensorboard (@timothe-chaumont)
- Added checkpoints for replay buffer and `VecNormalize` statistics (@anand-bala)
- Added option for `Monitor` to append to existing file instead of overriding (@sidney-tio)
- The env checker now raises an error when using dict observation spaces and observation keys don't match observation space keys
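The env checker item above amounts to a set comparison between the keys of the observation space and the keys of the returned observation. A minimal sketch of that idea in plain Python follows. The function name `check_dict_obs_keys`, the choice of `ValueError`, and the message format are assumptions made for illustration and are not taken from the SB3 source.

```python
from typing import Any, Iterable, Mapping


def check_dict_obs_keys(space_keys: Iterable[str], obs: Mapping[str, Any]) -> None:
    """Raise if the observation's keys differ from the observation space's keys.

    Hypothetical helper: the real env checker may use a different
    exception type and message.
    """
    expected = set(space_keys)
    actual = set(obs.keys())
    if expected != actual:
        missing = sorted(expected - actual)
        extra = sorted(actual - expected)
        raise ValueError(
            "Observation keys do not match the observation space: "
            f"missing={missing}, unexpected={extra}"
        )


# Hypothetical space with two entries and two candidate observations.
space_keys = ["image", "vector"]
check_dict_obs_keys(space_keys, {"image": [0], "vector": [0.0]})  # matching keys: no error

try:
    check_dict_obs_keys(space_keys, {"image": [0], "velocity": [0.0]})
except ValueError as err:
    mismatch_message = str(err)
```

Reporting both the missing and the unexpected keys makes a renamed key easy to spot, as in the `vector` / `velocity` case here.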
SB3-Contrib
- Fixed the issue of wrongly passing policy arguments when using `CnnLstmPolicy` or `MultiInputLstmPolicy` with `RecurrentPPO` (@mlodel)
Bug Fixes:
- Fixed issue where `PPO` gives NaN if rollout buffer provides a batch of size 1 (@hughperkins)
- Fixed the issue that `predict` does not always return action as `np.ndarray` (@qgallouedec)
- Fixed division by zero error when computing FPS when only a small amount of time has elapsed on operating systems with low-precision timers.
- Added multidimensional action space support (@qgallouedec)
- Fixed missing verbose parameter passing in the `EvalCallback` constructor (@BurakDmb)
- Fixed the issue that when updating the target network in DQN, SAC, TD3, the `running_mean` and `running_var` properties of batch norm layers are not updated (@honglu2875)
- Fixed incorrect type annotation of the `replay_buffer_class` argument in `common.OffPolicyAlgorithm` initializer, where an instance instead of a class was required (@Rocamonde)
- Fixed loading saved model with different number of environments
- Removed `forward()` abstract method declaration from `common.policies.BaseModel` (already defined in `torch.nn.Module`) to fix type errors in subclasses (@Rocamonde)
- Fixed the return type of `.load()` and `.learn()` methods in `BaseAlgorithm` so that they now use `TypeVar` (@Rocamonde)
- Fixed an issue where keys with different tags but the same key raised an error in `common.logger.HumanOutputFormat` (@Rocamonde and @AdamGleave)
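As an illustration of the `TypeVar` return-type item above, here is a minimal sketch of the general pattern: a bound `TypeVar` lets type checkers infer that `.learn()` and `.load()` called on a subclass return that subclass rather than the base class. The classes `BaseAlgo` and `MyAlgo` are hypothetical stand-ins and not the SB3 classes, and `load` here ignores its path argument.

```python
from typing import Type, TypeVar

# Bound TypeVar: any subclass of BaseAlgo can be substituted for it.
SelfAlgo = TypeVar("SelfAlgo", bound="BaseAlgo")


class BaseAlgo:
    """Hypothetical stand-in for a base algorithm class."""

    def __init__(self) -> None:
        self.num_timesteps = 0

    def learn(self: SelfAlgo, total_timesteps: int) -> SelfAlgo:
        # Returning self typed as SelfAlgo keeps the subclass type when chaining.
        self.num_timesteps += total_timesteps
        return self

    @classmethod
    def load(cls: Type[SelfAlgo], path: str) -> SelfAlgo:
        # Placeholder: a real loader would read `path`; this only builds an instance.
        return cls()


class MyAlgo(BaseAlgo):
    def describe(self) -> str:
        return f"MyAlgo after {self.num_timesteps} steps"


# A type checker sees `model` as MyAlgo, so `.describe()` is accepted.
model = MyAlgo.load("unused_path").learn(100)
summary = model.describe()
```

Without the `TypeVar`, annotating the return type as `BaseAlgo` would make the final `.describe()` call a type error even though it works at runtime.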
Others:
- Fixed `DictReplayBuffer.next_observations` typing (@qgallouedec)
- Added support for `device="auto"` in buffers and made it default (@qgallouedec)
- Updated `ResultsWriter` (used internally by `Monitor` wrapper) to automatically create missing directories when `filename` is a path (@dominicgkerr)
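The `ResultsWriter` item above describes a common pattern: create the parent directories of a log file before opening it. A minimal standard-library sketch of that pattern is shown below. The helper `open_results_file`, the header columns, and the example path are assumptions for illustration and do not reproduce the SB3 implementation.

```python
import csv
import os
import tempfile


def open_results_file(filename: str, header: list):
    """Create any missing parent directories, then open a CSV file with a header row.

    Hypothetical helper, not the SB3 ResultsWriter.
    """
    parent = os.path.dirname(filename)
    if parent:  # a bare file name has no directory component to create
        os.makedirs(parent, exist_ok=True)
    handle = open(filename, "w", newline="")
    writer = csv.writer(handle)
    writer.writerow(header)
    handle.flush()
    return handle, writer


with tempfile.TemporaryDirectory() as tmp_dir:
    # Neither "runs" nor "exp1" exists yet inside the temporary directory.
    path = os.path.join(tmp_dir, "runs", "exp1", "monitor.csv")
    handle, writer = open_results_file(path, ["r", "l", "t"])
    writer.writerow([1.0, 200, 3.5])
    handle.close()
    created = os.path.isfile(path)
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
```

Passing `exist_ok=True` means the call is safe whether or not the directories already exist.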
Documentation:
- Added an example of callback that logs hyperparameters to tensorboard. (@timothe-chaumont)
- Fixed typo in docstring "nature" -> "Nature" (@Melanol)
- Added info on split tensorboard logs into (@Melanol)
- Fixed typo in ppo doc (@francescoluciano)
- Fixed typo in install doc (@jlp-ue)
- Clarified and standardized verbosity documentation
- Added link to a GitHub issue in the custom policy documentation (@AlexPasqua)
- Fixed typos (@Akhilez)