Release HER with online and offline sampling, bug fixes for features extraction · DLR-RM/stable-baselines3

Breaking Changes

Warning: Renamed common.cmd_util to common.env_util for clarity (affects make_vec_env and make_atari_env functions)
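Scripts that import from the old module path must be updated. The sketch below assumes you want one script to import correctly on releases both before and after the rename. It tries the new path first and falls back to the old one.

```python
# Prefer the new module path and fall back to the pre-rename one, so the same
# script imports correctly on releases from before and after the rename.
try:
    from stable_baselines3.common.env_util import make_atari_env, make_vec_env
except ImportError:
    try:
        from stable_baselines3.common.cmd_util import make_atari_env, make_vec_env
    except ImportError:
        # stable-baselines3 (or one of its dependencies) is not installed
        make_atari_env = make_vec_env = None
```

New code can simply use `stable_baselines3.common.env_util` directly.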

New Features

Allowed custom actor/critic network architectures using net_arch=dict(qf=[400, 300], pi=[64, 64]) for off-policy algorithms (SAC, TD3, DDPG)
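A plain list still describes one architecture for both networks. A dict gives the actor (`pi`) and the critic (`qf`) their own hidden layers. The sketch below shows that split. `split_net_arch` is a hypothetical helper, not the library's internal function. The commented `SAC(...)` call shows where `policy_kwargs` would be passed.

```python
from typing import Dict, List, Tuple, Union

NetArch = Union[List[int], Dict[str, List[int]]]


def split_net_arch(net_arch: NetArch) -> Tuple[List[int], List[int]]:
    """Return (actor_layers, critic_layers) for an off-policy actor/critic.

    Hypothetical helper: a plain list is used for both networks, while a
    dict gives the actor ("pi") and the critic ("qf") separate layers.
    """
    if isinstance(net_arch, list):
        return list(net_arch), list(net_arch)
    if set(net_arch) != {"pi", "qf"}:
        raise ValueError("net_arch dict must have exactly the keys 'pi' and 'qf'")
    return list(net_arch["pi"]), list(net_arch["qf"])


# The value from the changelog, as it would be passed through policy_kwargs,
# e.g. SAC("MlpPolicy", env, policy_kwargs=policy_kwargs)
policy_kwargs = dict(net_arch=dict(qf=[400, 300], pi=[64, 64]))
actor_layers, critic_layers = split_net_arch(policy_kwargs["net_arch"])
print(actor_layers, critic_layers)  # [64, 64] [400, 300]
```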
Added Hindsight Experience Replay (HER) (@megan-klaiber)
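HER adds copies of transitions in which the desired goal is replaced by a goal the agent actually reached. This lets sparse-reward episodes still produce a useful learning signal. The sketch below implements the "future" goal selection strategy in plain Python. It assumes transitions are dicts and uses a hypothetical distance-based `compute_reward`. It relabels a finished episode in one pass. That is roughly the offline variant (relabel when the episode is stored), as opposed to online sampling (relabel when a batch is drawn).

```python
import random
from typing import Dict, List


def compute_reward(achieved_goal: List[float], desired_goal: List[float], tol: float = 0.05) -> float:
    """Hypothetical sparse reward: 0 when the goal is reached, -1 otherwise."""
    dist = sum((a - d) ** 2 for a, d in zip(achieved_goal, desired_goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_future(episode: List[Dict], n_sampled_goal: int, rng: random.Random) -> List[Dict]:
    """Create extra transitions whose goal was actually achieved later in the episode."""
    relabeled = []
    for t, transition in enumerate(episode):
        for _ in range(n_sampled_goal):
            # "future" strategy: pick a step at or after t within the same episode
            future_t = rng.randint(t, len(episode) - 1)
            new_goal = episode[future_t]["next_achieved_goal"]
            new_transition = dict(transition, desired_goal=new_goal)
            # The reward must be recomputed for the substituted goal
            new_transition["reward"] = compute_reward(transition["next_achieved_goal"], new_goal)
            relabeled.append(new_transition)
    return relabeled


# A 3-step episode that never reached its original goal [1.0]
episode = [
    {"next_achieved_goal": [0.1], "desired_goal": [1.0], "reward": -1.0},
    {"next_achieved_goal": [0.2], "desired_goal": [1.0], "reward": -1.0},
    {"next_achieved_goal": [0.3], "desired_goal": [1.0], "reward": -1.0},
]
extra = relabel_future(episode, n_sampled_goal=4, rng=random.Random(0))
print(sum(tr["reward"] == 0.0 for tr in extra), "of", len(extra), "relabeled transitions are successes")
```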
VecNormalize now supports gym.spaces.Dict observation spaces
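With a `Dict` observation space, each key can have a very different scale, so statistics are naturally kept per key. Below is a hypothetical stdlib sketch of that idea, not the library's implementation. It uses element-wise Welford running statistics and the common defaults `epsilon=1e-8` and `clip=10.0`. Statistics are updated before normalizing, as during training.

```python
import math
from typing import Dict, List


class RunningMeanStd:
    """Element-wise running mean and variance (Welford's algorithm)."""

    def __init__(self, size: int):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations

    def update(self, x: List[float]) -> None:
        self.count += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (xi - self.mean[i])

    def var(self) -> List[float]:
        if self.count == 0:
            return [1.0] * len(self.m2)
        return [m / self.count for m in self.m2]


class DictObsNormalizer:
    """Hypothetical sketch: one set of running statistics per Dict observation key."""

    def __init__(self, sizes: Dict[str, int], epsilon: float = 1e-8, clip: float = 10.0):
        self.stats = {key: RunningMeanStd(size) for key, size in sizes.items()}
        self.epsilon = epsilon
        self.clip = clip

    def __call__(self, obs: Dict[str, List[float]], update: bool = True) -> Dict[str, List[float]]:
        normalized = {}
        for key, value in obs.items():
            rms = self.stats[key]
            if update:  # update statistics first, as during training
                rms.update(value)
            normalized[key] = [
                max(-self.clip, min(self.clip, (v - m) / math.sqrt(var + self.epsilon)))
                for v, m, var in zip(value, rms.mean, rms.var())
            ]
        return normalized


norm = DictObsNormalizer({"position": 1, "velocity": 1})
norm({"position": [1.0], "velocity": [10.0]})
out = norm({"position": [3.0], "velocity": [30.0]})
print(out)  # both keys end up near 1.0 despite their very different scales
```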
Support logging videos to Tensorboard (@SwamyDev)
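TensorBoard video summaries, such as PyTorch's `SummaryWriter.add_video`, expect a `(N, T, C, H, W)` layout. Rendered RGB frames usually come as `(H, W, C)`. The hypothetical stdlib sketch below shows the axis reordering on nested lists. In practice a NumPy or PyTorch transpose does the same in one call.

```python
from typing import List

Frame = List[List[List[int]]]  # nested lists indexed [height][width][channel]


def hwc_to_chw(frame: Frame) -> List[List[List[int]]]:
    """Move the channel axis first: [H][W][C] -> [C][H][W]."""
    height, width, channels = len(frame), len(frame[0]), len(frame[0][0])
    return [
        [[frame[h][w][c] for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


def frames_to_video_batch(frames: List[Frame]) -> list:
    """Stack T rendered frames into a single-video batch shaped [1][T][C][H][W]."""
    return [[hwc_to_chw(f) for f in frames]]


# Two tiny 2x3 RGB frames standing in for env.render(mode="rgb_array") output
frame = [[[h * 100 + w * 10 + c for c in range(3)] for w in range(3)] for h in range(2)]
video = frames_to_video_batch([frame, frame])
```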
Added share_features_extractor argument to SAC and TD3 policies
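The flag decides whether the actor and the critic hold the same feature extractor object or independent ones. The sketch below uses hypothetical stand-in classes with plain lists of weights instead of neural network modules. It only illustrates object identity. It does not show how gradients flow through the shared extractor.

```python
import copy


class FeaturesExtractor:
    """Stand-in for a (possibly CNN) feature extractor with trainable weights."""

    def __init__(self, weights):
        self.weights = list(weights)


class ActorCriticSketch:
    """Hypothetical container: actor and critic each hold a feature extractor."""

    def __init__(self, share_features_extractor: bool):
        self.actor_extractor = FeaturesExtractor([0.0, 0.0])
        if share_features_extractor:
            # Same object: one set of weights serves both networks
            self.critic_extractor = self.actor_extractor
        else:
            # Independent copy: actor and critic can learn different features
            self.critic_extractor = copy.deepcopy(self.actor_extractor)


shared = ActorCriticSketch(share_features_extractor=True)
separate = ActorCriticSketch(share_features_extractor=False)
separate.critic_extractor.weights[0] = 1.0  # pretend only the critic was updated
print(shared.actor_extractor is shared.critic_extractor)  # True
print(separate.actor_extractor.weights)  # [0.0, 0.0]
```

Sharing saves parameters and memory. Separate extractors let each network specialize.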

Bug Fixes

Fixed GAE computation for on-policy algorithms (off-by-one for the last value) (thanks @Wovchena)
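The last step of a rollout has no stored next value. It must bootstrap from the value of the observation reached *after* the final stored transition. Below is a minimal sketch of the textbook GAE recursion, not the library's rollout-buffer code. It assumes `dones[t]` means the episode ended right after step `t` and `last_value` is V(s_T) for the observation following the rollout. Truncation versus termination is not distinguished.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,
    gae_lambda: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Generalized Advantage Estimation over one rollout of length n."""
    n = len(rewards)
    advantages = [0.0] * n
    last_gae = 0.0
    for t in reversed(range(n)):
        # The final step bootstraps from the value after the rollout, not values[t]
        next_value = last_value if t == n - 1 else values[t + 1]
        next_non_terminal = 1.0 - float(dones[t])
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        last_gae = delta + gamma * gae_lambda * next_non_terminal * last_gae
        advantages[t] = last_gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


advantages, returns = compute_gae(
    [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [False, False, False],
    last_value=10.0, gamma=1.0, gae_lambda=1.0,
)
print(advantages)  # [13.0, 12.0, 11.0]
```

If the episode instead ends at the last step, `last_value` is masked out and the advantages become `[3.0, 2.0, 1.0]`.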
Fixed a potential issue when loading a different environment
Fixed the exclude parameter being ignored when recording logs with json, csv or log as the output format (@SwamyDev)
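Each logged key can carry an `exclude` value naming the output formats that should skip it. This is useful, for example, to keep a video out of text logs. The hypothetical sketch below shows the per-format filtering every writer has to apply. It assumes `exclude` is either a single format name or a tuple of names.

```python
from typing import Dict, Optional, Tuple, Union

Exclude = Optional[Union[str, Tuple[str, ...]]]


def filter_for_format(values: Dict[str, object], excluded: Dict[str, Exclude], fmt: str) -> Dict[str, object]:
    """Keep only the entries that were not excluded for the output format `fmt`."""
    kept = {}
    for key, value in values.items():
        exclude = excluded.get(key)
        if isinstance(exclude, str):
            # A single format name; wrap it so `in` does not do substring matching
            exclude = (exclude,)
        if exclude is not None and fmt in exclude:
            continue
        kept[key] = value
    return kept


values = {"train/loss": 0.25, "trajectory/video": "<video frames>"}
excluded = {"trajectory/video": ("stdout", "log", "json", "csv")}
for fmt in ("json", "csv", "log", "tensorboard"):
    print(fmt, sorted(filter_for_format(values, excluded, fmt)))
```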
Made make_vec_env support the env_kwargs argument when the env is given as an ID string (@ManifoldFR)
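In practice this means a call like `make_vec_env(env_id, n_envs=..., env_kwargs=...)` forwards the keyword arguments to every sub-environment it builds from the ID. The hypothetical sketch below shows that forwarding with a stand-in `make` function, assumed to behave like `gym.make(env_id, **kwargs)`.

```python
from typing import Any, Callable, Dict, List, Optional


def make_env_fns(
    env_id: str,
    n_envs: int,
    make: Callable[..., Any],
    env_kwargs: Optional[Dict[str, Any]] = None,
) -> List[Callable[[], Any]]:
    """Build one zero-argument constructor per sub-environment.

    `make` stands in for a registry lookup such as gym.make(env_id, **kwargs).
    """
    kwargs = {} if env_kwargs is None else dict(env_kwargs)

    def _init() -> Any:
        # Forward the keyword arguments even though the env is given by its ID
        return make(env_id, **kwargs)

    return [_init for _ in range(n_envs)]


def fake_make(env_id: str, **kwargs: Any):
    return env_id, kwargs


envs = [init() for init in make_env_fns("MyEnv-v0", 2, fake_make, env_kwargs={"size": 5})]
print(envs)  # [('MyEnv-v0', {'size': 5}), ('MyEnv-v0', {'size': 5})]
```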
Fixed model creation initializing CUDA even when device="cpu" is provided
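The changelog does not describe the root cause of this bug. The sketch below only illustrates the intended behaviour: an explicit `"cpu"` request returns before any GPU probe is made. `cuda_is_available` is a stand-in for a check such as `torch.cuda.is_available`. The resolution rules (`"auto"` meaning CUDA when available) are assumptions for illustration.

```python
from typing import Callable


def resolve_device(device: str, cuda_is_available: Callable[[], bool]) -> str:
    """Map "auto", "cpu" or "cuda[:N]" to a concrete device string.

    cuda_is_available stands in for a probe such as torch.cuda.is_available.
    """
    if device == "cpu":
        # Explicit CPU request: return before anything GPU-related is touched
        return "cpu"
    if device == "auto":
        device = "cuda"
    if device.startswith("cuda") and not cuda_is_available():
        return "cpu"
    return device


probe_calls = []


def counting_probe() -> bool:
    probe_calls.append(1)
    return True


print(resolve_device("cpu", counting_probe), len(probe_calls))  # cpu 0
print(resolve_device("auto", counting_probe), len(probe_calls))  # cuda 1
```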
Fixed check_env not checking whether the env has a Dict action space before calling _check_nan (@wmmc88)
Updated the check for spaces unsupported by Stable Baselines3 to include checks on the action space (@wmmc88)
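The two entries above amount to an ordering rule: validate the action space first, and only then run checks that sample and step random actions. The hypothetical sketch below shows that rule. It describes spaces only by their type name and assumes `Box`, `Discrete`, `MultiDiscrete` and `MultiBinary` are the supported action spaces.

```python
from typing import Callable, List

# Simplified stand-in: a space is described only by its type name here
SUPPORTED_ACTION_SPACES = {"Box", "Discrete", "MultiDiscrete", "MultiBinary"}


def check_action_space_then_nan(action_space: str, check_nan: Callable[[], List[str]]) -> List[str]:
    """Return warnings; run the NaN check only for a supported action space."""
    if action_space not in SUPPORTED_ACTION_SPACES:
        # Stop here: a NaN check that samples and steps random actions
        # cannot be expected to work with this kind of action space
        return [f"The action space {action_space} is not supported by Stable Baselines3"]
    return check_nan()


nan_check_calls = []


def fake_nan_check() -> List[str]:
    nan_check_calls.append(1)
    return []


print(check_action_space_then_nan("Dict", fake_nan_check))
print(len(nan_check_calls))  # 0: the NaN check was skipped
```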
Fixed a feature extractor bug where the target network reused the same feature extractor as the main network instead of having a separate one. This bug affects SAC, DDPG and TD3 when using CnnPolicy (or a custom feature extractor)
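A target network is meant to be a slowly moving copy of the network it tracks. If part of it is the *same object*, a soft update cannot make that part lag behind. The sketch below uses hypothetical stand-in classes with lists of floats instead of neural network modules. It applies the usual soft (Polyak) update to a correctly copied target and to a target that shares the extractor.

```python
import copy


class QNetworkSketch:
    """Stand-in critic: a feature extractor and a head, each a list of weights."""

    def __init__(self):
        self.features_extractor = [0.0, 0.0]
        self.head = [0.0]

    def parameters(self):
        return [self.features_extractor, self.head]


def polyak_update(online, target, tau: float) -> None:
    """Soft update, in place: target <- tau * online + (1 - tau) * target."""
    for param, target_param in zip(online.parameters(), target.parameters()):
        for i in range(len(param)):
            target_param[i] = tau * param[i] + (1 - tau) * target_param[i]


online = QNetworkSketch()
separate_target = copy.deepcopy(online)  # fixed behaviour: the target owns its weights
shared_target = copy.deepcopy(online)
shared_target.features_extractor = online.features_extractor  # the bug: same object

online.features_extractor[0] = 1.0  # pretend a gradient step changed the online net
polyak_update(online, separate_target, tau=0.5)
polyak_update(online, shared_target, tau=0.5)
print(separate_target.features_extractor[0])  # 0.5: lags behind the online net
print(shared_target.features_extractor[0])  # 1.0: identical to the online net
```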
Fixed a bug where an environment passed when loading a saved model with a CnnPolicy was not wrapped properly
(the bug was introduced when implementing HER, so it should not be present in previous versions)
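Image policies typically need channel-first observations, so a channel-last env is wrapped with a transpose (for example by `VecTransposeImage`). An env passed at load time must receive the same wrapping as at training time. The hypothetical sketch below shows that decision. It assumes the common heuristic that the smallest dimension is the channel axis, then compares the resulting shape with the saved model's.

```python
from typing import Tuple

Shape = Tuple[int, int, int]


def is_channels_first(shape: Shape) -> bool:
    """Heuristic: the smallest dimension is taken to be the channel axis."""
    return shape.index(min(shape)) == 0


def policy_obs_shape(shape: Shape) -> Shape:
    """Shape after the transpose a CNN policy needs: (H, W, C) -> (C, H, W)."""
    if is_channels_first(shape):
        return shape
    height, width, channels = shape
    return (channels, height, width)


def check_env_for_loaded_model(env_shape: Shape, saved_shape: Shape) -> Shape:
    """Wrap the env passed at load time exactly as at training time, then compare."""
    shape = policy_obs_shape(env_shape)
    if shape != saved_shape:
        raise ValueError(f"Observation shape {shape} does not match the saved model's {saved_shape}")
    return shape


# A channel-last, frame-stacked env passed to load() for a model trained on (4, 84, 84)
print(check_env_for_loaded_model((84, 84, 4), (4, 84, 84)))  # (4, 84, 84)
```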