Documentation update (#1732)
* Update RL Tips

* Fix grammar

* Update SBX doc

* Fix various typos and grammar mistakes
araffin authored Nov 3, 2023
1 parent 69afefc commit 294f2b4
Showing 17 changed files with 76 additions and 55 deletions.
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -8,7 +8,7 @@ into two categories:
2. You want to implement a feature or bug-fix for an outstanding issue
- Look at the outstanding issues here: https://github.com/DLR-RM/stable-baselines3/issues
- Pick an issue or feature and comment on the task that you want to work on this feature.
- If you need more context on a particular issue, please ask and we shall provide.
- If you need more context on a particular issue, please ask, and we shall provide.

Once you finish implementing a feature or bug-fix, please send a Pull Request to
https://github.com/DLR-RM/stable-baselines3
@@ -61,7 +61,7 @@ def my_function(arg1: type1, arg2: type2) -> returntype:

## Pull Request (PR)

Before proposing a PR, please open an issue, where the feature will be discussed. This prevent from duplicated PR to be proposed and also ease the code review process.
Before proposing a PR, please open an issue, where the feature will be discussed. This prevents from duplicated PR to be proposed and also ease the code review process.

Each PR need to be reviewed and accepted by at least one of the maintainers (@hill-a, @araffin, @ernestum, @AdamGleave, @Miffyli or @qgallouedec).
A PR must pass the Continuous Integration tests to be merged with the master branch.
2 changes: 1 addition & 1 deletion README.md
@@ -109,7 +109,7 @@ pip install stable-baselines3[extra]
```
**Note:** Some shells such as Zsh require quotation marks around brackets, i.e. `pip install 'stable-baselines3[extra]'` ([More Info](https://stackoverflow.com/a/30539963)).

This includes an optional dependencies like Tensorboard, OpenCV or `atari-py` to train on atari games. If you do not need those, you can use:
This includes an optional dependencies like Tensorboard, OpenCV or `ale-py` to train on atari games. If you do not need those, you can use:
```sh
pip install stable-baselines3
```
2 changes: 1 addition & 1 deletion docs/common/distributions.rst
@@ -16,7 +16,7 @@ The policy networks output parameters for the distributions (named ``flat`` in t
Actions are then sampled from those distributions.

For instance, in the case of discrete actions. The policy network outputs probability
of taking each action. The ``CategoricalDistribution`` allows to sample from it,
of taking each action. The ``CategoricalDistribution`` allows sampling from it,
computes the entropy, the log probability (``log_prob``) and backpropagate the gradient.

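To make the three quantities above concrete, here is a plain-Python sketch of what a categorical distribution over discrete actions computes. It is not SB3's ``CategoricalDistribution``, which operates on PyTorch tensors so that gradients can flow back through ``log_prob`` and the entropy; the helpers ``softmax``, ``sample``, ``log_prob`` and ``entropy`` are illustrative names only, and the list of logits stands in for the policy network's output.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw network outputs (logits) into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs: List[float], rng: random.Random) -> int:
    """Draw one action index according to the probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def log_prob(probs: List[float], action: int) -> float:
    """Log-probability of taking ``action``."""
    return math.log(probs[action])


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats; maximal when all actions are equally likely."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Identical logits give a uniform distribution over 4 actions
probs = softmax([0.0, 0.0, 0.0, 0.0])
action = sample(probs, random.Random(0))
print(probs, action, log_prob(probs, action), entropy(probs))
```

In a real policy the same computations run on tensors, which is what lets the loss built from ``log_prob`` and the entropy be backpropagated into the network weights.
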
In the case of continuous actions, a Gaussian distribution is used. The policy network outputs
16 changes: 8 additions & 8 deletions docs/guide/callbacks.rst
@@ -30,7 +30,7 @@ You can find two examples of custom callbacks in the documentation: one for savi
:param verbose: Verbosity level: 0 for no output, 1 for info messages, 2 for debug messages
"""
def __init__(self, verbose=0):
super(CustomCallback, self).__init__(verbose)
super().__init__(verbose)
# Those variables will be accessible in the callback
# (they are defined in the base class)
# The RL model
@@ -70,7 +70,7 @@ You can find two examples of custom callbacks in the documentation: one for savi
For child callback (of an `EventCallback`), this will be called
when the event is triggered.
:return: (bool) If the callback returns False, training is aborted early.
:return: If the callback returns False, training is aborted early.
"""
return True
@@ -110,7 +110,7 @@ A child callback is for instance :ref:`StopTrainingOnRewardThreshold <StopTraini

.. note::

We recommend to take a look at the source code of :ref:`EvalCallback` and :ref:`StopTrainingOnRewardThreshold <StopTrainingCallback>` to have a better overview of what can be achieved with this kind of callbacks.
We recommend taking a look at the source code of :ref:`EvalCallback` and :ref:`StopTrainingOnRewardThreshold <StopTrainingCallback>` to have a better overview of what can be achieved with this kind of callbacks.


.. code-block:: python
@@ -159,8 +159,8 @@ corresponding statistics using ``save_vecnormalize`` (``False`` by default).

.. warning::

When using multiple environments, each call to ``env.step()`` will effectively correspond to ``n_envs`` steps.
If you want the ``save_freq`` to be similar when using different number of environments,
When using multiple environments, each call to ``env.step()`` will effectively correspond to ``n_envs`` steps.
If you want the ``save_freq`` to be similar when using a different number of environments,
you need to account for it using ``save_freq = max(save_freq // n_envs, 1)``.
The same goes for the other callbacks.

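The adjustment in the warning above can be checked with a few lines of Python. ``per_env_freq`` is a hypothetical helper, not part of SB3; it only applies ``max(save_freq // n_envs, 1)`` so that a checkpoint is written roughly every ``save_freq`` total steps regardless of how many environments run in parallel.

```python
def per_env_freq(total_freq: int, n_envs: int) -> int:
    """Convert a frequency in total environment steps into a number of
    ``env.step()`` calls on a vectorized env with ``n_envs`` environments.

    Each ``env.step()`` call advances all ``n_envs`` environments at once,
    so it counts as ``n_envs`` steps. ``max(..., 1)`` keeps the result
    valid when ``n_envs`` is larger than ``total_freq``.
    """
    return max(total_freq // n_envs, 1)


# Target: save roughly every 10_000 total steps
for n_envs in (1, 4, 16):
    print(n_envs, per_env_freq(10_000, n_envs))  # 10000, 2500, 625
```

The returned value is what would then be passed as ``save_freq``, for example ``CheckpointCallback(save_freq=per_env_freq(10_000, n_envs), save_path="./logs/")``; the same conversion applies to ``eval_freq`` when using ``EvalCallback``.
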
@@ -189,7 +189,7 @@ EvalCallback
^^^^^^^^^^^^

Evaluate periodically the performance of an agent, using a separate test environment.
It will save the best model if ``best_model_save_path`` folder is specified and save the evaluations results in a numpy archive (``evaluations.npz``) if ``log_path`` folder is specified.
It will save the best model if ``best_model_save_path`` folder is specified and save the evaluations results in a NumPy archive (``evaluations.npz``) if ``log_path`` folder is specified.


.. note::
@@ -230,7 +230,7 @@ This callback is integrated inside SB3 via the ``progress_bar`` argument of the

.. note::

This callback requires ``tqdm`` and ``rich`` packages to be installed. This is done automatically when using ``pip install stable-baselines3[extra]``
``ProgressBarCallback`` callback requires ``tqdm`` and ``rich`` packages to be installed. This is done automatically when using ``pip install stable-baselines3[extra]``


.. code-block:: python
@@ -367,7 +367,7 @@ StopTrainingOnNoModelImprovement
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Stop the training if there is no new best model (no new best mean reward) after more than a specific number of consecutive evaluations.
The idea is to save time in experiments when you know that the learning curves are somehow well behaved and, therefore,
The idea is to save time in experiments when you know that the learning curves are somehow well-behaved and, therefore,
after many evaluations without improvement the learning has probably stabilized.
It must be used with the :ref:`EvalCallback` and use the event triggered after every evaluation.

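The stopping rule described above can be sketched as a small counter. The function below is a hypothetical, standalone illustration rather than SB3's implementation: it receives the best mean reward recorded after each evaluation and returns the index of the evaluation at which training would stop. The parameter names ``max_no_improvement_evals`` and ``min_evals`` are assumed to mirror the callback's own, and ignoring the first ``min_evals`` evaluations is an assumption of this sketch.

```python
from typing import Optional, Sequence


def stop_index(
    best_mean_rewards: Sequence[float],
    max_no_improvement_evals: int,
    min_evals: int = 0,
) -> Optional[int]:
    """Return the 0-based evaluation index at which training would stop,
    or None if it never stops.

    ``best_mean_rewards[i]`` is the best mean reward seen so far after the
    i-th evaluation, so a value that does not increase means "no new best".
    """
    no_improvement = 0
    last_best = float("-inf")
    for i, best in enumerate(best_mean_rewards):
        if i >= min_evals:  # skip the first ``min_evals`` evaluations
            if best > last_best:
                no_improvement = 0  # new best model: reset the counter
            else:
                no_improvement += 1
                # Stop only after MORE than ``max_no_improvement_evals`` stalls
                if no_improvement > max_no_improvement_evals:
                    return i
        last_best = best
    return None


print(stop_index([0.5, 0.8, 0.8, 0.8, 0.8], max_no_improvement_evals=2))  # 4
```

Resetting the counter on every new best is what makes the evaluations "consecutive": a single improvement restarts the patience window.
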
2 changes: 1 addition & 1 deletion docs/guide/custom_env.rst
@@ -3,7 +3,7 @@
Using Custom Environments
==========================

To use the RL baselines with custom environments, they just need to follow the *gymnasium* `interface <https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/#sphx-glr-tutorials-gymnasium-basics-environment-creation-py>`_.
To use the RL baselines with custom environments, they just need to follow the *gymnasium* `interface <https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/#sphx-glr-tutorials-gymnasium-basics-environment-creation-py>`_.
That is to say, your environment must implement the following methods (and inherits from Gym Class):


2 changes: 1 addition & 1 deletion docs/guide/custom_policy.rst
@@ -262,7 +262,7 @@ Custom Networks
If you need a network architecture that is different for the actor and the critic when using ``PPO``, ``A2C`` or ``TRPO``,
you can pass a dictionary of the following structure: ``dict(pi=[<actor network architecture>], vf=[<critic network architecture>])``.

For example, if you want a different architecture for the actor (aka ``pi``) and the critic ( value-function aka ``vf``) networks,
For example, if you want a different architecture for the actor (aka ``pi``) and the critic (value-function aka ``vf``) networks,
then you can specify ``net_arch=dict(pi=[32, 32], vf=[64, 64])``.

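In practice, the ``net_arch`` dictionary reaches the policy through ``policy_kwargs`` when the model is created. A minimal sketch, assuming ``PPO`` with an ``MlpPolicy`` on ``CartPole-v1`` (the environment choice is only for illustration):

```python
# Separate hidden layers for the actor ("pi") and the critic ("vf"):
# two 32-unit layers for the policy, two 64-unit layers for the value function.
policy_kwargs = dict(net_arch=dict(pi=[32, 32], vf=[64, 64]))

# With stable-baselines3 installed, the dictionary is passed at construction:
#   from stable_baselines3 import PPO
#   model = PPO("MlpPolicy", "CartPole-v1", policy_kwargs=policy_kwargs)
```
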
Otherwise, to have actor and critic that share the same network architecture,
10 changes: 5 additions & 5 deletions docs/guide/examples.rst
@@ -5,7 +5,7 @@ Examples

.. note::

These examples are only to demonstrate the use of the library and its functions, and the trained agents may not solve the environments. Optimized hyperparameters can be found in the RL Zoo `repository <https://github.com/DLR-RM/rl-baselines3-zoo>`_.
These examples are only to demonstrate the use of the library and its functions, and the trained agents may not solve the environments. Optimized hyperparameters can be found in the RL Zoo `repository <https://github.com/DLR-RM/rl-baselines3-zoo>`_.


Try it online with Colab Notebooks!
@@ -191,8 +191,8 @@ Dict Observations

You can use environments with dictionary observation spaces. This is useful in the case where one can't directly
concatenate observations such as an image from a camera combined with a vector of servo sensor data (e.g., rotation angles).
Stable Baselines3 provides ``SimpleMultiObsEnv`` as an example of this kind of of setting.
The environment is a simple grid world but the observations for each cell come in the form of dictionaries.
Stable Baselines3 provides ``SimpleMultiObsEnv`` as an example of this kind of setting.
The environment is a simple grid world, but the observations for each cell come in the form of dictionaries.
These dictionaries are randomly initialized on the creation of the environment and contain a vector observation and an image observation.

.. code-block:: python
@@ -217,7 +217,7 @@ Callbacks: Monitoring Training

You can define a custom callback function that will be called inside the agent.
This could be useful when you want to monitor training, for instance display live
learning curves in Tensorboard (or in Visdom) or save the best agent.
learning curves in Tensorboard or save the best agent.
If your callback returns False, training is aborted early.

.. image:: ../_static/img/colab-badge.svg
@@ -251,7 +251,7 @@ If your callback returns False, training is aborted early.
:param verbose: Verbosity level: 0 for no output, 1 for info messages, 2 for debug messages
"""
def __init__(self, check_freq: int, log_dir: str, verbose: int = 1):
super(SaveOnBestTrainingRewardCallback, self).__init__(verbose)
super().__init__(verbose)
self.check_freq = check_freq
self.log_dir = log_dir
self.save_path = os.path.join(log_dir, "best_model")
4 changes: 2 additions & 2 deletions docs/guide/export.rst
@@ -194,14 +194,14 @@ Full example code: https://github.com/chunky/sb3_to_coral

Google created a chip called the "Coral" for deploying AI to the
edge. It's available in a variety of form factors, including USB (using
the Coral on a Rasbperry pi, with a SB3-developed model, was the original
the Coral on a Raspberry Pi, with a SB3-developed model, was the original
motivation for the code example above).

The Coral chip is fast, with very low power consumption, but only has limited
on-device training abilities. More information is on the webpage here:
https://coral.ai.

To deploy to a Coral, one must work via TFLite, and quantise the
To deploy to a Coral, one must work via TFLite, and quantize the
network to reflect the Coral's capabilities. The full chain to go from
SB3 to Coral is: SB3 (Torch) => ONNX => TensorFlow => TFLite => Coral.

8 changes: 4 additions & 4 deletions docs/guide/install.rst
@@ -9,10 +9,10 @@ Prerequisites

Stable-Baselines3 requires python 3.8+ and PyTorch >= 1.13

Windows 10
~~~~~~~~~~
Windows
~~~~~~~

We recommend using `Anaconda <https://conda.io/docs/user-guide/install/windows.html>`_ for Windows users for easier installation of Python packages and required libraries. You need an environment with Python version 3.6 or above.
We recommend using `Anaconda <https://conda.io/docs/user-guide/install/windows.html>`_ for Windows users for easier installation of Python packages and required libraries. You need an environment with Python version 3.8 or above.

For a quick start you can move straight to installing Stable-Baselines3 in the next step.

@@ -34,7 +34,7 @@ To install Stable Baselines3 with pip, execute:
Some shells such as Zsh require quotation marks around brackets, i.e. ``pip install 'stable-baselines3[extra]'`` `More information <https://stackoverflow.com/a/30539963>`_.


This includes an optional dependencies like Tensorboard, OpenCV or ``ale-py`` to train on atari games. If you do not need those, you can use:
This includes an optional dependencies like Tensorboard, OpenCV or ``ale-py`` to train on Atari games. If you do not need those, you can use:

.. code-block:: bash
6 changes: 3 additions & 3 deletions docs/guide/migration.rst
@@ -15,7 +15,7 @@ Overview

Overall Stable-Baselines3 (SB3) keeps the high-level API of Stable-Baselines (SB2).
Most of the changes are to ensure more consistency and are internal ones.
Because of the backend change, from Tensorflow to PyTorch, the internal code is much much readable and easy to debug
Because of the backend change, from Tensorflow to PyTorch, the internal code is much more readable and easy to debug
at the cost of some speed (dynamic graph vs static graph., see `Issue #90 <https://github.com/DLR-RM/stable-baselines3/issues/90>`_)
However, the algorithms were extensively benchmarked on Atari games and continuous control PyBullet envs
(see `Issue #48 <https://github.com/DLR-RM/stable-baselines3/issues/48>`_ and `Issue #49 <https://github.com/DLR-RM/stable-baselines3/issues/49>`_)
@@ -203,8 +203,8 @@ New Features (SB3 vs SB2)
- Much cleaner and consistent base code (and no more warnings =D!) and static type checks
- Independent saving/loading/predict for policies
- A2C now supports Generalized Advantage Estimation (GAE) and advantage normalization (both are deactivated by default)
- Generalized State-Dependent Exploration (gSDE) exploration is available for A2C/PPO/SAC. It allows to use RL directly on real robots (cf https://arxiv.org/abs/2005.05719)
- Better saving/loading: optimizers are now included in the saved parameters and there is two new methods ``save_replay_buffer`` and ``load_replay_buffer`` for the replay buffer when using off-policy algorithms (DQN/DDPG/SAC/TD3)
- Generalized State-Dependent Exploration (gSDE) exploration is available for A2C/PPO/SAC. It allows using RL directly on real robots (cf https://arxiv.org/abs/2005.05719)
- Better saving/loading: optimizers are now included in the saved parameters and there are two new methods ``save_replay_buffer`` and ``load_replay_buffer`` for the replay buffer when using off-policy algorithms (DQN/DDPG/SAC/TD3)
- You can pass ``optimizer_class`` and ``optimizer_kwargs`` to ``policy_kwargs`` in order to easily
customize optimizers
- Seeding now works properly to have deterministic results
1 change: 1 addition & 0 deletions docs/guide/rl.rst
@@ -15,4 +15,5 @@ However, if you want to learn about RL, there are several good resources to get
- `Lilian Weng's blog <https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html>`_
- `Berkeley's Deep RL Bootcamp <https://sites.google.com/view/deep-rl-bootcamp/lectures>`_
- `Berkeley's Deep Reinforcement Learning course <http://rail.eecs.berkeley.edu/deeprlcourse/>`_
- `DQN tutorial <https://github.com/araffin/rlss23-dqn-tutorial>`_
- `More resources <https://github.com/dennybritz/reinforcement-learning>`_