
Deep Reinforcement Learning: On-Policy Actor Critic methods. An implementation of Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) on the PyTorch Lightning framework.


RFLeijenaar/RL-On-Policy-Actor-Critic


On-Policy Actor-Critic methods

An implementation of the following on-policy actor-critic methods: Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO). The implementation is based on the papers that originally introduced these algorithms.

Examples of Trained Agents

InvertedDoublePendulum-v4 (PPO ~ 300k frames)

(video: InvertedDoublePendulum-v4-300k.mp4)

HalfCheetah-v4 (PPO ~ 200k frames)

(video: HalfCheetah-v4-200k.mp4)

Running the code

Installation

To install all dependencies, run the following command:

pip install -r requirements.txt

Training

To train the agent, run the following command:

python source/train_agent.py [Training Options] [PPO Options]
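
For instance, a PPO run on HalfCheetah-v4 could be launched as follows (the flag values below are illustrative, not recommended defaults):

```shell
python source/train_agent.py \
    --run_name ppo-halfcheetah \
    --algorithm PPO \
    --env_id HalfCheetah-v4 \
    --learning_rate 3e-4 \
    --gamma 0.99 \
    --gae_lambda 0.95 \
    --ppo_clip_ratio 0.2
```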

Training Options:

  • --run_name (str): Name of the run.
  • --algorithm ({A2C,PPO}): Type of algorithm to use for training.
  • --env_id (str): Id of the environment to train on.

  • --perform_testing: Whether to perform testing after training.
  • --log_video: Whether to log video of agent's performance.

  • --max_epochs (int) (default: 3): Maximum number of epochs to train for.
  • --steps_per_epoch (int): Number of steps to train for per epoch.
  • --num_envs (int): Number of environments to train on.
  • --num_rollout_steps (int): Number of steps to rollout policy for.

  • --optimizer ({Adam,RMSprop,SGD}): Optimizer to use for training.
  • --learning_rate (float): Learning rate for training.
  • --lr_decay (float): Learning rate decay for training.
  • --weight_decay (float): Weight decay (L2 regularization) for training.
  • --gamma (float): Discount factor.
  • --gae_lambda (float): Lambda parameter for Generalized Advantage Estimation (GAE).
  • --value_coef (float): Coefficient for value loss.
  • --entropy_coef (float): Coefficient for entropy loss.
  • --max_grad_norm (float): Maximum gradient norm for clipping.

  • --init_std (float): Initial standard deviation for policy.
  • --hidden_size (int): Hidden size for policy.
  • --shared_extractor: Whether to use a shared feature extractor for policy.
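
To illustrate how the --gamma and --gae_lambda options interact, here is a minimal sketch of Generalized Advantage Estimation over a single rollout (not the repository's code; episode terminations are ignored for simplicity):

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation (Schulman et al.) over one rollout.

    rewards:    shape (T,), rewards collected during the rollout
    values:     shape (T,), critic estimates V(s_t)
    last_value: scalar bootstrap value V(s_T) for the state after the rollout
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    values_ext = np.append(values, last_value)
    # Walk the rollout backwards, accumulating an exponentially
    # weighted (by gamma * lambda) sum of TD errors.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        gae = delta + gamma * gae_lambda * gae
        advantages[t] = gae
    return advantages
```

With gae_lambda=0 this reduces to one-step TD errors; with gae_lambda=1 it becomes the full discounted Monte-Carlo advantage.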

PPO Options:

  • --ppo_batch_size (int): Batch size for Proximal Policy Optimization (PPO).
  • --ppo_epochs (int): Number of epochs to train PPO for.
  • --ppo_clip_ratio (float): Clip ratio for PPO.
  • --ppo_clip_anneal: Whether to anneal the clip ratio for PPO.
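
The --ppo_clip_ratio option corresponds to the epsilon in PPO's clipped surrogate objective. A minimal sketch of that loss (not the repository's implementation, which operates on PyTorch tensors):

```python
import numpy as np

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_ratio=0.2):
    """Clipped surrogate policy loss from the PPO paper (negated, to minimize)."""
    ratio = np.exp(log_probs_new - log_probs_old)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
    # Per-sample pessimistic (lower) bound, negated for gradient descent.
    return -np.mean(np.minimum(unclipped, clipped))
```

Clipping the probability ratio to [1 - epsilon, 1 + epsilon] removes the incentive to move the policy far from the one that collected the rollout, which is why annealing the ratio (--ppo_clip_anneal) tightens the trust region over training.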
