An implementation of two on-policy actor-critic methods: Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO). The implementation is based on the original A2C and PPO papers.
Demo videos of trained agents:
- InvertedDoublePendulum-v4-300k.mp4
- HalfCheetah-v4-200k.mp4
To install all dependencies, run the following command:
pip install -r requirements.txt
To train the agent, run the following command:
python source/train_agent.py [Training Options] [PPO Options]
- --run_name (str): Name of the run.
- --algorithm ({A2C,PPO}): Type of algorithm to use for training.
- --env_id (str): Id of the environment to train on.
- --perform_testing: Whether to perform testing after training.
- --log_video: Whether to log video of agent's performance.
- --max_epochs (int) (default: 3): Maximum number of epochs to train for.
- --steps_per_epoch (int): Number of steps to train for per epoch.
- --num_envs (int): Number of parallel environments to train on.
- --num_rollout_steps (int): Number of steps to roll out the policy for.
- --optimizer ({Adam,RMSprop,SGD}): Optimizer to use for training.
- --learning_rate (float): Learning rate for training.
- --lr_decay (float): Learning rate decay for training.
- --weight_decay (float): Weight decay (L2 regularization) for training.
- --gamma (float): Discount factor.
- --gae_lambda (float): Lambda parameter for Generalized Advantage Estimation (GAE); a GAE sketch is shown after this list.
- --value_coef (float): Coefficient for value loss.
- --entropy_coef (float): Coefficient for entropy loss.
- --max_grad_norm (float): Maximum gradient norm for clipping.
- --init_std (float): Initial standard deviation for policy.
- --hidden_size (int): Hidden size for policy.
- --shared_extractor: Whether to use a shared feature extractor for the policy and value networks.
- --ppo_batch_size (int): Batch size for Proximal Policy Optimization (PPO).
- --ppo_epochs (int): Number of optimization epochs over each rollout batch for PPO.
- --ppo_clip_ratio (float): Clip ratio for PPO (used in the clipped-surrogate loss sketched after this list).
- --ppo_clip_anneal: Whether to anneal the clip ratio for PPO.
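For example, a PPO run on HalfCheetah-v4 could be launched as follows; the flags are the ones documented above, and the specific values are only illustrative, not tuned settings:

python source/train_agent.py --run_name ppo_halfcheetah --algorithm PPO --env_id HalfCheetah-v4 --num_envs 8 --num_rollout_steps 2048 --learning_rate 3e-4 --gae_lambda 0.95 --ppo_clip_ratio 0.2 --log_video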
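The --gamma and --gae_lambda flags control Generalized Advantage Estimation. Below is a minimal, self-contained sketch of the GAE recursion; the function name and array layout are illustrative and not taken from this repository's code:

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Compute GAE advantages and value targets for one rollout of length T.

    rewards, values, dones: 1-D numpy arrays of length T collected during the rollout.
    last_value: critic estimate for the state reached after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        next_non_terminal = 1.0 - dones[t]
        # TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * gae_lambda * next_non_terminal * gae
        advantages[t] = gae
    # Returns (advantage + baseline) are used as value-function targets.
    returns = advantages + values
    return advantages, returns
```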
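The --ppo_clip_ratio, --value_coef, and --entropy_coef flags correspond to the terms of the standard PPO objective. A hedged PyTorch sketch of that combined loss (tensor names are illustrative, not this repository's variables):

```python
import torch
import torch.nn.functional as F

def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns, entropy,
             clip_ratio=0.2, value_coef=0.5, entropy_coef=0.01):
    """Clipped-surrogate PPO loss for one minibatch; all arguments are 1-D tensors."""
    # Probability ratio between the current policy and the rollout policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipped surrogate objective (maximized, hence the negation for gradient descent).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    # Value loss against the GAE returns.
    value_loss = F.mse_loss(values, returns)
    # Entropy bonus encourages exploration (negated so that it is maximized).
    entropy_loss = -entropy.mean()
    return policy_loss + value_coef * value_loss + entropy_coef * entropy_loss
```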