An implementation of two on-policy actor-critic methods: Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO). The implementation is based on the original A2C and PPO papers.
Demo videos of trained agents:
- InvertedDoublePendulum-v4-300k.mp4
- HalfCheetah-v4-200k.mp4
To install all dependencies, run the following command:
pip install -r requirements.txt
To train the agent, run the following command:
python source/train_agent.py [Training Options] [PPO Options]
- --run_name (str): Name of the run.
- --algorithm ({A2C,PPO}): Type of algorithm to use for training.
- --env_id (str): Id of the environment to train on.
- --perform_testing: Whether to perform testing after training.
- --log_video: Whether to log video of agent's performance.
- --max_epochs (int) (default: 3): Maximum number of epochs to train for.
- --steps_per_epoch (int): Number of steps to train for per epoch.
- --num_envs (int): Number of parallel environments to train on.
- --num_rollout_steps (int): Number of steps to roll out the policy for.
- --optimizer ({Adam,RMSprop,SGD}): Optimizer to use for training.
- --learning_rate (float): Learning rate for training.
- --lr_decay (float): Learning rate decay for training.
- --weight_decay (float): Weight decay (L2 regularization) for training.
- --gamma (float): Discount factor.
- --gae_lambda (float): Lambda parameter for Generalized Advantage Estimation (GAE); a GAE sketch is shown after this list.
- --value_coef (float): Coefficient for value loss.
- --entropy_coef (float): Coefficient for entropy loss.
- --max_grad_norm (float): Maximum gradient norm for clipping.
- --init_std (float): Initial standard deviation for policy.
- --hidden_size (int): Hidden size for policy.
- --shared_extractor: Whether to use a shared feature extractor for the policy and value networks.
- --ppo_batch_size (int): Batch size for Proximal Policy Optimization (PPO).
- --ppo_epochs (int): Number of optimization epochs over each rollout batch for PPO.
- --ppo_clip_ratio (float): Clip ratio for PPO (used in the clipped-surrogate loss sketched after this list).
- --ppo_clip_anneal: Whether to anneal the clip ratio for PPO.
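For example, a PPO run on HalfCheetah-v4 could be launched as follows; the flags are the ones documented above, and the specific values are only illustrative, not tuned settings:

python source/train_agent.py --run_name ppo_halfcheetah --algorithm PPO --env_id HalfCheetah-v4 --num_envs 8 --num_rollout_steps 2048 --learning_rate 3e-4 --gae_lambda 0.95 --ppo_clip_ratio 0.2 --log_video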
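The --gamma and --gae_lambda flags control Generalized Advantage Estimation. Below is a minimal, self-contained sketch of the GAE recursion; the function name and array layout are illustrative and not taken from this repository's code:

```python
import numpy as np

def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Compute GAE advantages and value targets for one rollout of length T.

    rewards, values, dones: 1-D numpy arrays of length T collected during the rollout.
    last_value: critic estimate for the state reached after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float32)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        next_non_terminal = 1.0 - dones[t]
        # TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * next_non_terminal - values[t]
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * gae_lambda * next_non_terminal * gae
        advantages[t] = gae
    # Returns (advantage + baseline) are used as value-function targets.
    returns = advantages + values
    return advantages, returns
```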
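The --ppo_clip_ratio, --value_coef, and --entropy_coef flags correspond to the terms of the standard PPO objective. A hedged PyTorch sketch of that combined loss (tensor names are illustrative, not this repository's variables):

```python
import torch
import torch.nn.functional as F

def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns, entropy,
             clip_ratio=0.2, value_coef=0.5, entropy_coef=0.01):
    """Clipped-surrogate PPO loss for one minibatch; all arguments are 1-D tensors."""
    # Probability ratio between the current policy and the rollout policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipped surrogate objective (maximized, hence the negation for gradient descent).
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    # Value loss against the GAE returns.
    value_loss = F.mse_loss(values, returns)
    # Entropy bonus encourages exploration (negated so that it is maximized).
    entropy_loss = -entropy.mean()
    return policy_loss + value_coef * value_loss + entropy_coef * entropy_loss
```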