Reinforcement Learning Replications is a set of Pytorch implementations of reinforcement learning algorithms.
- Implement Algorithms
- Vanilla Policy Gradient (VPG)
- Trust Region Policy Optimization (TRPO)
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (DDPG)
- Twin Delayed DDPG (TD3)
- Use Python standard logging library
- Support TensorBoard
You can check the benchmark result here.
This benchmark is conducted based on the Benchmarks for Spinning Up Implementations.
All experiments were run for 3 random seeds each. All the details such as tensorboard and experiment logs, training scripts and trained models are stored in the benchmarks folder.
Here is the code of training PPO on CartPole-v1 environment. You can run with this Google Colab notebook.
import gymnasium as gym
import torch
import torch.nn as nn
from rl_replicas.algorithms import PPO
from rl_replicas.networks import MLP
from rl_replicas.policies import CategoricalPolicy
from rl_replicas.samplers import BatchSampler
from rl_replicas.value_function import ValueFunction
env_name = "CartPole-v1"
output_dir = "/content/ppo"
num_epochs = 80
seed = 0
network_hidden_sizes = [64, 64]
policy_learning_rate = 3e-4
value_function_learning_rate = 1e-3
env = gym.make(env_name)
env.action_space.seed(seed)
observation_size: int = env.observation_space.shape[0]
action_size: int = env.action_space.n
policy_network: nn.Module = MLP(
sizes=[observation_size] + network_hidden_sizes + [action_size]
)
value_function_network: nn.Module = MLP(
sizes=[observation_size] + network_hidden_sizes + [1]
)
model: PPO = PPO(
CategoricalPolicy(
network=policy_network,
optimizer=torch.optim.Adam(policy_network.parameters(), lr=3e-4),
),
ValueFunction(
network=value_function_network,
optimizer=torch.optim.Adam(value_function_network.parameters(), lr=1e-3),
),
env,
BatchSampler(env, seed),
)
model.learn(num_epochs=num_epochs, output_dir=output_dir)
All contributions are welcome.
- Create a release branch.
- A pull request from the release branch to the
main
branch has the following:- Change logs in the body.
- The
release
label. - Commit that bumps up the version in
VERSION
.
- Once the pull request is ready, merge the pull request. The CI will upload the package and create the release.