Reinforcement Learning Replications

Reinforcement Learning Replications is a set of PyTorch implementations of reinforcement learning algorithms.

Features

- Implemented algorithms
  - Vanilla Policy Gradient (VPG)
  - Trust Region Policy Optimization (TRPO)
  - Proximal Policy Optimization (PPO)
  - Deep Deterministic Policy Gradient (DDPG)
  - Twin Delayed DDPG (TD3)
- Uses the Python standard logging library
- Supports TensorBoard
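
Because logging goes through the Python standard library, an application can control verbosity and formatting with the usual `logging` configuration. The snippet below is a minimal sketch; the logger namespace `rl_replicas` is an assumption based on the package name and is not confirmed by this README.

```python
import logging

# Show INFO-level messages from any library that logs through the standard
# logging module, with a timestamp and the emitting logger's name.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)

# Assumption: the package's loggers live under the "rl_replicas" namespace.
# If they do, this raises their verbosity without affecting other libraries.
library_logger = logging.getLogger("rl_replicas")
library_logger.setLevel(logging.DEBUG)
```

Child loggers such as `rl_replicas.algorithms` would inherit the level set on the parent logger, provided they are named that way.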

Benchmarks

You can check the benchmark results here.

The benchmark follows the Benchmarks for Spinning Up Implementations.

Each experiment was run with 3 random seeds. All details, including TensorBoard and experiment logs, training scripts, and trained models, are stored in the benchmarks folder.
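
When comparing runs across several seeds, a common approach is to report the per-epoch mean and spread of the returns. The sketch below shows one way to do that; the function name `aggregate_over_seeds` and the input layout are hypothetical, and the numbers are placeholders, not benchmark results from this repository.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def aggregate_over_seeds(
    returns_by_seed: Dict[int, List[float]],
) -> List[Tuple[float, float]]:
    """Return (mean, sample standard deviation) per epoch across seeds."""
    curves = list(returns_by_seed.values())
    num_epochs = min(len(curve) for curve in curves)  # align on the shortest run
    summary = []
    for epoch in range(num_epochs):
        values = [curve[epoch] for curve in curves]
        spread = stdev(values) if len(values) > 1 else 0.0
        summary.append((mean(values), spread))
    return summary


# Placeholder numbers for illustration only; these are not benchmark results.
example_returns = {
    0: [10.0, 20.0, 30.0],
    1: [12.0, 22.0, 32.0],
    2: [14.0, 24.0, 34.0],
}
summary = aggregate_over_seeds(example_returns)
```

Each entry of `summary` holds the mean return and its standard deviation for one epoch, which is enough to draw a mean curve with a shaded band.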

Example Code

The following code trains PPO on the CartPole-v1 environment. You can also run it with this Google Colab notebook.

import gymnasium as gym
import torch
import torch.nn as nn

from rl_replicas.algorithms import PPO
from rl_replicas.networks import MLP
from rl_replicas.policies import CategoricalPolicy
from rl_replicas.samplers import BatchSampler
from rl_replicas.value_function import ValueFunction

# Experiment settings
env_name = "CartPole-v1"
output_dir = "/content/ppo"
num_epochs = 80
seed = 0

# Network and optimizer hyperparameters
network_hidden_sizes = [64, 64]
policy_learning_rate = 3e-4
value_function_learning_rate = 1e-3

env = gym.make(env_name)
env.action_space.seed(seed)

observation_size: int = env.observation_space.shape[0]
action_size: int = env.action_space.n

policy_network: nn.Module = MLP(
    sizes=[observation_size] + network_hidden_sizes + [action_size]
)

value_function_network: nn.Module = MLP(
    sizes=[observation_size] + network_hidden_sizes + [1]
)

model: PPO = PPO(
    CategoricalPolicy(
        network=policy_network,
        optimizer=torch.optim.Adam(policy_network.parameters(), lr=policy_learning_rate),
    ),
    ValueFunction(
        network=value_function_network,
        optimizer=torch.optim.Adam(
            value_function_network.parameters(), lr=value_function_learning_rate
        ),
    ),
    env,
    BatchSampler(env, seed),
)

model.learn(num_epochs=num_epochs, output_dir=output_dir)
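
For readers new to PPO, the clipped surrogate objective from the PPO paper can be sketched in a few lines of plain Python. This is a standalone illustration and is not taken from `rl_replicas`; the function name `ppo_clip_objective` is hypothetical, and the default `clip_range` of 0.2 is an assumption (a commonly used value), not a setting confirmed by this README.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_range: float = 0.2,  # assumed default, not read from rl_replicas
) -> float:
    """Average clipped surrogate objective over a batch (to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

When the new and old policies agree, the ratio is 1 and the objective reduces to the mean advantage. Once the ratio leaves the interval [1 - clip_range, 1 + clip_range] in the direction that would increase the objective, the clipped term takes over and removes the incentive to move further.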

Contributing

All contributions are welcome.

Release Flow

Create a release branch.
Open a pull request from the release branch to the main branch that includes the following:
- Change logs in the body.
- The release label.
- A commit that bumps the version in VERSION.
Once the pull request is ready, merge it. The CI will upload the package and create the release.
