Implement Gymnasium-compliant PPO script (#320)
* Add Gymnasium and dependencies

* Implement Gymnasium-compliant PPO script

* Ensure pre-commit passes

* Fix CI, add a `gymnasium_support` folder

* update lock files

* add dependencies

* update requirements.txt; fix pre-commit

* update poetry files

* Support dm control action spaces

* add dm_control support

* Enable num_envs>1

* Enable auto-install of torch based on CUDA version

* Fix pre-commit

* bump torch version

* bump wandb version

* change key for mujoco_py installation

* update CI

* update docs

* downgrade torch

* update docs

* update test cases

* set default env = HalfCheetah-v4

* directly replace `ppo_continuous_action.py`

* deprecate pybullet dependency in ppo

* remove pybullet test case

* support video recording to wandb

* update docs

* update dependency for test cases

* update test cases and add dm_control tests

* update docs

* update mkdocs base

* revert doc changes

* fix dm_control test cases

* quick docs

* fix tests on CI

* fix test case

* fix CI

* Fix CI

* update mujoco dependency

* Fix CI

* fix CI

* remove unused seed

Co-authored-by: Daniel Tan <[email protected]>
Co-authored-by: Costa Huang <[email protected]>
3 people authored Dec 13, 2022
1 parent cb2b746 commit b558b2b
Showing 37 changed files with 864 additions and 191 deletions.
5 changes: 1 addition & 4 deletions .github/workflows/pre-commit.yml
Original file line number Diff line number Diff line change
@@ -1,10 +1,7 @@
name: pre-commit

on:
push:
branches: [ master ]
pull_request:
branches: [ master ]
push
jobs:
build:
runs-on: ubuntu-latest
Expand Down
79 changes: 62 additions & 17 deletions .github/workflows/tests.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,6 @@ on:
- '**/README.md'
- 'docs/**/*'
- 'cloud/**/*'
pull_request:
paths-ignore:
- '**/README.md'
- 'docs/**/*'
- 'cloud/**/*'
jobs:
test-core-envs:
strategy:
Expand Down Expand Up @@ -133,7 +128,6 @@ jobs:
- name: Run pybullet tests
run: poetry run pytest tests/test_procgen.py


test-mujoco-envs:
strategy:
fail-fast: false
Expand All @@ -153,25 +147,76 @@ jobs:
poetry-version: ${{ matrix.poetry-version }}

# mujoco tests
- name: Install core dependencies
run: poetry install --with pytest
- name: Install pybullet dependencies
run: poetry install --with pybullet
- name: Install mujoco dependencies
run: poetry install --with mujoco
- name: Install jax dependencies
run: poetry install --with jax
- name: Install dependencies
run: poetry install --with pytest,mujoco,dm_control
- name: Downgrade setuptools
run: poetry run pip install setuptools==59.5.0
- name: install mujoco dependencies
run: |
sudo apt-get update && sudo apt-get -y install libgl1-mesa-glx libosmesa6 libglfw3
- name: Run mujoco tests
continue-on-error: true # MUJOCO_GL=osmesa results in `free(): invalid pointer`
run: poetry run pytest tests/test_mujoco.py

test-mujoco-envs-windows-mac:
strategy:
fail-fast: false
matrix:
python-version: [3.8]
poetry-version: [1.2]
os: [macos-latest, windows-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Run image
uses: abatilo/[email protected]
with:
poetry-version: ${{ matrix.poetry-version }}

# mujoco tests
- name: Install dependencies
run: poetry install --with pytest,mujoco,dm_control
- name: Downgrade setuptools
run: poetry run pip install setuptools==59.5.0
- name: Run mujoco tests
run: poetry run pytest tests/test_mujoco.py


test-mujoco_py-envs:
strategy:
fail-fast: false
matrix:
python-version: [3.8]
poetry-version: [1.2]
os: [ubuntu-22.04]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
- uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Run image
uses: abatilo/[email protected]
with:
poetry-version: ${{ matrix.poetry-version }}

# mujoco_py tests
- name: Install dependencies
run: poetry install --with pytest,pybullet,mujoco_py,mujoco,jax
- name: Downgrade setuptools
run: poetry run pip install setuptools==59.5.0
- name: install mujoco_py dependencies
run: |
sudo apt-get update && sudo apt-get -y install wget unzip software-properties-common \
libgl1-mesa-dev \
libgl1-mesa-glx \
libglew-dev \
libosmesa6-dev patchelf
- name: Run mujoco tests
run: poetry run pytest tests/test_mujoco.py
- name: Run mujoco_py tests
run: poetry run pytest tests/test_mujoco_py.py

test-envpool-envs:
strategy:
Expand Down Expand Up @@ -251,4 +296,4 @@ jobs:
- name: Install ROMs
run: poetry run AutoROM --accept-license
- name: Run pettingzoo tests
run: poetry run pytest tests/test_pettingzoo_ma_atari.py
run: poetry run pytest tests/test_pettingzoo_ma_atari.py
2 changes: 1 addition & 1 deletion .gitpod.Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ RUN mkdir cleanrl_utils && touch cleanrl_utils/__init__.py
RUN pip install poetry --upgrade
RUN poetry config virtualenvs.in-project true

# install mujoco
# install mujoco_py
RUN sudo apt-get -y install wget unzip software-properties-common \
libgl1-mesa-dev \
libgl1-mesa-glx \
Expand Down
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -53,8 +53,8 @@ repos:
args: ["--without-hashes", "-o", "requirements/requirements-pybullet.txt", "--with", "pybullet"]
stages: [manual]
- id: poetry-export
name: poetry-export requirements-mujoco.txt
args: ["--without-hashes", "-o", "requirements/requirements-mujoco.txt", "--with", "mujoco"]
name: poetry-export requirements-mujoco_py.txt
args: ["--without-hashes", "-o", "requirements/requirements-mujoco_py.txt", "--with", "mujoco_py"]
stages: [manual]
- id: poetry-export
name: poetry-export requirements-procgen.txt
Expand Down
4 changes: 2 additions & 2 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -15,13 +15,13 @@ RUN poetry install
RUN poetry install --with atari
RUN poetry install --with pybullet

# install mujoco
# install mujoco_py
RUN apt-get -y install wget unzip software-properties-common \
libgl1-mesa-dev \
libgl1-mesa-glx \
libglew-dev \
libosmesa6-dev patchelf
RUN poetry install --with mujoco
RUN poetry install --with mujoco_py
RUN poetry run python -c "import mujoco_py"

COPY entrypoint.sh /usr/local/bin/
Expand Down
1 change: 0 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,6 @@ You may also use a prebuilt development environment hosted in Gitpod:

## Algorithms Implemented

# Overview

| Algorithm | Variants Implemented |
| ----------- | ----------- |
Expand Down
4 changes: 2 additions & 2 deletions benchmark/ddpg.sh
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
poetry install --with mujoco,pybullet
poetry install --with mujoco_py,pybullet
python -c "import mujoco_py"
xvfb-run -a python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
--command "poetry run python cleanrl/ddpg_continuous_action.py --track --capture-video" \
--num-seeds 3 \
--workers 1

poetry install --with mujoco,jax
poetry install --with mujoco_py,jax
poetry run pip install --upgrade "jax[cuda]==0.3.17" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
poetry run python -c "import mujoco_py"
xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
Expand Down
22 changes: 19 additions & 3 deletions benchmark/ppo.sh
Original file line number Diff line number Diff line change
Expand Up @@ -28,13 +28,13 @@ xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
--num-seeds 3 \
--workers 1

poetry install --with mujoco,pybullet
poetry install --with mujoco_py,mujoco
poetry run python -c "import mujoco_py"
OMP_NUM_THREADS=1 xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
--command "poetry run python cleanrl/ppo_continuous_action.py --cuda False --track --capture-video" \
--num-seeds 3 \
--workers 9
--workers 6

poetry install --with procgen
xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
Expand Down Expand Up @@ -89,3 +89,19 @@ poetry run python -m cleanrl_utils.benchmark \
--command "poetry run python ppo_atari_envpool_xla_jax.py --track --wandb-project-name envpool-atari --wandb-entity openrlbenchmark" \
--num-seeds 3 \
--workers 1

# gymnasium support
poetry install --with mujoco
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v4 Walker2d-v4 Hopper-v4 InvertedPendulum-v4 Humanoid-v4 Pusher-v4 \
--command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track" \
--num-seeds 3 \
--workers 1

poetry install --with dm_control,mujoco
OMP_NUM_THREADS=1 xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
--env-ids dm_control/acrobot-swingup-v0 dm_control/acrobot-swingup_sparse-v0 dm_control/ball_in_cup-catch-v0 dm_control/cartpole-balance-v0 dm_control/cartpole-balance_sparse-v0 dm_control/cartpole-swingup-v0 dm_control/cartpole-swingup_sparse-v0 dm_control/cartpole-two_poles-v0 dm_control/cartpole-three_poles-v0 dm_control/cheetah-run-v0 dm_control/dog-stand-v0 dm_control/dog-walk-v0 dm_control/dog-trot-v0 dm_control/dog-run-v0 dm_control/dog-fetch-v0 dm_control/finger-spin-v0 dm_control/finger-turn_easy-v0 dm_control/finger-turn_hard-v0 dm_control/fish-upright-v0 dm_control/fish-swim-v0 dm_control/hopper-stand-v0 dm_control/hopper-hop-v0 dm_control/humanoid-stand-v0 dm_control/humanoid-walk-v0 dm_control/humanoid-run-v0 dm_control/humanoid-run_pure_state-v0 dm_control/humanoid_CMU-stand-v0 dm_control/humanoid_CMU-run-v0 dm_control/lqr-lqr_2_1-v0 dm_control/lqr-lqr_6_2-v0 dm_control/manipulator-bring_ball-v0 dm_control/manipulator-bring_peg-v0 dm_control/manipulator-insert_ball-v0 dm_control/manipulator-insert_peg-v0 dm_control/pendulum-swingup-v0 dm_control/point_mass-easy-v0 dm_control/point_mass-hard-v0 dm_control/quadruped-walk-v0 dm_control/quadruped-run-v0 dm_control/quadruped-escape-v0 dm_control/quadruped-fetch-v0 dm_control/reacher-easy-v0 dm_control/reacher-hard-v0 dm_control/stacker-stack_2-v0 dm_control/stacker-stack_4-v0 dm_control/swimmer-swimmer6-v0 dm_control/swimmer-swimmer15-v0 dm_control/walker-stand-v0 dm_control/walker-walk-v0 dm_control/walker-run-v0 \
--command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track" \
--num-seeds 3 \
--workers 9

2 changes: 1 addition & 1 deletion benchmark/sac.sh
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
poetry install --with mujoco,pybullet
poetry install --with mujoco_py,pybullet
poetry run python -c "import mujoco_py"
OMP_NUM_THREADS=1 xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 \
Expand Down
4 changes: 2 additions & 2 deletions benchmark/td3.sh
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
poetry install --with mujoco,pybullet
poetry install --with mujoco_py,pybullet
python -c "import mujoco_py"
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
--command "poetry run python cleanrl/td3_continuous_action.py --track --capture-video" \
--num-seeds 3 \
--workers 1

poetry install --with mujoco,jax
poetry install --with mujoco_py,jax
poetry run pip install --upgrade "jax[cuda]==0.3.17" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
poetry run python -c "import mujoco_py"
xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
Expand Down
50 changes: 32 additions & 18 deletions cleanrl/ppo_continuous_action.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,8 @@
import time
from distutils.util import strtobool

import gym
import gymnasium as gym
import numpy as np
import pybullet_envs # noqa
import torch
import torch.nn as nn
import torch.optim as optim
Expand Down Expand Up @@ -36,7 +35,7 @@ def parse_args():
help="whether to capture videos of the agent performances (check out `videos` folder)")

# Algorithm specific arguments
parser.add_argument("--env-id", type=str, default="HalfCheetahBulletEnv-v0",
parser.add_argument("--env-id", type=str, default="HalfCheetah-v4",
help="the id of the environment")
parser.add_argument("--total-timesteps", type=int, default=1000000,
help="total timesteps of the experiments")
Expand Down Expand Up @@ -77,9 +76,13 @@ def parse_args():
return args


def make_env(env_id, seed, idx, capture_video, run_name, gamma):
def make_env(env_id, idx, capture_video, run_name, gamma):
def thunk():
env = gym.make(env_id)
if capture_video:
env = gym.make(env_id, render_mode="rgb_array")
else:
env = gym.make(env_id)
env = gym.wrappers.FlattenObservation(env) # deal with dm_control's Dict observation space
env = gym.wrappers.RecordEpisodeStatistics(env)
if capture_video:
if idx == 0:
Expand All @@ -89,9 +92,6 @@ def thunk():
env = gym.wrappers.TransformObservation(env, lambda obs: np.clip(obs, -10, 10))
env = gym.wrappers.NormalizeReward(env, gamma=gamma)
env = gym.wrappers.TransformReward(env, lambda reward: np.clip(reward, -10, 10))
env.seed(seed)
env.action_space.seed(seed)
env.observation_space.seed(seed)
return env

return thunk
Expand Down Expand Up @@ -147,7 +147,7 @@ def get_action_and_value(self, x, action=None):
sync_tensorboard=True,
config=vars(args),
name=run_name,
monitor_gym=True,
# monitor_gym=True, no longer works for gymnasium
save_code=True,
)
writer = SummaryWriter(f"runs/{run_name}")
Expand All @@ -166,7 +166,7 @@ def get_action_and_value(self, x, action=None):

# env setup
envs = gym.vector.SyncVectorEnv(
[make_env(args.env_id, args.seed + i, i, args.capture_video, run_name, args.gamma) for i in range(args.num_envs)]
[make_env(args.env_id, i, args.capture_video, run_name, args.gamma) for i in range(args.num_envs)]
)
assert isinstance(envs.single_action_space, gym.spaces.Box), "only continuous action space is supported"

Expand All @@ -184,9 +184,11 @@ def get_action_and_value(self, x, action=None):
# TRY NOT TO MODIFY: start the game
global_step = 0
start_time = time.time()
next_obs = torch.Tensor(envs.reset()).to(device)
next_obs, _ = envs.reset(seed=args.seed)
next_obs = torch.Tensor(next_obs).to(device)
next_done = torch.zeros(args.num_envs).to(device)
num_updates = args.total_timesteps // args.batch_size
video_filenames = set()

for update in range(1, num_updates + 1):
# Annealing the rate if instructed to do so.
Expand All @@ -208,16 +210,22 @@ def get_action_and_value(self, x, action=None):
logprobs[step] = logprob

# TRY NOT TO MODIFY: execute the game and log data.
next_obs, reward, done, info = envs.step(action.cpu().numpy())
next_obs, reward, terminated, truncated, infos = envs.step(action.cpu().numpy())
done = np.logical_or(terminated, truncated)
rewards[step] = torch.tensor(reward).to(device).view(-1)
next_obs, next_done = torch.Tensor(next_obs).to(device), torch.Tensor(done).to(device)

for item in info:
if "episode" in item.keys():
print(f"global_step={global_step}, episodic_return={item['episode']['r']}")
writer.add_scalar("charts/episodic_return", item["episode"]["r"], global_step)
writer.add_scalar("charts/episodic_length", item["episode"]["l"], global_step)
break
# Only print when at least 1 env is done
if "final_info" not in infos:
continue

for info in infos["final_info"]:
# Skip the envs that are not done
if info is None:
continue
print(f"global_step={global_step}, episodic_return={info['episode']['r']}")
writer.add_scalar("charts/episodic_return", info["episode"]["r"], global_step)
writer.add_scalar("charts/episodic_length", info["episode"]["l"], global_step)

# bootstrap value if not done
with torch.no_grad():
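
The hunk above switches from Gym's four-value `step` return to Gymnasium's five-value form and reads episode statistics from `infos["final_info"]`. The following is a minimal sketch of those two pieces in isolation, using only NumPy. The `infos` dictionary is hand-built stand-in data rather than output from a real vector environment, and the helper names `merge_done` and `collect_finished_episodes` are hypothetical; they do not appear in the commit.

```python
import numpy as np


def merge_done(terminated, truncated):
    """Collapse Gymnasium's two end-of-episode flags into one `done` array."""
    return np.logical_or(terminated, truncated)


def collect_finished_episodes(infos):
    """Return (return, length) pairs for sub-environments that just finished.

    Hypothetical helper: assumes `infos` has the layout the diff relies on,
    i.e. an optional "final_info" entry holding one dict (or None) per env.
    """
    finished = []
    for info in infos.get("final_info", []):
        # Envs that did not finish on this step have None in their slot.
        if info is None or "episode" not in info:
            continue
        finished.append((float(info["episode"]["r"]), int(info["episode"]["l"])))
    return finished


# Hand-built stand-in for the output of a 3-env vector step (not real env data).
terminated = np.array([False, True, False])
truncated = np.array([False, False, True])
infos = {
    "final_info": np.array(
        [None, {"episode": {"r": 12.5, "l": 40}}, {"episode": {"r": -3.0, "l": 1000}}],
        dtype=object,
    )
}

done = merge_done(terminated, truncated)
episodes = collect_finished_episodes(infos)
print(done.tolist(), episodes)
```

Unlike the removed loop, which stopped at the first finished episode via `break`, this form reports every sub-environment that finished on the step, which matches the behavior of the added lines.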
Expand Down Expand Up @@ -314,5 +322,11 @@ def get_action_and_value(self, x, action=None):
print("SPS:", int(global_step / (time.time() - start_time)))
writer.add_scalar("charts/SPS", int(global_step / (time.time() - start_time)), global_step)

if args.track and args.capture_video:
for filename in os.listdir(f"videos/{run_name}"):
if filename not in video_filenames and filename.endswith(".mp4"):
wandb.log({f"videos": wandb.Video(f"videos/{run_name}/{filename}")})
video_filenames.add(filename)

envs.close()
writer.close()
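
The last hunk above uploads each recorded video once by remembering filenames in the `video_filenames` set. As a rough sketch of that bookkeeping on its own, the snippet below separates "find files not seen yet" from the upload step, so it runs without `wandb`. The helper name `find_new_videos` and the example filenames are assumptions made for illustration; the commit itself inlines this logic in the training loop.

```python
import os
import tempfile


def find_new_videos(video_dir, seen):
    """Return .mp4 files in `video_dir` not yet in `seen`, and mark them seen."""
    fresh = []
    for filename in sorted(os.listdir(video_dir)):
        if filename.endswith(".mp4") and filename not in seen:
            fresh.append(filename)
            seen.add(filename)
    return fresh


with tempfile.TemporaryDirectory() as video_dir:
    seen = set()
    # Illustrative filenames only; the real names depend on the recording wrapper.
    open(os.path.join(video_dir, "rl-video-episode-0.mp4"), "w").close()
    open(os.path.join(video_dir, "rl-video-episode-0.meta.json"), "w").close()
    first = find_new_videos(video_dir, seen)   # picks up the first video
    open(os.path.join(video_dir, "rl-video-episode-1.mp4"), "w").close()
    second = find_new_videos(video_dir, seen)  # only the newly written video
    third = find_new_videos(video_dir, seen)   # nothing new since last call

print(first, second, third)
```

In the training script, each name returned by such a helper would be the one passed to `wandb.Video(...)` and logged, as the added lines in the diff do.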
Original file line number Diff line number Diff line change
Expand Up @@ -24,4 +24,4 @@ ninja = "^1.10.2"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
build-backend = "poetry.core.masonry.api"
4 changes: 2 additions & 2 deletions docs/get-started/installation.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ You can install them using the following command
```bash
poetry install --with atari
poetry install --with pybullet
poetry install --with mujoco
poetry install --with mujoco_py
poetry install --with procgen
poetry install --with envpool
poetry install --with pettingzoo
Expand All @@ -94,7 +94,7 @@ While we recommend using `poetry` to manage environments and dependencies, the t
pip install -r requirements/requirements.txt
pip install -r requirements/requirements-atari.txt
pip install -r requirements/requirements-pybullet.txt
pip install -r requirements/requirements-mujoco.txt
pip install -r requirements/requirements-mujoco_py.txt
pip install -r requirements/requirements-procgen.txt
pip install -r requirements/requirements-envpool.txt
pip install -r requirements/requirements-pettingzoo.txt
Expand Down

1 comment on commit b558b2b

@vercel vercel bot commented on b558b2b Dec 13, 2022

Successfully deployed to the following URLs:

cleanrl – ./

cleanrl-git-master-vwxyzjn.vercel.app
cleanrl-vwxyzjn.vercel.app
cleanrl.vercel.app
docs.cleanrl.dev
