Update SB3 tutorial (action masking, tests) #1017

Merged · 38 commits · Jul 11, 2023

Changes from all commits
cf2bef7
Update SB3 tutorial to have __main__ (error on macOS)
elliottower Jul 7, 2023
86916e6
Add SB3 tests for tutorials
elliottower Jul 7, 2023
f3e0cc9
Add action masking tutorial, fix typos/update documentation
elliottower Jul 7, 2023
d0784be
Add try catch for test sb3 action mask (pytest -v shouldn't require sb3)
elliottower Jul 7, 2023
87c46ef
Clean up documentation
elliottower Jul 8, 2023
baf59fa
Fix requirements.txt to specify pettingzoo[classic]
elliottower Jul 8, 2023
db8331a
Add try catch for render action mask
elliottower Jul 8, 2023
05b3dcc
Add try catch for render action mask
elliottower Jul 8, 2023
ecb96bf
Add try catch for other render files
elliottower Jul 8, 2023
d887db2
Fix code which doesn't work due to modules (tutorials not included)
elliottower Jul 8, 2023
085ed0a
Switch userwarnings to print statements and exit (so it doesn't fail)
elliottower Jul 8, 2023
18eca55
Add butterfly requirement to sb3 tutorial
elliottower Jul 8, 2023
429cbd8
Switch default timesteps to be more reasonable (10,000)
elliottower Jul 8, 2023
c9f0024
Switch default timesteps to be lower (2048), just so CI runs faster
elliottower Jul 8, 2023
a64022d
Switch num cpus to 2 by default (GitHub Actions only get 2 cores)
elliottower Jul 8, 2023
ee08317
Fix print statements logic
elliottower Jul 8, 2023
bd83f30
Update tutorials to evaluate, add KAZ example, test hyperparameters
elliottower Jul 9, 2023
0185af8
Update code to check more in-depth statistics like winrate and total …
elliottower Jul 10, 2023
c977d89
Pre-commit
elliottower Jul 10, 2023
0625296
Un-comment training code for KAZ
elliottower Jul 10, 2023
459cc86
Update hyperparameters and fix pistonball crashing issue
elliottower Jul 10, 2023
9546c9c
Add hyperparameter notes
elliottower Jul 10, 2023
8cfd867
Add multiwalker tutorial for MLP example
elliottower Jul 10, 2023
41e26fc
Fix typo in docs
elliottower Jul 10, 2023
6af9e18
Polish up documentation and add sphinx warnings/notes
elliottower Jul 10, 2023
a454362
Try to fix missing module error from test file
elliottower Jul 10, 2023
5c75d4c
Update test_sb3_action_mask.py
elliottower Jul 10, 2023
fd23175
Add importorskip to each test, choose better hyperparameters
elliottower Jul 10, 2023
cefc86d
Move pytest importorskip calls
elliottower Jul 10, 2023
142b155
Disable most of the tests on test_sb3_action_mask.py
elliottower Jul 10, 2023
1a2d2ef
Split CI tests into separate actions (so they don't take 2 hours)
elliottower Jul 10, 2023
35addaa
Add separate requirements files for different sb3 tutorials
elliottower Jul 10, 2023
996274e
Fix workflow for tutorials to always install from root dir
elliottower Jul 10, 2023
c4834b5
Un-skip the rest of the action mask tests, as the longest one is pist…
elliottower Jul 10, 2023
1dfe96b
Remove pistonball env.close() line to avoid SuperSuit issue
elliottower Jul 10, 2023
5f65af0
Change multiwalker to waterworld (actually trains), remove pistonball…
elliottower Jul 11, 2023
7403e02
Add pymunk dependency to sisl waterworld (modulenotfound error)
elliottower Jul 11, 2023
a92b2a3
Add pymunk req
elliottower Jul 11, 2023
5 changes: 3 additions & 2 deletions .github/workflows/linux-tutorials-test.yml
@@ -19,7 +19,7 @@ jobs:
      fail-fast: false
      matrix:
        python-version: ['3.7', '3.8', '3.9', '3.10'] # '3.11' - broken due to numba
-        tutorial: ['Tianshou', 'EnvironmentCreation', 'CleanRL'] # TODO: add back 'CleanRL' after SuperSuit is fixed
+        tutorial: ['Tianshou', 'EnvironmentCreation', 'CleanRL', 'SB3/kaz', 'SB3/waterworld', 'SB3/connect_four', 'SB3/test'] # TODO: add back RLlib once it is fixed
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
@@ -29,8 +29,9 @@
      - name: Install dependencies and run tutorials
        run: |
          sudo apt-get install python3-opengl xvfb
+          export root_dir=$(pwd)
          cd tutorials/${{ matrix.tutorial }}
          pip install -r requirements.txt
          pip uninstall -y pettingzoo
-          pip install -e ../..
+          pip install -e $root_dir
          for f in *.py; do xvfb-run -a -s "-screen 0 1024x768x24" python "$f"; done
2 changes: 1 addition & 1 deletion docs/api/parallel.md
@@ -5,7 +5,7 @@ title: Parallel

# Parallel API

-In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `<game>.parallel_env()`. This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLLib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.
+In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `<game>.parallel_env()`. This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLlib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.

All parallel environments can be converted into AEC environments by splitting a simultaneous turn into sequential turns, with observations only from the previous cycle.
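
For reference, a minimal interaction loop against the parallel API looks like the following sketch (Pistonball is just an example; the `(observations, infos)` reset signature of recent PettingZoo versions is assumed):

```python
from pettingzoo.butterfly import pistonball_v6

# Every agent submits an action each cycle; the env returns per-agent dicts.
parallel_env = pistonball_v6.parallel_env(render_mode="human")
observations, infos = parallel_env.reset(seed=42)

while parallel_env.agents:
    # Sample a random action for every live agent.
    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
parallel_env.close()
```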

1 change: 1 addition & 0 deletions docs/index.md
@@ -44,6 +44,7 @@ tutorials/cleanrl/index
tutorials/tianshou/index
tutorials/rllib/index
tutorials/langchain/index
+tutorials/sb3/index
```

```{toctree}
2 changes: 1 addition & 1 deletion docs/tutorials/rllib/holdem.md
@@ -16,7 +16,7 @@ To follow this tutorial, you will need to install the dependencies shown below.
```

## Code
-The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with RLLib. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).
+The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with RLlib. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).

### Training the RL agent

4 changes: 2 additions & 2 deletions docs/tutorials/rllib/index.md
@@ -2,9 +2,9 @@
title: "RLlib"
---

-# RLlib Tutorial
+# Ray RLlib Tutorial

-These tutorials show you how to use [RLlib](https://docs.ray.io/en/latest/rllib/index.html) to train agents in PettingZoo environments.
+These tutorials show you how to use [Ray](https://docs.ray.io/en/latest/index.html)'s [RLlib](https://docs.ray.io/en/latest/rllib/index.html) library to train agents in PettingZoo environments.

* [PPO for Pistonball](/tutorials/rllib/pistonball/): _Train a PPO model in a parallel environment_

2 changes: 1 addition & 1 deletion docs/tutorials/rllib/pistonball.md
@@ -17,7 +17,7 @@ To follow this tutorial, you will need to install the dependencies shown below.
```

## Code
-The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with RLLib. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).
+The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with RLlib. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).

### Training the RL agent

54 changes: 54 additions & 0 deletions docs/tutorials/sb3/connect_four.md
@@ -0,0 +1,54 @@
---
title: "SB3: Action Masked PPO for Connect Four"
---

# SB3: Action Masked PPO for Connect Four

This tutorial shows how to train a Maskable [Proximal Policy Optimization](https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html) (PPO) model on the [Connect Four](https://pettingzoo.farama.org/environments/classic/connect_four/) environment ([AEC](https://pettingzoo.farama.org/api/aec/)).

It creates a custom Wrapper to convert to a [Gymnasium](https://gymnasium.farama.org/)-like environment which is compatible with [SB3 action masking](https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html).


```{eval-rst}
.. note::

This environment has a non-visual observation space and provides an illegal action mask, so we use an MLP feature extractor and mask invalid actions.
```

```{eval-rst}
.. warning::

This wrapper assumes that the action space and observation space are the same for each agent; this assumption may not hold for custom environments.
```

After training and evaluation, this script will launch a demo game using human rendering. Trained models are saved and loaded from disk (see SB3's [documentation](https://stable-baselines3.readthedocs.io/en/master/guide/save_format.html) for more information).
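
The shape of that wrapper, as a minimal sketch (class and method names mirror the tutorial script included below, but treat the details here as illustrative; it assumes observations are dicts with `observation` and `action_mask` keys, as in PettingZoo's classic environments):

```python
import gymnasium as gym
import pettingzoo.utils
from pettingzoo.classic import connect_four_v3


class SB3ActionMaskWrapper(pettingzoo.utils.BaseWrapper, gym.Env):
    """Expose a turn-based AEC environment through a single-agent Gymnasium API."""

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Assumes all agents share one observation/action space (see the warning above).
        self.observation_space = super().observation_space(self.possible_agents[0])["observation"]
        self.action_space = super().action_space(self.possible_agents[0])
        return self.observe(self.agent_selection), {}

    def step(self, action):
        super().step(action)
        return super().last()  # (observation, reward, termination, truncation, info)

    def observe(self, agent):
        # Strip the action mask out of the observation dict.
        return super().observe(agent)["observation"]

    def action_mask(self):
        # Exposed separately so SB3's ActionMasker can query it each step.
        return super().observe(self.agent_selection)["action_mask"]
```

MaskablePPO then trains on the wrapped environment, with `ActionMasker` pulling masks from the method above:

```python
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

env = SB3ActionMaskWrapper(connect_four_v3.env())
env.reset(seed=42)  # sets the spaces before SB3 inspects them
env = ActionMasker(env, lambda env: env.action_mask())
model = MaskablePPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=2_048)
```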


## Environment Setup
To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts.
```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/connect_four/requirements.txt
:language: text
```

## Code
The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with SB3. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).

### Training and Evaluation

```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/connect_four/sb3_connect_four_action_mask.py
:language: python
```

### Testing other PettingZoo Classic environments

The following script uses [pytest](https://docs.pytest.org/en/latest/) to test all other PettingZoo environments which support action masking.

This code yields good results on simpler environments like [Gin Rummy](/environments/classic/gin_rummy/) and [Texas Hold’em No Limit](/environments/classic/texas_holdem_no_limit/), while failing to perform better than random in more difficult environments such as [Chess](/environments/classic/chess/) or [Hanabi](/environments/classic/hanabi/).
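
A sketch of how such a harness is structured, using `pytest.importorskip` so a bare `pytest -v` run does not require SB3 (the smoke-test assertion here is a stand-in for the tutorial's real win-rate checks):

```python
import pytest
from pettingzoo.classic import gin_rummy_v4, texas_holdem_no_limit_v6


@pytest.mark.parametrize("env_fn", [gin_rummy_v4, texas_holdem_no_limit_v6])
def test_action_mask(env_fn):
    # Skip rather than fail when SB3 is not installed.
    pytest.importorskip("sb3_contrib")
    env = env_fn.env()
    env.reset(seed=42)
    observation, reward, termination, truncation, info = env.last()
    # Every environment under test must expose a mask for illegal actions.
    assert "action_mask" in observation
```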


```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/test_sb3_action_mask.py
:language: python
```
39 changes: 30 additions & 9 deletions docs/tutorials/sb3/index.md
@@ -2,24 +2,45 @@
title: "Stable-Baselines3"
---

-# SB3 Tutorial
+# Stable-Baselines3 Tutorial

-These tutorials show you how to use [SB3](https://stable-baselines3.readthedocs.io/en/master/) to train agents in PettingZoo environments.
+These tutorials show you how to use the [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) (SB3) library to train agents in PettingZoo environments.

-* [PPO for Pistonball](/tutorials/sb3/pistonball/): _Train a PPO model in a parallel environment_
-
-* [PPO for Rock-Paper-Scissors](/tutorials/sb3/rps/) _Train a PPO model in an AEC environment_
-
-```{figure} https://docs.ray.io/en/latest/_images/rllib-stack.svg
-:alt: RLlib stack
+For environments with visual observations, we use a [CNN](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#stable_baselines3.ppo.CnnPolicy) policy and perform pre-processing steps such as frame-stacking, color reduction, and resizing using [SuperSuit](/api/wrappers/supersuit_wrappers/).
+
+* [PPO for Knights-Archers-Zombies](/tutorials/sb3/kaz/): _Train agents using PPO in a vectorized environment with visual observations_
+
+For non-visual environments, we use [MLP](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#stable_baselines3.ppo.MlpPolicy) policies and do not perform any pre-processing steps.
+
+* [PPO for Waterworld](/tutorials/sb3/waterworld/): _Train agents using PPO in a vectorized environment with continuous observations_
+
+* [Action Masked PPO for Connect Four](/tutorials/sb3/connect_four/): _Train agents using Action Masked PPO in an AEC environment_


## Stable-Baselines Overview

[Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) (SB3) is a library providing reliable implementations of reinforcement learning algorithms in [PyTorch](https://pytorch.org/). It provides a clean and simple interface, giving you access to off-the-shelf state-of-the-art model-free RL algorithms. It allows training of RL agents with only a few lines of code.

For more information, see the [Stable-Baselines3 v1.0 Blog Post](https://araffin.github.io/post/sb3/).


```{eval-rst}
.. warning::

SB3 is designed for single-agent RL and does not plan to natively support multi-agent PettingZoo environments. These tutorials are only intended for demonstration purposes, to show how SB3 can be adapted to work in multi-agent settings.
```


```{figure} https://raw.githubusercontent.com/DLR-RM/stable-baselines3/master/docs/_static/img/logo.png
:alt: SB3 Logo
:width: 80%
```

```{toctree}
:hidden:
-:caption: RLlib
+:caption: SB3

-pistonball
-holdem
+kaz
+waterworld
+connect_four
```
41 changes: 41 additions & 0 deletions docs/tutorials/sb3/kaz.md
@@ -0,0 +1,41 @@
---
title: "SB3: PPO for Knights-Archers-Zombies"
---

# SB3: PPO for Knights-Archers-Zombies

This tutorial shows how to train a [Proximal Policy Optimization](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) (PPO) model on the [Knights-Archers-Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/) environment ([AEC](https://pettingzoo.farama.org/api/aec/)).

It converts the environment into a Parallel environment and uses SuperSuit to create vectorized environments, leveraging multithreading to speed up training.

After training and evaluation, this script will launch a demo game using human rendering. Trained models are saved and loaded from disk (see SB3's [documentation](https://stable-baselines3.readthedocs.io/en/master/guide/save_format.html) for more information).

```{eval-rst}
.. note::

This environment has a visual (3-dimensional) observation space, so we use a CNN feature extractor.
```

```{eval-rst}
.. warning::

Because this environment allows agents to spawn and die, it requires using SuperSuit's Black Death wrapper, which provides blank observations to dead agents, rather than removing them from the environment.
```
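
A sketch of the SuperSuit pipeline this tutorial builds, with placeholder hyperparameters rather than the tuned values from the script below:

```python
import supersuit as ss
from stable_baselines3 import PPO
from pettingzoo.butterfly import knights_archers_zombies_v10

# vector_state=False keeps visual (image) observations for the CNN policy.
env = knights_archers_zombies_v10.parallel_env(vector_state=False)
env = ss.black_death_v3(env)  # dead agents emit blank observations instead of being removed
env = ss.color_reduction_v0(env, mode="B")  # single color channel
env = ss.resize_v1(env, x_size=84, y_size=84)
env = ss.frame_stack_v1(env, 3)  # stack frames so the policy can infer motion
env = ss.pettingzoo_env_to_vec_env_v1(env)
env = ss.concat_vec_envs_v1(env, 8, num_cpus=2, base_class="stable_baselines3")

model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=81_920)
```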


## Environment Setup
To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts.
```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/kaz/requirements.txt
:language: text
```

## Code
The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with SB3. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).

### Training and Evaluation

```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/kaz/sb3_kaz_vector.py
:language: python
```
34 changes: 0 additions & 34 deletions docs/tutorials/sb3/pistonball.md

This file was deleted.

34 changes: 0 additions & 34 deletions docs/tutorials/sb3/rps.md

This file was deleted.

33 changes: 33 additions & 0 deletions docs/tutorials/sb3/waterworld.md
@@ -0,0 +1,33 @@
---
title: "SB3: PPO for Multiwalker (Parallel)"
---

# SB3: PPO for Waterworld

This tutorial shows how to train a [Proximal Policy Optimization](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) (PPO) model on the [Waterworld](https://pettingzoo.farama.org/environments/sisl/waterworld/) environment ([Parallel](https://pettingzoo.farama.org/api/parallel/)).

After training and evaluation, this script will launch a demo game using human rendering. Trained models are saved and loaded from disk (see SB3's [documentation](https://stable-baselines3.readthedocs.io/en/master/guide/save_format.html) for more information).

```{eval-rst}
.. note::

This environment has a continuous (1-dimensional) vector observation space, so we use an MLP feature extractor.
```
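
Since no image pre-processing is needed, the training setup reduces to a few lines. A sketch (hyperparameters here are placeholders; see the full script below for the tuned values):

```python
import supersuit as ss
from stable_baselines3 import PPO
from pettingzoo.sisl import waterworld_v4

env = waterworld_v4.parallel_env()
env = ss.pettingzoo_env_to_vec_env_v1(env)  # expose each agent as a vec-env index
env = ss.concat_vec_envs_v1(env, 8, num_cpus=2, base_class="stable_baselines3")

# Flat continuous observations, so an MLP policy suffices.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=196_608)
model.save("ppo_waterworld")
```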


## Environment Setup
To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts.
```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/waterworld/requirements.txt
:language: text
```

## Code
The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with SB3. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).

### Training and Evaluation

```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/waterworld/sb3_waterworld_vector.py
:language: python
```
3 changes: 3 additions & 0 deletions pettingzoo/classic/chess/chess.py
@@ -268,6 +268,9 @@ def step(self, action):
current_agent = self.agent_selection
current_index = self.agents.index(current_agent)

+# Cast action into int
+action = int(action)

chosen_move = chess_utils.action_to_move(self.board, action, current_index)
assert chosen_move in self.board.legal_moves
self.board.push(chosen_move)
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -41,7 +41,7 @@ classic = [
]
butterfly = ["pygame==2.3.0", "pymunk==6.2.0"]
mpe = ["pygame==2.3.0"]
-sisl = ["pygame==2.3.0", "box2d-py==2.3.5", "scipy>=1.4.1"]
+sisl = ["pygame==2.3.0", "pymunk==6.2.0", "box2d-py==2.3.5", "scipy>=1.4.1"]
other = ["pillow>=8.0.1"]
testing = [
"pynput",
2 changes: 1 addition & 1 deletion tutorials/Ray/render_rllib_leduc_holdem.py
@@ -1,4 +1,4 @@
"""Uses Ray's RLLib to view trained agents playing Leduoc Holdem.
"""Uses Ray's RLlib to view trained agents playing Leduoc Holdem.

Author: Rohan (https://github.com/Rohan138)
"""
2 changes: 1 addition & 1 deletion tutorials/Ray/render_rllib_pistonball.py
@@ -1,4 +1,4 @@
"""Uses Ray's RLLib to view trained agents playing Pistonball.
"""Uses Ray's RLlib to view trained agents playing Pistonball.

Author: Rohan (https://github.com/Rohan138)
"""
2 changes: 1 addition & 1 deletion tutorials/Ray/rllib_leduc_holdem.py
@@ -1,4 +1,4 @@
"""Uses Ray's RLLib to train agents to play Leduc Holdem.
"""Uses Ray's RLlib to train agents to play Leduc Holdem.

Author: Rohan (https://github.com/Rohan138)
"""
2 changes: 1 addition & 1 deletion tutorials/Ray/rllib_pistonball.py
@@ -1,4 +1,4 @@
"""Uses Ray's RLLib to train agents to play Pistonball.
"""Uses Ray's RLlib to train agents to play Pistonball.

Author: Rohan (https://github.com/Rohan138)
"""
3 changes: 3 additions & 0 deletions tutorials/SB3/connect_four/requirements.txt
@@ -0,0 +1,3 @@
pettingzoo[classic]>=1.23.1
stable-baselines3>=2.0.0
sb3-contrib>=2.0.0