Update SB3 tutorial (action masking, tests) #1017

Merged · 38 commits · Jul 11, 2023

Changes from all commits
cf2bef7
Update SB3 tutorial to have __main__ (error on macOS)
elliottower Jul 7, 2023
86916e6
Add SB3 tests for tutorials
elliottower Jul 7, 2023
f3e0cc9
Add action masking tutorial, fix typos/update documentation
elliottower Jul 7, 2023
d0784be
Add try catch for test sb3 action mask (pytest -v shouldn't require sb3)
elliottower Jul 7, 2023
87c46ef
Clean up documentation
elliottower Jul 8, 2023
baf59fa
Fix requirements.txt to specify pettingzoo[classic]
elliottower Jul 8, 2023
db8331a
Add try catch for render action mask
elliottower Jul 8, 2023
05b3dcc
Add try catch for render action mask
elliottower Jul 8, 2023
ecb96bf
Add try catch for other render files
elliottower Jul 8, 2023
d887db2
Fix code which doesn't work due to modules (tutorials not included)
elliottower Jul 8, 2023
085ed0a
Switch userwarnings to print statements and exit (so it doesn't fail)
elliottower Jul 8, 2023
18eca55
Add butterfly requirement to sb3 tutorial
elliottower Jul 8, 2023
429cbd8
Switch default timesteps to be more reasonable (10,000)
elliottower Jul 8, 2023
c9f0024
Switch default timesteps to be lower (2048), just so CI runs faster
elliottower Jul 8, 2023
a64022d
Switch num cpus to 2 by default (GitHub Actions only get 2 cores)
elliottower Jul 8, 2023
ee08317
Fix print statements logic
elliottower Jul 8, 2023
bd83f30
Update tutorials to evaluate, add KAZ example, test hyperparameters
elliottower Jul 9, 2023
0185af8
Update code to check more in-depth statistics like winrate and total …
elliottower Jul 10, 2023
c977d89
Pre-commit
elliottower Jul 10, 2023
0625296
Un-comment training code for KAZ
elliottower Jul 10, 2023
459cc86
Update hyperparameters and fix pistonball crashing issue
elliottower Jul 10, 2023
9546c9c
Add hyperparameter notes
elliottower Jul 10, 2023
8cfd867
Add multiwalker tutorial for MLP example
elliottower Jul 10, 2023
41e26fc
Fix typo in docs
elliottower Jul 10, 2023
6af9e18
Polish up documentation and add sphinx warnings/notes
elliottower Jul 10, 2023
a454362
Try to fix missing module error from test file
elliottower Jul 10, 2023
5c75d4c
Update test_sb3_action_mask.py
elliottower Jul 10, 2023
fd23175
Add importorskip to each test, choose better hyperparameters
elliottower Jul 10, 2023
cefc86d
Move pytest importorskip calls
elliottower Jul 10, 2023
142b155
Disable most of the tests on test_sb3_action_mask.py
elliottower Jul 10, 2023
1a2d2ef
Split CI tests into separate actions (so they don't take 2 hours)
elliottower Jul 10, 2023
35addaa
Add separate requirements files for different sb3 tutorials
elliottower Jul 10, 2023
996274e
Fix workflow for tutorials to always install from root dir
elliottower Jul 10, 2023
c4834b5
Un-skip the rest of the action mask tests, as the longest one is pist…
elliottower Jul 10, 2023
1dfe96b
Remove pistonball env.close() line to avoid SuperSuit issue
elliottower Jul 10, 2023
5f65af0
Change multiwalker to waterworld (actually trains), remove pistonball…
elliottower Jul 11, 2023
7403e02
Add pymunk dependency to sisl waterworld (modulenotfound error)
elliottower Jul 11, 2023
a92b2a3
Add pymunk req
elliottower Jul 11, 2023
5 changes: 3 additions & 2 deletions .github/workflows/linux-tutorials-test.yml
@@ -19,7 +19,7 @@ jobs:
      fail-fast: false
      matrix:
        python-version: ['3.7', '3.8', '3.9', '3.10'] # '3.11' - broken due to numba
-        tutorial: ['Tianshou', 'EnvironmentCreation', 'CleanRL'] # TODO: add back 'CleanRL' after SuperSuit is fixed
+        tutorial: ['Tianshou', 'EnvironmentCreation', 'CleanRL', 'SB3/kaz', 'SB3/waterworld', 'SB3/connect_four', 'SB3/test'] # TODO: add back RLlib once it is fixed
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
@@ -29,8 +29,9 @@
      - name: Install dependencies and run tutorials
        run: |
          sudo apt-get install python3-opengl xvfb
+          export root_dir=$(pwd)
          cd tutorials/${{ matrix.tutorial }}
          pip install -r requirements.txt
          pip uninstall -y pettingzoo
-          pip install -e ../..
+          pip install -e $root_dir
          for f in *.py; do xvfb-run -a -s "-screen 0 1024x768x24" python "$f"; done
2 changes: 1 addition & 1 deletion docs/api/parallel.md
@@ -5,7 +5,7 @@ title: Parallel

# Parallel API

-In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `<game>.parallel_env()`. This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLLib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.
+In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `<game>.parallel_env()`. This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLlib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.

All parallel environments can be converted into AEC environments by splitting a simultaneous turn into sequential turns, with observations only from the previous cycle.
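
For reference, a minimal interaction loop against the parallel API looks like the following sketch (Pistonball is just an example; the `(observations, infos)` reset signature of recent PettingZoo versions is assumed):

```python
from pettingzoo.butterfly import pistonball_v6

# Every agent submits an action each cycle; the env returns per-agent dicts.
parallel_env = pistonball_v6.parallel_env(render_mode="human")
observations, infos = parallel_env.reset(seed=42)

while parallel_env.agents:
    # Sample a random action for every live agent.
    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
parallel_env.close()
```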

1 change: 1 addition & 0 deletions docs/index.md
@@ -44,6 +44,7 @@ tutorials/cleanrl/index
tutorials/tianshou/index
tutorials/rllib/index
tutorials/langchain/index
+tutorials/sb3/index
```

```{toctree}
2 changes: 1 addition & 1 deletion docs/tutorials/rllib/holdem.md
@@ -16,7 +16,7 @@ To follow this tutorial, you will need to install the dependencies shown below.
```

## Code
-The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with RLLib. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).
+The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with RLlib. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).

### Training the RL agent

4 changes: 2 additions & 2 deletions docs/tutorials/rllib/index.md
@@ -2,9 +2,9 @@
title: "RLlib"
---

-# RLlib Tutorial
+# Ray RLlib Tutorial

-These tutorials show you how to use [RLlib](https://docs.ray.io/en/latest/rllib/index.html) to train agents in PettingZoo environments.
+These tutorials show you how to use [Ray](https://docs.ray.io/en/latest/index.html)'s [RLlib](https://docs.ray.io/en/latest/rllib/index.html) library to train agents in PettingZoo environments.

* [PPO for Pistonball](/tutorials/rllib/pistonball/): _Train a PPO model in a parallel environment_

2 changes: 1 addition & 1 deletion docs/tutorials/rllib/pistonball.md
@@ -17,7 +17,7 @@ To follow this tutorial, you will need to install the dependencies shown below.
```

## Code
-The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with RLLib. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).
+The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with RLlib. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).

### Training the RL agent

54 changes: 54 additions & 0 deletions docs/tutorials/sb3/connect_four.md
@@ -0,0 +1,54 @@
---
title: "SB3: Action Masked PPO for Connect Four"
---

# SB3: Action Masked PPO for Connect Four

This tutorial shows how to train a Maskable [Proximal Policy Optimization](https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html) (PPO) model on the [Connect Four](https://pettingzoo.farama.org/environments/classic/connect_four/) environment ([AEC](https://pettingzoo.farama.org/api/aec/)).

It creates a custom Wrapper to convert to a [Gymnasium](https://gymnasium.farama.org/)-like environment which is compatible with [SB3 action masking](https://sb3-contrib.readthedocs.io/en/master/modules/ppo_mask.html).


```{eval-rst}
.. note::

This environment has a non-visual observation space and provides an illegal action mask, so we use an MLP feature extractor and mask invalid actions.
```

```{eval-rst}
.. warning::

This wrapper assumes that the action space and observation space are the same for each agent; this assumption may not hold for custom environments.
```

After training and evaluation, this script will launch a demo game using human rendering. Trained models are saved and loaded from disk (see SB3's [documentation](https://stable-baselines3.readthedocs.io/en/master/guide/save_format.html) for more information).
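
The shape of that wrapper, as a minimal sketch (class and method names mirror the tutorial script included below, but treat the details here as illustrative; it assumes observations are dicts with `observation` and `action_mask` keys, as in PettingZoo's classic environments):

```python
import gymnasium as gym
import pettingzoo.utils
from pettingzoo.classic import connect_four_v3


class SB3ActionMaskWrapper(pettingzoo.utils.BaseWrapper, gym.Env):
    """Expose a turn-based AEC environment through a single-agent Gymnasium API."""

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Assumes all agents share one observation/action space (see the warning above).
        self.observation_space = super().observation_space(self.possible_agents[0])["observation"]
        self.action_space = super().action_space(self.possible_agents[0])
        return self.observe(self.agent_selection), {}

    def step(self, action):
        super().step(action)
        return super().last()  # (observation, reward, termination, truncation, info)

    def observe(self, agent):
        # Strip the action mask out of the observation dict.
        return super().observe(agent)["observation"]

    def action_mask(self):
        # Exposed separately so SB3's ActionMasker can query it each step.
        return super().observe(self.agent_selection)["action_mask"]
```

MaskablePPO then trains on the wrapped environment, with `ActionMasker` pulling masks from the method above:

```python
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

env = SB3ActionMaskWrapper(connect_four_v3.env())
env.reset(seed=42)  # sets the spaces before SB3 inspects them
env = ActionMasker(env, lambda env: env.action_mask())
model = MaskablePPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=2_048)
```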


## Environment Setup
To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts.
```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/connect_four/requirements.txt
:language: text
```

## Code
The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with SB3. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).

### Training and Evaluation

```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/connect_four/sb3_connect_four_action_mask.py
:language: python
```

### Testing other PettingZoo Classic environments

The following script uses [pytest](https://docs.pytest.org/en/latest/) to test all other PettingZoo environments which support action masking.

This code yields good results on simpler environments like [Gin Rummy](/environments/classic/gin_rummy/) and [Texas Hold’em No Limit](/environments/classic/texas_holdem_no_limit/), while failing to perform better than random in more difficult environments such as [Chess](/environments/classic/chess/) or [Hanabi](/environments/classic/hanabi/).
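
A sketch of how such a harness is structured, using `pytest.importorskip` so a bare `pytest -v` run does not require SB3 (the smoke-test assertion here is a stand-in for the tutorial's real win-rate checks):

```python
import pytest
from pettingzoo.classic import gin_rummy_v4, texas_holdem_no_limit_v6


@pytest.mark.parametrize("env_fn", [gin_rummy_v4, texas_holdem_no_limit_v6])
def test_action_mask(env_fn):
    # Skip rather than fail when SB3 is not installed.
    pytest.importorskip("sb3_contrib")
    env = env_fn.env()
    env.reset(seed=42)
    observation, reward, termination, truncation, info = env.last()
    # Every environment under test must expose a mask for illegal actions.
    assert "action_mask" in observation
```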


```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/test_sb3_action_mask.py
:language: python
```
39 changes: 30 additions & 9 deletions docs/tutorials/sb3/index.md
@@ -2,24 +2,45 @@
title: "Stable-Baselines3"
---

-# SB3 Tutorial
+# Stable-Baselines3 Tutorial

-These tutorials show you how to use [SB3](https://stable-baselines3.readthedocs.io/en/master/) to train agents in PettingZoo environments.
+These tutorials show you how to use the [Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) (SB3) library to train agents in PettingZoo environments.

-* [PPO for Pistonball](/tutorials/sb3/pistonball/): _Train a PPO model in a parallel environment_
-
-* [PPO for Rock-Paper-Scissors](/tutorials/sb3/rps/) _Train a PPO model in an AEC environment_
-
-```{figure} https://docs.ray.io/en/latest/_images/rllib-stack.svg
-:alt: RLlib stack
+For environments with visual observations, we use a [CNN](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#stable_baselines3.ppo.CnnPolicy) policy and perform pre-processing steps such as frame-stacking, color reduction, and resizing using [SuperSuit](/api/wrappers/supersuit_wrappers/).
+
+* [PPO for Knights-Archers-Zombies](/tutorials/sb3/kaz/): _Train agents using PPO in a vectorized environment with visual observations_
+
+For non-visual environments, we use [MLP](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html#stable_baselines3.ppo.MlpPolicy) policies and do not perform any pre-processing steps.
+
+* [PPO for Waterworld](/tutorials/sb3/waterworld/): _Train agents using PPO in a vectorized environment with continuous observations_
+
+* [Action Masked PPO for Connect Four](/tutorials/sb3/connect_four/): _Train agents using Action Masked PPO in an AEC environment_


## Stable-Baselines Overview

[Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) (SB3) is a library providing reliable implementations of reinforcement learning algorithms in [PyTorch](https://pytorch.org/). It provides a clean and simple interface, giving you access to off-the-shelf state-of-the-art model-free RL algorithms. It allows training of RL agents with only a few lines of code.

For more information, see the [Stable-Baselines3 v1.0 Blog Post](https://araffin.github.io/post/sb3/).


```{eval-rst}
.. warning::

SB3 is designed for single-agent RL and does not plan to natively support multi-agent PettingZoo environments. These tutorials are only intended for demonstration purposes, to show how SB3 can be adapted to work in multi-agent settings.
```


```{figure} https://raw.githubusercontent.com/DLR-RM/stable-baselines3/master/docs/_static/img/logo.png
:alt: SB3 Logo
:width: 80%
```

```{toctree}
:hidden:
-:caption: RLlib
+:caption: SB3

-pistonball
-holdem
+kaz
+waterworld
+connect_four
```
41 changes: 41 additions & 0 deletions docs/tutorials/sb3/kaz.md
@@ -0,0 +1,41 @@
---
title: "SB3: PPO for Knights-Archers-Zombies"
---

# SB3: PPO for Knights-Archers-Zombies

This tutorial shows how to train a [Proximal Policy Optimization](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) (PPO) model on the [Knights-Archers-Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/) environment ([AEC](https://pettingzoo.farama.org/api/aec/)).

It converts the environment into a Parallel environment and uses SuperSuit to create vectorized environments, leveraging multithreading to speed up training.

After training and evaluation, this script will launch a demo game using human rendering. Trained models are saved and loaded from disk (see SB3's [documentation](https://stable-baselines3.readthedocs.io/en/master/guide/save_format.html) for more information).

```{eval-rst}
.. note::

This environment has a visual (3-dimensional) observation space, so we use a CNN feature extractor.
```

```{eval-rst}
.. warning::

Because this environment allows agents to spawn and die, it requires using SuperSuit's Black Death wrapper, which provides blank observations to dead agents, rather than removing them from the environment.
```
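
A sketch of the SuperSuit pipeline this tutorial builds, with placeholder hyperparameters rather than the tuned values from the script below:

```python
import supersuit as ss
from stable_baselines3 import PPO
from pettingzoo.butterfly import knights_archers_zombies_v10

# vector_state=False keeps visual (image) observations for the CNN policy.
env = knights_archers_zombies_v10.parallel_env(vector_state=False)
env = ss.black_death_v3(env)  # dead agents emit blank observations instead of being removed
env = ss.color_reduction_v0(env, mode="B")  # single color channel
env = ss.resize_v1(env, x_size=84, y_size=84)
env = ss.frame_stack_v1(env, 3)  # stack frames so the policy can infer motion
env = ss.pettingzoo_env_to_vec_env_v1(env)
env = ss.concat_vec_envs_v1(env, 8, num_cpus=2, base_class="stable_baselines3")

model = PPO("CnnPolicy", env, verbose=1)
model.learn(total_timesteps=81_920)
```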


## Environment Setup
To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts.
```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/kaz/requirements.txt
:language: text
```

## Code
The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with SB3. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).

### Training and Evaluation

```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/kaz/sb3_kaz_vector.py
:language: python
```
34 changes: 0 additions & 34 deletions docs/tutorials/sb3/pistonball.md

This file was deleted.

34 changes: 0 additions & 34 deletions docs/tutorials/sb3/rps.md

This file was deleted.

33 changes: 33 additions & 0 deletions docs/tutorials/sb3/waterworld.md
@@ -0,0 +1,33 @@
---
title: "SB3: PPO for Multiwalker (Parallel)"
---

# SB3: PPO for Waterworld

This tutorial shows how to train a [Proximal Policy Optimization](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html) (PPO) model on the [Waterworld](https://pettingzoo.farama.org/environments/sisl/waterworld/) environment ([Parallel](https://pettingzoo.farama.org/api/parallel/)).

After training and evaluation, this script will launch a demo game using human rendering. Trained models are saved and loaded from disk (see SB3's [documentation](https://stable-baselines3.readthedocs.io/en/master/guide/save_format.html) for more information).

```{eval-rst}
.. note::

This environment has a continuous (1-dimensional) vector observation space, so we use an MLP feature extractor.
```
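
Since no image pre-processing is needed, the training setup reduces to a few lines. A sketch (hyperparameters here are placeholders; see the full script below for the tuned values):

```python
import supersuit as ss
from stable_baselines3 import PPO
from pettingzoo.sisl import waterworld_v4

env = waterworld_v4.parallel_env()
env = ss.pettingzoo_env_to_vec_env_v1(env)  # expose each agent as a vec-env index
env = ss.concat_vec_envs_v1(env, 8, num_cpus=2, base_class="stable_baselines3")

# Flat continuous observations, so an MLP policy suffices.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=196_608)
model.save("ppo_waterworld")
```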


## Environment Setup
To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts.
```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/waterworld/requirements.txt
:language: text
```

## Code
The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with SB3. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX).

### Training and Evaluation

```{eval-rst}
.. literalinclude:: ../../../tutorials/SB3/waterworld/sb3_waterworld_vector.py
:language: python
```
3 changes: 3 additions & 0 deletions pettingzoo/classic/chess/chess.py
@@ -268,6 +268,9 @@ def step(self, action):
current_agent = self.agent_selection
current_index = self.agents.index(current_agent)

+# Cast action into int
+action = int(action)

chosen_move = chess_utils.action_to_move(self.board, action, current_index)
assert chosen_move in self.board.legal_moves
self.board.push(chosen_move)
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -41,7 +41,7 @@ classic = [
]
butterfly = ["pygame==2.3.0", "pymunk==6.2.0"]
mpe = ["pygame==2.3.0"]
-sisl = ["pygame==2.3.0", "box2d-py==2.3.5", "scipy>=1.4.1"]
+sisl = ["pygame==2.3.0", "pymunk==6.2.0", "box2d-py==2.3.5", "scipy>=1.4.1"]
other = ["pillow>=8.0.1"]
testing = [
"pynput",
2 changes: 1 addition & 1 deletion tutorials/Ray/render_rllib_leduc_holdem.py
@@ -1,4 +1,4 @@
"""Uses Ray's RLLib to view trained agents playing Leduoc Holdem.
"""Uses Ray's RLlib to view trained agents playing Leduoc Holdem.

Author: Rohan (https://github.com/Rohan138)
"""
2 changes: 1 addition & 1 deletion tutorials/Ray/render_rllib_pistonball.py
@@ -1,4 +1,4 @@
"""Uses Ray's RLLib to view trained agents playing Pistonball.
"""Uses Ray's RLlib to view trained agents playing Pistonball.

Author: Rohan (https://github.com/Rohan138)
"""
2 changes: 1 addition & 1 deletion tutorials/Ray/rllib_leduc_holdem.py
@@ -1,4 +1,4 @@
"""Uses Ray's RLLib to train agents to play Leduc Holdem.
"""Uses Ray's RLlib to train agents to play Leduc Holdem.

Author: Rohan (https://github.com/Rohan138)
"""
2 changes: 1 addition & 1 deletion tutorials/Ray/rllib_pistonball.py
@@ -1,4 +1,4 @@
"""Uses Ray's RLLib to train agents to play Pistonball.
"""Uses Ray's RLlib to train agents to play Pistonball.

Author: Rohan (https://github.com/Rohan138)
"""
3 changes: 3 additions & 0 deletions tutorials/SB3/connect_four/requirements.txt
@@ -0,0 +1,3 @@
pettingzoo[classic]>=1.23.1
stable-baselines3>=2.0.0
sb3-contrib>=2.0.0