Overhaul env creation guide (#838)

Farama-Foundation · Nov 7, 2022 · bb7ae0b
1 parent c082f1a
commit bb7ae0b
Showing 15 changed files with 411 additions and 2 deletions.
diff --git a/.github/workflows/linux-tutorials-test.yml b/.github/workflows/linux-tutorials-test.yml
@@ -19,7 +19,7 @@ jobs:
       fail-fast: false
       matrix:
         python-version: ['3.7', '3.8', '3.9', '3.10']
-        tutorial: ['CleanRL', 'Tianshou']
+        tutorial: ['CleanRL', 'Tianshou', 'EnvironmentCreation']
     steps:
       - uses: actions/checkout@v3
       - name: Set up Python ${{ matrix.python-version }}

diff --git a/docs/index.md b/docs/index.md
@@ -44,6 +44,10 @@ tutorials/cleanrl/implementing_PPO
 tutorials/tianshou/beginner
 tutorials/tianshou/intermediate
 tutorials/tianshou/advanced
+tutorials/environmentcreation/1-project-structure
+tutorials/environmentcreation/2-environment-logic
+tutorials/environmentcreation/3-action-masking
+tutorials/environmentcreation/4-testing-your-environment
 ```
 
 ```{toctree}

diff --git a/docs/tutorials/environmentcreation/1-project-structure.md b/docs/tutorials/environmentcreation/1-project-structure.md
@@ -0,0 +1,47 @@
+---
+title: "(WIP) Creating Environments: Repository Structure"
+---
+
+# (WIP) Creating Environments: Repository Structure
+
+## Introduction
+
+Welcome to the first of five short tutorials that guide you through the process of creating your own PettingZoo environment, from conception to deployment.
+
+We will be creating a parallel environment, meaning that all agents act simultaneously.
+
+Before thinking about the environment logic, we should understand the structure of environment repositories.
+
+## Tree structure
+Environment repositories are usually laid out using the following structure:
+
+    Custom-Environment
+    ├── custom-environment
+    │   ├── env
+    │   │   └── custom_environment.py
+    │   └── custom_environment_v0.py
+    ├── README.md
+    └── requirements.txt
+
+- `/custom-environment/env` is where your environment will be stored, along with any helper functions (in the case of a complicated environment).
+- `/custom-environment/custom_environment_v0.py` is a file that imports the environment; its file name carries the environment version.
+- `/README.md` is a file used to describe your environment.
+- `/requirements.txt` is a file used to keep track of your environment dependencies. At the very least, `pettingzoo` should be in there. **Please pin the version of every dependency with `==`**.
+
+### Advanced: Additional (optional) files
+The above file structure is minimal. A more deployment-ready environment would include
+- `/docs/` for documentation,
+- `/setup.py` for packaging,
+- `/custom-environment/__init__.py` for deprecation handling, and
+- GitHub Actions for continuous integration of environment tests.
+
+Implementing these is outside the scope of this tutorial.
+
+## Skeleton code
+The entirety of your environment logic is stored within `/custom-environment/env`.
+
+```{eval-rst}
+.. literalinclude:: ../../../tutorials/EnvironmentCreation/1-SkeletonCreation.py
+   :language: python
+   :caption: /custom-environment/env/custom_environment.py
+```
diff --git a/docs/tutorials/environmentcreation/2-environment-logic.md b/docs/tutorials/environmentcreation/2-environment-logic.md
@@ -0,0 +1,23 @@
+---
+title: "(WIP) Creating Environments: Environment Logic"
+---
+
+# (WIP) Creating Environments: Environment Logic
+
+## Introduction
+
+Now that we have a basic understanding of the structure of environment repositories, we can start thinking about the fun part - environment logic!
+
+For this tutorial, we will be creating a two-player game consisting of a prisoner, trying to escape, and a guard, trying to catch the prisoner. This game will be played on a 7x7 grid, where:
+- the prisoner starts in the top left corner,
+- the guard starts in the bottom right corner,
+- the escape door is randomly placed in the middle of the grid, and
+- both the prisoner and the guard can move in any of the four cardinal directions (up, down, left, right).
+
+## Code
+
+```{eval-rst}
+.. literalinclude:: ../../../tutorials/EnvironmentCreation/2-AddingGameLogic.py
+   :language: python
+   :caption: /custom-environment/env/custom_environment.py
+```
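The environment code in this commit reports each position as a single integer, `x + 7 * y`. The sketch below shows that encoding and its inverse. `GRID_SIZE`, `encode`, and `decode` are hypothetical names used only for illustration; they do not appear in the tutorial files.

```python
from typing import Tuple

GRID_SIZE = 7  # side length of the square grid described in the tutorial


def encode(x: int, y: int) -> int:
    """Flatten an (x, y) cell into one index, matching x + 7 * y."""
    return x + GRID_SIZE * y


def decode(index: int) -> Tuple[int, int]:
    """Recover (x, y) from a flattened index."""
    y, x = divmod(index, GRID_SIZE)
    return x, y


print(encode(0, 0))  # top left corner -> 0
print(encode(6, 6))  # bottom right corner -> 48
print(decode(48))    # -> (6, 6)
```

Valid coordinates run from 0 to 6, so the flattened index runs from 0 to 48 and each observation component can take 49 distinct values.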
diff --git a/docs/tutorials/environmentcreation/3-action-masking.md b/docs/tutorials/environmentcreation/3-action-masking.md
@@ -0,0 +1,20 @@
+---
+title: "(WIP) Creating Environments: Action Masking"
+---
+
+# (WIP) Creating Environments: Action Masking
+
+## Introduction
+
+In many environments, it is natural for some actions to be invalid at certain times. For example, in a game of chess, a pawn cannot move forward if another piece occupies the square directly in front of it. In PettingZoo, we can use action masking to prevent invalid actions from being taken.
+
+Action masking is a more natural way of handling invalid actions than having an action have no effect, which was how we handled bumping into walls in the previous tutorial.
+
+## Code
+
+```{eval-rst}
+.. literalinclude:: ../../../tutorials/EnvironmentCreation/3-ActionMasking.py
+   :language: python
+   :caption: /custom-environment/env/custom_environment.py
+   :lines: -147
+```
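One way to picture a mask for the grid game is sketched below. It assumes the action numbering from the previous tutorial's code (0 = left, 1 = right, 2 = up, 3 = down). `movement_mask` is a hypothetical helper written for illustration, not code taken from `3-ActionMasking.py`.

```python
import numpy as np

GRID_SIZE = 7  # side length of the grid; valid coordinates are 0..6


def movement_mask(x, y):
    """Return a length-4 array with 1 for legal moves and 0 for moves into a wall."""
    mask = np.ones(4, dtype=np.int8)
    if x == 0:
        mask[0] = 0  # cannot move left from the left edge
    if x == GRID_SIZE - 1:
        mask[1] = 0  # cannot move right from the right edge
    if y == 0:
        mask[2] = 0  # cannot move up from the top row
    if y == GRID_SIZE - 1:
        mask[3] = 0  # cannot move down from the bottom row
    return mask


mask = movement_mask(0, 0)
print(mask)  # [0 1 0 1]

# Sample only among the actions the mask allows.
rng = np.random.default_rng(0)
action = rng.choice(np.flatnonzero(mask))
print(action in (1, 3))  # True
```

A policy that samples only where the mask is 1 never wastes a step walking into a wall, which is the behaviour the no-op approach from the previous tutorial permits.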
diff --git a/docs/tutorials/environmentcreation/4-testing-your-environment.md b/docs/tutorials/environmentcreation/4-testing-your-environment.md
@@ -0,0 +1,19 @@
+---
+title: "(WIP) Creating Environments: Testing Your Environment"
+---
+
+# (WIP) Creating Environments: Testing Your Environment
+
+## Introduction
+
+Now that our environment is complete, we can test it to make sure it works as intended. PettingZoo has a built-in testing suite that can be used to test your environment.
+
+## Code
+Add this code below the rest of the code in the file.
+
+```{eval-rst}
+.. literalinclude:: ../../../tutorials/EnvironmentCreation/3-ActionMasking.py
+   :language: python
+   :caption: /custom-environment/env/custom_environment.py
+   :lines: 148-
+```
diff --git a/docs/tutorials/environmentcreation/5-using-your-environment.md b/docs/tutorials/environmentcreation/5-using-your-environment.md
diff --git a/pettingzoo/test/parallel_test.py b/pettingzoo/test/parallel_test.py
@@ -115,7 +115,9 @@ def parallel_api_test(par_env, num_cycles=1000):
                 if d:
                     live_agents.remove(agent)
 
-            assert set(par_env.agents) == live_agents
+            assert (
+                set(par_env.agents) == live_agents
+            ), f"{par_env.agents} != {live_agents}"
 
             if len(live_agents) == 0:
                 break
diff --git a/tutorials/EnvironmentCreation/1-SkeletonCreation.py b/tutorials/EnvironmentCreation/1-SkeletonCreation.py
@@ -0,0 +1,21 @@
+from pettingzoo.utils.env import ParallelEnv
+
+
+class CustomEnvironment(ParallelEnv):
+    def __init__(self):
+        pass
+
+    def reset(self, seed=None, return_info=False, options=None):
+        pass
+
+    def step(self, actions):
+        pass
+
+    def render(self):
+        pass
+
+    def observation_space(self, agent):
+        return self.observation_spaces[agent]
+
+    def action_space(self, agent):
+        return self.action_spaces[agent]
diff --git a/tutorials/EnvironmentCreation/2-AddingGameLogic.py b/tutorials/EnvironmentCreation/2-AddingGameLogic.py
@@ -0,0 +1,114 @@
+import functools
+import random
+from copy import copy
+
+import numpy as np
+from gymnasium.spaces import Discrete, MultiDiscrete
+
+from pettingzoo.utils.env import ParallelEnv
+
+
+class CustomEnvironment(ParallelEnv):
+    def __init__(self):
+        self.escape_y = None
+        self.escape_x = None
+        self.guard_y = None
+        self.guard_x = None
+        self.prisoner_y = None
+        self.prisoner_x = None
+        self.timestep = None
+        self.possible_agents = ["prisoner", "guard"]
+
+    def reset(self, seed=None, return_info=False, options=None):
+        self.agents = copy(self.possible_agents)
+        self.timestep = 0
+
+        self.prisoner_x = 0
+        self.prisoner_y = 0
+
+        self.guard_x = 6  # bottom right corner; valid indices on the 7x7 grid are 0..6
+        self.guard_y = 6
+
+        self.escape_x = random.randint(2, 5)
+        self.escape_y = random.randint(2, 5)
+
+        observations = {
+            a: (
+                self.prisoner_x + 7 * self.prisoner_y,
+                self.guard_x + 7 * self.guard_y,
+                self.escape_x + 7 * self.escape_y,
+            )
+            for a in self.agents
+        }
+        return observations
+
+    def step(self, actions):
+        # Execute actions
+        prisoner_action = actions["prisoner"]
+        guard_action = actions["guard"]
+
+        if prisoner_action == 0 and self.prisoner_x > 0:
+            self.prisoner_x -= 1
+        elif prisoner_action == 1 and self.prisoner_x < 6:
+            self.prisoner_x += 1
+        elif prisoner_action == 2 and self.prisoner_y > 0:
+            self.prisoner_y -= 1
+        elif prisoner_action == 3 and self.prisoner_y < 6:
+            self.prisoner_y += 1
+
+        if guard_action == 0 and self.guard_x > 0:
+            self.guard_x -= 1
+        elif guard_action == 1 and self.guard_x < 6:
+            self.guard_x += 1
+        elif guard_action == 2 and self.guard_y > 0:
+            self.guard_y -= 1
+        elif guard_action == 3 and self.guard_y < 6:
+            self.guard_y += 1
+
+        # Check termination conditions
+        terminations = {a: False for a in self.agents}
+        rewards = {a: 0 for a in self.agents}
+        if self.prisoner_x == self.guard_x and self.prisoner_y == self.guard_y:
+            rewards = {"prisoner": -1, "guard": 1}
+            terminations = {a: True for a in self.agents}
+
+        elif self.prisoner_x == self.escape_x and self.prisoner_y == self.escape_y:
+            rewards = {"prisoner": 1, "guard": -1}
+            terminations = {a: True for a in self.agents}
+
+        # Check truncation conditions (overwrites termination conditions)
+        truncations = {a: False for a in self.agents}
+        if self.timestep > 100:
+            rewards = {"prisoner": 0, "guard": 0}
+            truncations = {"prisoner": True, "guard": True}
+        self.timestep += 1
+
+        # Get observations
+        observations = {
+            a: (
+                self.prisoner_x + 7 * self.prisoner_y,
+                self.guard_x + 7 * self.guard_y,
+                self.escape_x + 7 * self.escape_y,
+            )
+            for a in self.agents
+        }
+
+        # Get dummy infos (not used in this example)
+        infos = {a: {} for a in self.agents}
+
+        return observations, rewards, terminations, truncations, infos
+
+    def render(self):
+        grid = np.full((7, 7), " ")  # string grid; a float array cannot hold "P", "G", "E"
+        grid[self.prisoner_y, self.prisoner_x] = "P"
+        grid[self.guard_y, self.guard_x] = "G"
+        grid[self.escape_y, self.escape_x] = "E"
+        print(f"{grid} \n")
+
+    @functools.lru_cache(maxsize=None)
+    def observation_space(self, agent):
+        return MultiDiscrete([7 * 7] * 3)  # flattened positions range over 0..48
+
+    @functools.lru_cache(maxsize=None)
+    def action_space(self, agent):
+        return Discrete(4)