Overhaul env creation guide (#838)
WillDudley authored Nov 7, 2022
1 parent c082f1a commit bb7ae0b
Showing 15 changed files with 411 additions and 2 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/linux-tutorials-test.yml
@@ -19,7 +19,7 @@ jobs:
       fail-fast: false
       matrix:
         python-version: ['3.7', '3.8', '3.9', '3.10']
-        tutorial: ['CleanRL', 'Tianshou']
+        tutorial: ['CleanRL', 'Tianshou', 'EnvironmentCreation']
     steps:
       - uses: actions/checkout@v3
       - name: Set up Python ${{ matrix.python-version }}
4 changes: 4 additions & 0 deletions docs/index.md
@@ -44,6 +44,10 @@ tutorials/cleanrl/implementing_PPO
tutorials/tianshou/beginner
tutorials/tianshou/intermediate
tutorials/tianshou/advanced
tutorials/environmentcreation/1-project-structure
tutorials/environmentcreation/2-environment-logic
tutorials/environmentcreation/3-action-masking
tutorials/environmentcreation/4-testing-your-environment
```

```{toctree}
47 changes: 47 additions & 0 deletions docs/tutorials/environmentcreation/1-project-structure.md
@@ -0,0 +1,47 @@
---
title: "(WIP) Creating Environments: Repository Structure"
---

# (WIP) Creating Environments: Repository Structure

## Introduction

Welcome to the first of five short tutorials that guide you through creating your own PettingZoo environment, from conception to deployment.

We will be creating a parallel environment, meaning that all agents act simultaneously.

Before thinking about the environment logic, we should understand the structure of environment repositories.

## Tree structure
Environment repositories are usually laid out using the following structure:

    Custom-Environment
    ├── custom-environment
    │   ├── env
    │   │   └── custom_environment.py
    │   └── custom_environment_v0.py
    ├── README.md
    └── requirements.txt

- `/custom-environment/env` is where your environment will be stored, along with any helper functions (in the case of a complicated environment).
- `/custom-environment/custom_environment_v0.py` is a file that imports the environment. Its file name carries the environment version (here `v0`), which is how we version environments.
- `/README.md` is a file used to describe your environment.
- `/requirements.txt` is a file used to keep track of your environment dependencies. At the very least, `pettingzoo` should be in there. **Please pin the version of every dependency using `==`**.
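
To make the pinning advice concrete, here is a minimal sketch of a check that flags any requirement not pinned with `==`. The helper name `find_unpinned` and the sample requirement strings (including the version numbers) are hypothetical and chosen only for illustration. The sketch handles simple one-requirement-per-line files, not every syntax that pip accepts.

```python
def find_unpinned(requirements_text):
    """Return the requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop inline comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # skip blank lines and comment-only lines
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Hypothetical file contents; the version numbers are placeholders.
sample = """
pettingzoo==1.22.0
numpy>=1.21  # not pinned
gymnasium
"""

print(find_unpinned(sample))
```

A check like this could run in continuous integration so that an unpinned dependency is caught before release.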

### Advanced: Additional (optional) files
The above file structure is minimal. A more deployment-ready environment would include:
- `/docs/` for documentation,
- `/setup.py` for packaging,
- `/custom-environment/__init__.py` for deprecation handling, and
- GitHub Actions for continuous integration of environment tests.

Implementing these is outside the scope of this tutorial.

## Skeleton code
The entirety of your environment logic is stored within `/custom-environment/env`.

```{eval-rst}
.. literalinclude:: ../../../tutorials/EnvironmentCreation/1-SkeletonCreation.py
   :language: python
   :caption: /custom-environment/env/custom_environment.py
```
23 changes: 23 additions & 0 deletions docs/tutorials/environmentcreation/2-environment-logic.md
@@ -0,0 +1,23 @@
---
title: "(WIP) Creating Environments: Environment Logic"
---

# (WIP) Creating Environments: Environment Logic

## Introduction

Now that we have a basic understanding of the structure of environment repositories, we can start thinking about the fun part - environment logic!

For this tutorial, we will be creating a two-player game consisting of a prisoner, who is trying to escape, and a guard, who is trying to catch the prisoner. The game is played on a 7x7 grid, where:
- the prisoner starts in the top left corner,
- the guard starts in the bottom right corner,
- the escape door is placed at a random position in the middle of the grid, and
- both the prisoner and the guard can move in any of the four cardinal directions (up, down, left, right).
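
The rules above can be sketched as a small pure-Python helper. This is only an illustration, not the tutorial's environment code: the `move` function and the `GRID_SIZE` and `MOVES` names are hypothetical, and the action numbering (0 = left, 1 = right, 2 = up, 3 = down) is an assumption that follows the order used in `2-AddingGameLogic.py`.

```python
GRID_SIZE = 7  # the game is played on a 7x7 grid

# Assumed action numbering: 0 = left, 1 = right, 2 = up, 3 = down.
# Row 0 is the top of the grid, so moving up decreases y.
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}


def move(x, y, action):
    """Return the new (x, y); a move that would leave the grid has no effect."""
    dx, dy = MOVES[action]
    new_x, new_y = x + dx, y + dy
    if 0 <= new_x < GRID_SIZE and 0 <= new_y < GRID_SIZE:
        return new_x, new_y
    return x, y


prisoner = (0, 0)  # top left corner
guard = (GRID_SIZE - 1, GRID_SIZE - 1)  # bottom right corner

print(move(*prisoner, 1))  # step right
print(move(*prisoner, 0))  # blocked by the left wall, position unchanged
print(move(*guard, 2))  # step up
```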

## Code

```{eval-rst}
.. literalinclude:: ../../../tutorials/EnvironmentCreation/2-AddingGameLogic.py
   :language: python
   :caption: /custom-environment/env/custom_environment.py
```
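
The environment code for this tutorial (`2-AddingGameLogic.py`) stores each position in an observation as a single integer, `x + 7 * y`. A minimal sketch of that encoding and its inverse follows. The helper names `encode` and `decode` are hypothetical and do not appear in the tutorial code.

```python
GRID_SIZE = 7


def encode(x, y):
    """Flatten an (x, y) grid position into one integer in 0..48."""
    return x + GRID_SIZE * y


def decode(index):
    """Recover (x, y) from a flattened position."""
    y, x = divmod(index, GRID_SIZE)
    return x, y


print(encode(0, 0))  # top left corner
print(encode(6, 6))  # bottom right corner
print(decode(encode(3, 4)))
```

Because the largest encoded value is 48, a `MultiDiscrete` space needs 49 categories per entry to contain every position.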
20 changes: 20 additions & 0 deletions docs/tutorials/environmentcreation/3-action-masking.md
@@ -0,0 +1,20 @@
---
title: "(WIP) Creating Environments: Action Masking"
---

# (WIP) Creating Environments: Action Masking

## Introduction

In many environments, it is natural for some actions to be invalid at certain times. For example, in a game of chess, a pawn cannot move forward if the square directly in front of it is occupied. In PettingZoo, we can use action masking to prevent invalid actions from being taken.

Action masking is a more natural way of handling invalid actions than letting an invalid action silently have no effect, which is how we handled bumping into walls in the previous tutorial.
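
As a rough sketch of the idea, a mask can be a binary vector with one entry per action, where 1 marks a legal action and 0 an illegal one. The example below builds such a mask for the wall-bumping case on a 7x7 grid. The function names `wall_action_mask` and `sample_valid_action` and the action numbering (0 = left, 1 = right, 2 = up, 3 = down) are assumptions for illustration; this is not necessarily how the tutorial's `3-ActionMasking.py` implements masking.

```python
import random

GRID_SIZE = 7


def wall_action_mask(x, y):
    """Return [left, right, up, down] flags: 1 if the move stays on the grid."""
    return [
        int(x > 0),  # left
        int(x < GRID_SIZE - 1),  # right
        int(y > 0),  # up
        int(y < GRID_SIZE - 1),  # down
    ]


def sample_valid_action(mask, rng=random):
    """Pick uniformly among the actions whose mask entry is 1."""
    valid = [action for action, allowed in enumerate(mask) if allowed]
    return rng.choice(valid)


mask = wall_action_mask(0, 0)  # top left corner: only right and down are legal
print(mask)
print(sample_valid_action(mask))
```

A policy that samples only from the masked-in actions never wastes a step on a move that the environment would ignore.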

## Code

```{eval-rst}
.. literalinclude:: ../../../tutorials/EnvironmentCreation/3-ActionMasking.py
   :language: python
   :caption: /custom-environment/env/custom_environment.py
   :lines: -147
```
19 changes: 19 additions & 0 deletions docs/tutorials/environmentcreation/4-testing-your-environment.md
@@ -0,0 +1,19 @@
---
title: "(WIP) Creating Environments: Testing Your Environment"
---

# (WIP) Creating Environments: Testing Your Environment

## Introduction

Now that our environment is complete, we can test it to make sure it works as intended. PettingZoo has a built-in testing suite that can be used to test your environment.
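
One of the checks performed by `parallel_api_test` (see `pettingzoo/test/parallel_test.py` in this commit) is that the environment's `agents` list matches the set of agents that are still live. The following is a simplified, dependency-free sketch of that single check, not the real test suite: `check_live_agents` is a hypothetical helper, and treating the per-agent flag as a "this agent is finished" flag is an assumption.

```python
def check_live_agents(env_agents, live_agents, done_flags):
    """Remove finished agents from live_agents and compare with env_agents.

    env_agents: the agents the environment reports after a step
    live_agents: set of agents that were live before the step (mutated in place)
    done_flags: dict mapping agent name -> True if the agent is finished
    """
    for agent, done in done_flags.items():
        if done:
            live_agents.remove(agent)
    assert set(env_agents) == live_agents, f"{env_agents} != {live_agents}"
    return live_agents


# A well-behaved environment drops finished agents from its agents list.
live = {"prisoner", "guard"}
check_live_agents([], live, {"prisoner": True, "guard": True})
print(live)

# An environment that keeps finished agents in its list trips the assertion.
try:
    check_live_agents(
        ["prisoner", "guard"],
        {"prisoner", "guard"},
        {"prisoner": True, "guard": True},
    )
except AssertionError as err:
    print(f"check failed: {err}")
```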

## Code
Add this code below the rest of the code in the file:

```{eval-rst}
.. literalinclude:: ../../../tutorials/EnvironmentCreation/3-ActionMasking.py
   :language: python
   :caption: /custom-environment/env/custom_environment.py
   :lines: 148-
```
Empty file.
4 changes: 3 additions & 1 deletion pettingzoo/test/parallel_test.py
@@ -115,7 +115,9 @@ def parallel_api_test(par_env, num_cycles=1000):
             if d:
                 live_agents.remove(agent)

-        assert set(par_env.agents) == live_agents
+        assert (
+            set(par_env.agents) == live_agents
+        ), f"{par_env.agents} != {live_agents}"

         if len(live_agents) == 0:
             break
21 changes: 21 additions & 0 deletions tutorials/EnvironmentCreation/1-SkeletonCreation.py
@@ -0,0 +1,21 @@
from pettingzoo.utils.env import ParallelEnv


class CustomEnvironment(ParallelEnv):
    def __init__(self):
        pass

    def reset(self, seed=None, return_info=False, options=None):
        pass

    def step(self, actions):
        pass

    def render(self):
        pass

    def observation_space(self, agent):
        return self.observation_spaces[agent]

    def action_space(self, agent):
        return self.action_spaces[agent]
114 changes: 114 additions & 0 deletions tutorials/EnvironmentCreation/2-AddingGameLogic.py
@@ -0,0 +1,114 @@
import functools
import random
from copy import copy

import numpy as np
from gymnasium.spaces import Discrete, MultiDiscrete

from pettingzoo.utils.env import ParallelEnv


class CustomEnvironment(ParallelEnv):
    def __init__(self):
        self.escape_y = None
        self.escape_x = None
        self.guard_y = None
        self.guard_x = None
        self.prisoner_y = None
        self.prisoner_x = None
        self.timestep = None
        self.possible_agents = ["prisoner", "guard"]

    def reset(self, seed=None, return_info=False, options=None):
        self.agents = copy(self.possible_agents)
        self.timestep = 0

        self.prisoner_x = 0
        self.prisoner_y = 0

        self.guard_x = 6  # last valid index on a 7x7 grid
        self.guard_y = 6

        self.escape_x = random.randint(2, 5)
        self.escape_y = random.randint(2, 5)

        observations = {
            a: (
                self.prisoner_x + 7 * self.prisoner_y,
                self.guard_x + 7 * self.guard_y,
                self.escape_x + 7 * self.escape_y,
            )
            for a in self.agents
        }
        return observations

    def step(self, actions):
        # Execute actions (0: left, 1: right, 2: up, 3: down)
        prisoner_action = actions["prisoner"]
        guard_action = actions["guard"]

        if prisoner_action == 0 and self.prisoner_x > 0:
            self.prisoner_x -= 1
        elif prisoner_action == 1 and self.prisoner_x < 6:
            self.prisoner_x += 1
        elif prisoner_action == 2 and self.prisoner_y > 0:
            self.prisoner_y -= 1
        elif prisoner_action == 3 and self.prisoner_y < 6:
            self.prisoner_y += 1

        if guard_action == 0 and self.guard_x > 0:
            self.guard_x -= 1
        elif guard_action == 1 and self.guard_x < 6:
            self.guard_x += 1
        elif guard_action == 2 and self.guard_y > 0:
            self.guard_y -= 1
        elif guard_action == 3 and self.guard_y < 6:
            self.guard_y += 1

        # Check termination conditions
        terminations = {a: False for a in self.agents}
        rewards = {a: 0 for a in self.agents}
        if self.prisoner_x == self.guard_x and self.prisoner_y == self.guard_y:
            rewards = {"prisoner": -1, "guard": 1}
            terminations = {a: True for a in self.agents}

        elif self.prisoner_x == self.escape_x and self.prisoner_y == self.escape_y:
            rewards = {"prisoner": 1, "guard": -1}
            terminations = {a: True for a in self.agents}

        # Check truncation conditions (overwrites termination conditions)
        truncations = {a: False for a in self.agents}
        if self.timestep > 100:
            rewards = {"prisoner": 0, "guard": 0}
            truncations = {"prisoner": True, "guard": True}
        self.timestep += 1

        # Get observations
        observations = {
            a: (
                self.prisoner_x + 7 * self.prisoner_y,
                self.guard_x + 7 * self.guard_y,
                self.escape_x + 7 * self.escape_y,
            )
            for a in self.agents
        }

        # Get dummy infos (not used in this example)
        infos = {a: {} for a in self.agents}

        return observations, rewards, terminations, truncations, infos

    def render(self):
        grid = np.full((7, 7), " ")  # string array; np.zeros cannot store "P"
        grid[self.prisoner_y, self.prisoner_x] = "P"
        grid[self.guard_y, self.guard_x] = "G"
        grid[self.escape_y, self.escape_x] = "E"
        print(f"{grid} \n")

    @functools.lru_cache(maxsize=None)
    def observation_space(self, agent):
        return MultiDiscrete([7 * 7] * 3)  # flattened positions range over 0..48

    @functools.lru_cache(maxsize=None)
    def action_space(self, agent):
        return Discrete(4)