Add action masking documentation, update usage scripts (#953)
elliottower authored May 6, 2023
1 parent 99f050c commit 9e8977d
Showing 7 changed files with 92 additions and 44 deletions.
54 changes: 41 additions & 13 deletions docs/api/aec.md
@@ -3,46 +3,74 @@ title: AEC
---



# AEC API

By default, PettingZoo models games as [*Agent Environment Cycle*](https://arxiv.org/abs/2009.13051) (AEC) environments. This allows it to support any type of game multi-agent RL can consider.

- ## Example Usage
+ ## Usage

AEC environments can be interacted with as follows:

- ``` python
+ ```python
from pettingzoo.classic import rps_v2
+ 
env = rps_v2.env(render_mode="human")
+ env.reset(seed=42)

- env.reset()
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
+ 
    if termination or truncation:
        action = None
-     else:
-         action = env.action_space(agent).sample() # this is where you would insert your policy
-     env.step(action)
+     else:
+         action = env.action_space(agent).sample() # this is where you would insert your policy
+ 
+     env.step(action)
env.close()
```

- Note: for environments with illegal actions in the action space, actions can be sampled according to an action mask as follows:
- ``` python
+ ### Action Masking
+ AEC environments often include action masks, in order to mark valid/invalid actions for the agent.
+ 
+ To sample actions using action masking:
+ ```python
from pettingzoo.classic import chess_v5
+ 
env = chess_v5.env(render_mode="human")
+ env.reset(seed=42)

- env.reset()
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
+ 
    if termination or truncation:
        action = None
-     else:
-         action = env.action_space(agent).sample(observation["action_mask"]) # this is where you would insert your policy
-     env.step(action)
+     else:
+         # invalid action masking is optional and environment-dependent
+         if "action_mask" in info:
+             mask = info["action_mask"]
+         elif isinstance(observation, dict) and "action_mask" in observation:
+             mask = observation["action_mask"]
+         else:
+             mask = None
+         action = env.action_space(agent).sample(mask) # this is where you would insert your policy
+ 
+     env.step(action)
env.close()

```

+ Note: action masking is optional, and can be implemented using either `observation` or `info`.
+ 
+ * [PettingZoo Classic](https://pettingzoo.farama.org/environments/classic/) environments store action masks in the `observation` dict:
+   * `mask = observation["action_mask"]`
+ * [Shimmy](https://shimmy.farama.org/)'s [OpenSpiel environments](https://shimmy.farama.org/environments/open_spiel/) store action masks in the `info` dict:
+   * `mask = info["action_mask"]`
+ 
+ To implement action masking in a custom environment, see [Environment Creation: Action Masking](https://pettingzoo.farama.org/tutorials/environmentcreation/3-action-masking/)
+ 
+ For more information on action masking, see [A Closer Look at Invalid Action Masking in Policy Gradient Algorithms](https://arxiv.org/abs/2006.14171) (Huang, 2022)
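
For context on the pattern described in these notes, here is a minimal sketch of how a mask is typically applied when choosing actions. It is illustrative only: the `mask` values, the `logits` stand-in, and the variable names are assumptions, while `Discrete.sample(mask=...)` and the masked-softmax idea follow Gymnasium's documented masking support and the Huang (2022) paper referenced above.

```python
import gymnasium as gym
import numpy as np

# Illustrative 5-action space in which actions 1 and 3 are currently illegal.
mask = np.array([1, 0, 1, 0, 1], dtype=np.int8)

# Random rollouts: Gymnasium's Discrete.sample accepts an int8 mask,
# so only actions with mask == 1 can be drawn.
action_space = gym.spaces.Discrete(5)
random_action = action_space.sample(mask)

# Learned policies (the approach analysed in Huang, 2022): push invalid
# logits to -inf before the softmax so illegal actions get zero probability.
logits = np.random.randn(5)  # stand-in for a policy network's output
masked_logits = np.where(mask == 1, logits, -np.inf)
probs = np.exp(masked_logits - masked_logits.max())
probs /= probs.sum()
policy_action = np.random.choice(len(probs), p=probs)
```

In either case the mask itself would come from `observation["action_mask"]` or `info["action_mask"]`, exactly as in the examples above.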


## AECEnv

```{eval-rst}
11 changes: 7 additions & 4 deletions docs/api/parallel.md
@@ -7,18 +7,21 @@ title: Parallel

In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `<game>.parallel_env()`. This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLlib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.

- ## Example Usage
+ ## Usage

Parallel environments can be interacted with as follows:

``` python
from pettingzoo.butterfly import pistonball_v6
- parallel_env = pistonball_v6.parallel_env()
- observations = parallel_env.reset()
+ parallel_env = pistonball_v6.parallel_env(render_mode="human")
+ observations = parallel_env.reset(seed=42)
+ 
while parallel_env.agents:
-     actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents} # this is where you would insert your policy
+     # this is where you would insert your policy
+     actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}
+ 
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
parallel_env.close()
```
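
Because every parallel `step` returns dicts keyed by agent name, per-agent bookkeeping is plain dictionary arithmetic. Below is a minimal sketch reusing the `pistonball_v6` environment from the example above; the `episode_returns` accumulator is an illustrative assumption, not part of the PettingZoo API.

```python
from collections import defaultdict

from pettingzoo.butterfly import pistonball_v6

parallel_env = pistonball_v6.parallel_env()
observations = parallel_env.reset(seed=42)

# rewards maps each live agent to its reward for that step, so summing it
# per agent yields each agent's episode return.
episode_returns = defaultdict(float)
while parallel_env.agents:
    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
    for agent, reward in rewards.items():
        episode_returns[agent] += reward
parallel_env.close()

print(dict(episode_returns))
```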

## ParallelEnv
13 changes: 8 additions & 5 deletions docs/environments/atari.md
@@ -52,19 +52,22 @@ Install ROMs using [AutoROM](https://github.com/Farama-Foundation/AutoROM), or s

### Usage

- To launch a [Space Invaders](https://pettingzoo.farama.org/environments/atari/space_invaders/) environment with agents taking random actions:
- ``` python
+ To launch a [Space Invaders](https://pettingzoo.farama.org/environments/atari/space_invaders/) environment with random agents:
+ ```python
from pettingzoo.atari import space_invaders_v2
+ 
env = space_invaders_v2.env(render_mode="human")
+ env.reset(seed=42)

- env.reset()
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
+ 
    if termination or truncation:
        action = None
    else:
-         env.action_space(agent).sample() # this is where you would insert your policy
-     env.step(action)
+         action = env.action_space(agent).sample() # this is where you would insert your policy
+ 
+     env.step(action)
env.close()
```

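Multi-player Atari environments expose their players through the standard PettingZoo attributes. A small sketch of inspecting them is shown below; the exact agent names and space sizes are environment-specific (for the two-player games the names are typically `first_0` and `second_0`), so treat the output as an assumption to verify.

```python
from pettingzoo.atari import space_invaders_v2

env = space_invaders_v2.env()
env.reset(seed=42)

# possible_agents lists every player the environment can contain;
# action_space / observation_space are queried per agent.
print(env.possible_agents)
for agent in env.possible_agents:
    print(agent, env.action_space(agent), env.observation_space(agent).shape)
env.close()
```
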
24 changes: 15 additions & 9 deletions docs/environments/butterfly.md
@@ -34,38 +34,44 @@ pip install pettingzoo[butterfly]

### Usage

- To launch a [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment with agents taking random actions:
- ``` python
+ To launch a [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment with random agents:
+ ```python
from pettingzoo.butterfly import pistonball_v6
- env = pistonball_v6.parallel_env(render_mode="human")

+ env = pistonball_v6.parallel_env(render_mode="human")
observations = env.reset()

while env.agents:
-     actions = {agent: env.action_space(agent).sample() for agent in env.agents} # this is where you would insert your policy
+     # this is where you would insert your policy
+     actions = {agent: env.action_space(agent).sample() for agent in env.agents}
+ 
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```

- To launch a [Knights Archers Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/) environment with interactive user input (see [manual_policy.py](https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/butterfly/knights_archers_zombies/manual_policy.py), controls are WASD and space):
- ``` python
+ To launch a [Knights Archers Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/) environment with interactive user input (see [manual_policy.py](https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/butterfly/knights_archers_zombies/manual_policy.py)):
+ ```python
import pygame
from pettingzoo.butterfly import knights_archers_zombies_v10

env = knights_archers_zombies_v10.env(render_mode="human")
- env.reset()
+ env.reset(seed=42)

clock = pygame.time.Clock()
manual_policy = knights_archers_zombies_v10.ManualPolicy(env)

for agent in env.agent_iter():
    clock.tick(env.metadata["render_fps"])

    observation, reward, termination, truncation, info = env.last()

    if agent == manual_policy.agent:
+         # get user input (controls are WASD and space)
        action = manual_policy(observation, agent)
    else:
+         # this is where you would insert your policy (for non-player agents)
        action = env.action_space(agent).sample()
+ 
-     env.step(action)
+     env.step(action)
env.close()
```
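
For readers wondering what a `ManualPolicy`-style keyboard handler looks like internally, here is a rough, simplified sketch of polling WASD/space with pygame. The key-to-action numbers are made-up placeholders, not the real Knights Archers Zombies action ids; the actual mapping lives in the linked manual_policy.py.

```python
import pygame

# Hypothetical key-to-action mapping; see manual_policy.py for the real one.
KEY_TO_ACTION = {
    pygame.K_w: 1,
    pygame.K_s: 2,
    pygame.K_a: 3,
    pygame.K_d: 4,
    pygame.K_SPACE: 5,
}
NO_OP = 0


def keyboard_action() -> int:
    """Return the action for the currently pressed key, or a no-op."""
    pygame.event.pump()  # keep pygame's event queue fresh while polling
    pressed = pygame.key.get_pressed()
    for key, action in KEY_TO_ACTION.items():
        if pressed[key]:
            return action
    return NO_OP
```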

16 changes: 10 additions & 6 deletions docs/environments/classic.md
@@ -36,19 +36,23 @@ pip install pettingzoo[classic]

### Usage

- To launch a [Texas Holdem](https://pettingzoo.farama.org/environments/classic/texas_holdem/) environment with agents taking random actions:
+ To launch a [Texas Holdem](https://pettingzoo.farama.org/environments/classic/texas_holdem/) environment with random agents:
``` python
from pettingzoo.classic import texas_holdem_v4
+ 
env = texas_holdem_v4.env(render_mode="human")
+ env.reset(seed=42)

- env.reset()
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
+ 
    if termination or truncation:
        action = None
-     else:
-         action = env.action_space(agent).sample(observation["action_mask"]) # this is where you would insert your policy
-     env.step(action)
+         break
+ 
+     mask = observation["action_mask"]
+     action = env.action_space(agent).sample(mask) # this is where you would insert your policy
+ 
+     env.step(action)
env.close()
```

10 changes: 6 additions & 4 deletions docs/environments/mpe.md
@@ -34,20 +34,22 @@ pip install pettingzoo[mpe]
````

### Usage
- To launch a [Simple Tag](https://pettingzoo.farama.org/environments/mpe/simple_tag/) environment with agents taking random actions:
+ To launch a [Simple Tag](https://pettingzoo.farama.org/environments/mpe/simple_tag/) environment with random agents:

- ``` python
+ ```python
from pettingzoo.mpe import simple_tag_v2
env = simple_tag_v2.env(render_mode='human')

env.reset()
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
+ 
    if termination or truncation:
        action = None
    else:
-         action = env.action_space(agent).sample()
-     env.step(action)
+         action = env.action_space(agent).sample() # this is where you would insert your policy
+ 
+     env.step(action)
env.close()
```

8 changes: 5 additions & 3 deletions docs/environments/sisl.md
@@ -27,7 +27,7 @@ pip install pettingzoo[sisl]
````

### Usage
- To launch a [Waterworld](https://pettingzoo.farama.org/environments/sisl/waterworld/) environment with agents taking random actions:
+ To launch a [Waterworld](https://pettingzoo.farama.org/environments/sisl/waterworld/) environment with random agents:

```python
from pettingzoo.sisl import waterworld_v4
@@ -36,11 +36,13 @@ env = waterworld_v4.env(render_mode='human')
env.reset()
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
+ 
    if termination or truncation:
        action = None
    else:
-         action = env.action_space(agent).sample()
-     env.step(action)
+         action = env.action_space(agent).sample() # this is where you would insert your policy
+ 
+     env.step(action)
env.close()
```

