Add action masking documentation, update usage scripts #953

55 changes: 49 additions & 6 deletions docs/api/aec.md
@@ -2,25 +2,68 @@

By default, PettingZoo models games as [*Agent Environment Cycle*](https://arxiv.org/abs/2009.13051) (AEC) environments. This allows it to support any type of game multi-agent RL can consider.

## Usage

AEC environments can be interacted with as follows:

```python
from pettingzoo.classic import rps_v2

env = rps_v2.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

### Action Masking
AEC environments often provide action masks, which mark valid and invalid actions for the agent.

To sample actions using action masking:
```python
from pettingzoo.classic import chess_v5

env = chess_v5.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        # invalid action masking is optional and environment-dependent
        if "action_mask" in info:
            mask = info["action_mask"]
        elif isinstance(observation, dict) and "action_mask" in observation:
            mask = observation["action_mask"]
        else:
            mask = None
        action = env.action_space(agent).sample(mask)  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

Note: action masking is optional, and can be implemented using either `observation` or `info`.

* [PettingZoo Classic](https://pettingzoo.farama.org/environments/classic/) environments store action masks in the `observation` dict:
* `mask = observation["action_mask"]`
* [Shimmy](https://shimmy.farama.org/)'s [OpenSpiel environments](https://shimmy.farama.org/environments/open_spiel/) store action masks in the `info` dict:
* `mask = info["action_mask"]`

To implement action masking in a custom environment, see [Environment Creation: Action Masking](https://pettingzoo.farama.org/tutorials/environmentcreation/3-action-masking/).
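In a custom environment, the Classic convention is for `observe()` to return a dict holding both the state and the mask. A rough sketch of that shape follows; `num_actions`, `_get_board_obs`, and `_legal_actions` are hypothetical hooks, not PettingZoo API, and real Classic environments return the mask as a NumPy `int8` array rather than a plain list:

```python
class MaskedObsMixin:
    """Illustrative sketch: build a Classic-style dict observation with an action mask."""

    def observe(self, agent):
        obs = self._get_board_obs(agent)      # environment-specific state (hypothetical hook)
        mask = [0] * self.num_actions         # 0 marks an invalid action
        for a in self._legal_actions(agent):  # environment-specific rules (hypothetical hook)
            mask[a] = 1                       # 1 marks a valid action
        return {"observation": obs, "action_mask": mask}
```

An agent consuming this observation would then sample with `env.action_space(agent).sample(observation["action_mask"])`, as in the example above.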

For more information on action masking, see [A Closer Look at Invalid Action Masking in Policy Gradient Algorithms](https://arxiv.org/abs/2006.14171) (Huang, 2022).
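Conceptually, sampling a discrete space with a mask is just a uniform draw over the actions whose mask entry is 1. A pure-Python sketch of that idea (not the actual Gymnasium implementation, which operates on NumPy arrays):

```python
import random

def masked_sample(n, mask, rng=random):
    """Uniformly sample an action from range(n) whose mask entry is 1.

    mask is a length-n sequence of 0/1 values; None means all actions are valid.
    """
    if mask is None:
        return rng.randrange(n)
    valid = [a for a in range(n) if mask[a] == 1]
    if not valid:
        raise ValueError("mask excludes every action")
    return rng.choice(valid)
```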


## AECEnv

11 changes: 7 additions & 4 deletions docs/api/parallel.md
@@ -2,17 +2,20 @@

In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `<game>.parallel_env()`. This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLLib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.

## Usage

Parallel environments can be interacted with as follows:

```python
from pettingzoo.butterfly import pistonball_v6

parallel_env = pistonball_v6.parallel_env(render_mode="human")
observations = parallel_env.reset(seed=42)

while parallel_env.agents:
    # this is where you would insert your policy
    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}

    # execute the actions in the environment
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
parallel_env.close()
```

13 changes: 8 additions & 5 deletions docs/environments/atari.md
@@ -52,19 +52,22 @@ Install ROMs using [AutoROM](https://github.com/Farama-Foundation/AutoROM), or s

### Usage

To launch a [Space Invaders](https://pettingzoo.farama.org/environments/atari/space_invaders/) environment with random agents:
```python
from pettingzoo.atari import space_invaders_v2

env = space_invaders_v2.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

24 changes: 15 additions & 9 deletions docs/environments/butterfly.md
@@ -34,38 +34,44 @@ pip install pettingzoo[butterfly]

### Usage

To launch a [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment with random agents:
```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.parallel_env(render_mode="human")
observations = env.reset(seed=42)

while env.agents:
    # this is where you would insert your policy
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}

    # execute the actions in the environment
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```

To launch a [Knights Archers Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/) environment with interactive user input (see [manual_policy.py](https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/butterfly/knights_archers_zombies/manual_policy.py)):
```python
import pygame
from pettingzoo.butterfly import knights_archers_zombies_v10

env = knights_archers_zombies_v10.env(render_mode="human")
env.reset(seed=42)

clock = pygame.time.Clock()
manual_policy = knights_archers_zombies_v10.ManualPolicy(env)

for agent in env.agent_iter():
    clock.tick(env.metadata["render_fps"])

    observation, reward, termination, truncation, info = env.last()

    if agent == manual_policy.agent:
        # get user input (controls are WASD and space)
        action = manual_policy(observation, agent)
    else:
        # this is where you would insert your policy (for non-player agents)
        action = env.action_space(agent).sample()

    env.step(action)  # execute the action in the environment
```

16 changes: 10 additions & 6 deletions docs/environments/classic.md
@@ -36,19 +36,23 @@ pip install pettingzoo[classic]

### Usage

To launch a [Texas Holdem](https://pettingzoo.farama.org/environments/classic/texas_holdem/) environment with random agents:
```python
from pettingzoo.classic import texas_holdem_v4

env = texas_holdem_v4.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        mask = observation["action_mask"]
        action = env.action_space(agent).sample(mask)  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

10 changes: 6 additions & 4 deletions docs/environments/mpe.md
@@ -34,20 +34,22 @@ pip install pettingzoo[mpe]
````

### Usage
To launch a [Simple Tag](https://pettingzoo.farama.org/environments/mpe/simple_tag/) environment with random agents:

```python
from pettingzoo.mpe import simple_tag_v2

env = simple_tag_v2.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

8 changes: 5 additions & 3 deletions docs/environments/sisl.md
@@ -27,7 +27,7 @@ pip install pettingzoo[sisl]
````

### Usage
To launch a [Waterworld](https://pettingzoo.farama.org/environments/sisl/waterworld/) environment with random agents:

```python
from pettingzoo.sisl import waterworld_v4

env = waterworld_v4.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

19 changes: 12 additions & 7 deletions docs/index.md
@@ -69,11 +69,16 @@ Contribute to the Docs <https://github.com/Farama-Foundation/PettingZoo/tree/mas
Environments can be interacted with in a manner very similar to [Gymnasium](https://gymnasium.farama.org):

```python
from pettingzoo.butterfly import knights_archers_zombies_v10

env = knights_archers_zombies_v10.env()
env.reset()

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    action = env.action_space(agent).sample()  # this is where you would insert your policy
    env.step(action)  # execute the action in the environment
env.close()
```

For detailed usage information, see [AEC API](https://pettingzoo.farama.org/api/aec/) and [Parallel API](https://pettingzoo.farama.org/api/parallel/).