diff --git a/docs/api/aec.md b/docs/api/aec.md
index d0208bdbf..6719d990d 100644
--- a/docs/api/aec.md
+++ b/docs/api/aec.md
@@ -3,46 +3,74 @@ title: AEC
 ---
+
 # AEC API
 
 By default, PettingZoo models games as [*Agent Environment Cycle*](https://arxiv.org/abs/2009.13051) (AEC) environments. This allows it to support any type of game multi-agent RL can consider.
 
-## Example Usage
+## Usage
 
 AEC environments can be interacted with as follows:
 
-``` python
+```python
 from pettingzoo.classic import rps_v2
+
 env = rps_v2.env(render_mode="human")
-env.reset()
+env.reset(seed=42)
 
 for agent in env.agent_iter():
     observation, reward, termination, truncation, info = env.last()
+
     if termination or truncation:
         action = None
     else:
         action = env.action_space(agent).sample() # this is where you would insert your policy
-    env.step(action)
+
+    env.step(action)
 env.close()
 ```
 
-Note: for environments with illegal actions in the action space, actions can be sampled according to an action mask as follows:
-``` python
+### Action Masking
+AEC environments often provide action masks, marking which actions are valid and invalid for the agent.
+
+To sample actions using action masking:
+```python
 from pettingzoo.classic import chess_v5
+
 env = chess_v5.env(render_mode="human")
-env.reset()
+env.reset(seed=42)
 
 for agent in env.agent_iter():
     observation, reward, termination, truncation, info = env.last()
+
     if termination or truncation:
         action = None
-    else:
-        action = env.action_space(agent).sample(observation["action_mask"]) # this is where you would insert your policy
-    env.step(action)
+    else:
+        # invalid action masking is optional and environment-dependent
+        if "action_mask" in info:
+            mask = info["action_mask"]
+        elif isinstance(observation, dict) and "action_mask" in observation:
+            mask = observation["action_mask"]
+        else:
+            mask = None
+        action = env.action_space(agent).sample(mask) # this is where you would insert your policy
+
+    env.step(action)
 env.close()
-
 ```
+Note: action masking is optional, and can be implemented using either `observation` or `info`.
+
+* [PettingZoo Classic](https://pettingzoo.farama.org/environments/classic/) environments store action masks in the `observation` dict:
+  * `mask = observation["action_mask"]`
+* [Shimmy](https://shimmy.farama.org/)'s [OpenSpiel environments](https://shimmy.farama.org/environments/open_spiel/) store action masks in the `info` dict:
+  * `mask = info["action_mask"]`
+
+To implement action masking in a custom environment, see [Environment Creation: Action Masking](https://pettingzoo.farama.org/tutorials/environmentcreation/3-action-masking/).
+
+For more information on action masking, see [A Closer Look at Invalid Action Masking in Policy Gradient Algorithms](https://arxiv.org/abs/2006.14171) (Huang, 2022).
+
 
 ## AECEnv
 
 ```{eval-rst}
diff --git a/docs/api/parallel.md b/docs/api/parallel.md
index 0c9882dcf..615f839ca 100644
--- a/docs/api/parallel.md
+++ b/docs/api/parallel.md
@@ -7,18 +7,21 @@ title: Parallel
 
 In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `.parallel_env()`. This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLlib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.
 
-## Example Usage
+## Usage
 
 Parallel environments can be interacted with as follows:
 
 ``` python
 from pettingzoo.butterfly import pistonball_v6
-parallel_env = pistonball_v6.parallel_env()
-observations = parallel_env.reset()
+parallel_env = pistonball_v6.parallel_env(render_mode="human")
+observations = parallel_env.reset(seed=42)
 
-while env.agents:
-    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents} # this is where you would insert your policy
+while parallel_env.agents:
+    # this is where you would insert your policy
+    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}
+
     observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
+parallel_env.close()
 ```
 
 ## ParallelEnv
diff --git a/docs/environments/atari.md b/docs/environments/atari.md
index 34eed4dcc..55351200f 100644
--- a/docs/environments/atari.md
+++ b/docs/environments/atari.md
@@ -52,19 +52,22 @@ Install ROMs using [AutoROM](https://github.com/Farama-Foundation/AutoROM), or s
 
 ### Usage
 
-To launch a [Space Invaders](https://pettingzoo.farama.org/environments/atari/space_invaders/) environment with agents taking random actions:
-``` python
+To launch a [Space Invaders](https://pettingzoo.farama.org/environments/atari/space_invaders/) environment with random agents:
+```python
 from pettingzoo.atari import space_invaders_v2
+
 env = space_invaders_v2.env(render_mode="human")
-env.reset()
+env.reset(seed=42)
 
 for agent in env.agent_iter():
     observation, reward, termination, truncation, info = env.last()
+
     if termination or truncation:
         action = None
     else:
-        env.action_space(agent).sample() # this is where you would insert your policy
-    env.step(action)
+        action = env.action_space(agent).sample() # this is where you would insert your policy
+
+    env.step(action)
 env.close()
 ```
diff --git a/docs/environments/butterfly.md b/docs/environments/butterfly.md
index e4f9e7431..443daf850 100644
--- a/docs/environments/butterfly.md
+++ b/docs/environments/butterfly.md
@@ -34,38 +34,44 @@ pip install pettingzoo[butterfly]
 
 ### Usage
 
-To launch a [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment with agents taking random actions:
-``` python
+To launch a [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment with random agents:
+```python
 from pettingzoo.butterfly import pistonball_v6
 
 env = pistonball_v6.parallel_env(render_mode="human")
 observations = env.reset()
+
 while env.agents:
-    actions = {agent: env.action_space(agent).sample() for agent in env.agents} # this is where you would insert your policy
+    # this is where you would insert your policy
+    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
+
     observations, rewards, terminations, truncations, infos = env.step(actions)
 env.close()
 ```
 
-To launch a [Knights Archers Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/) environment with interactive user input (see [manual_policy.py](https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/butterfly/knights_archers_zombies/manual_policy.py), controls are WASD and space):
-``` python
+To launch a [Knights Archers Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/) environment with interactive user input (see [manual_policy.py](https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/butterfly/knights_archers_zombies/manual_policy.py)):
+```python
 import pygame
 from pettingzoo.butterfly import knights_archers_zombies_v10
 
 env = knights_archers_zombies_v10.env(render_mode="human")
-env.reset()
+env.reset(seed=42)
 
 clock = pygame.time.Clock()
 manual_policy = knights_archers_zombies_v10.ManualPolicy(env)
 
 for agent in env.agent_iter():
     clock.tick(env.metadata["render_fps"])
     observation, reward, termination, truncation, info = env.last()
+
     if agent == manual_policy.agent:
+        # get user input (controls are WASD and space)
         action = manual_policy(observation, agent)
     else:
+        # this is where you would insert your policy (for non-player agents)
         action = env.action_space(agent).sample()
-    env.step(action)
+
+    env.step(action)
+env.close()
 ```
diff --git a/docs/environments/classic.md b/docs/environments/classic.md
index 1af3ff89f..4905e9264 100644
--- a/docs/environments/classic.md
+++ b/docs/environments/classic.md
@@ -36,19 +36,23 @@ pip install pettingzoo[classic]
 
 ### Usage
 
-To launch a [Texas Holdem](https://pettingzoo.farama.org/environments/classic/texas_holdem/) environment with agents taking random actions:
+To launch a [Texas Holdem](https://pettingzoo.farama.org/environments/classic/texas_holdem/) environment with random agents:
 ``` python
 from pettingzoo.classic import texas_holdem_v4
+
 env = texas_holdem_v4.env(render_mode="human")
-env.reset()
+env.reset(seed=42)
 
 for agent in env.agent_iter():
     observation, reward, termination, truncation, info = env.last()
+
     if termination or truncation:
         action = None
-    else:
-        action = env.action_space(agent).sample(observation["action_mask"]) # this is where you would insert your policy
-    env.step(action)
+    else:
+        mask = observation["action_mask"]
+        action = env.action_space(agent).sample(mask) # this is where you would insert your policy
+
+    env.step(action)
 env.close()
 ```
diff --git a/docs/environments/mpe.md b/docs/environments/mpe.md
index 6460f5c23..2ad185e05 100644
--- a/docs/environments/mpe.md
+++ b/docs/environments/mpe.md
@@ -34,20 +34,22 @@ pip install pettingzoo[mpe]
 ````
 
 ### Usage
 
-To launch a [Simple Tag](https://pettingzoo.farama.org/environments/mpe/simple_tag/) environment with agents taking random actions:
+To launch a [Simple Tag](https://pettingzoo.farama.org/environments/mpe/simple_tag/) environment with random agents:
 
-``` python
+```python
 from pettingzoo.mpe import simple_tag_v2
 env = simple_tag_v2.env(render_mode='human')
 env.reset()
 
 for agent in env.agent_iter():
     observation, reward, termination, truncation, info = env.last()
+
     if termination or truncation:
         action = None
     else:
-        action = env.action_space(agent).sample()
-    env.step(action)
+        action = env.action_space(agent).sample() # this is where you would insert your policy
+
+    env.step(action)
 env.close()
 ```
diff --git a/docs/environments/sisl.md b/docs/environments/sisl.md
index adb2de431..f73d610bb 100644
--- a/docs/environments/sisl.md
+++ b/docs/environments/sisl.md
@@ -27,7 +27,7 @@ pip install pettingzoo[sisl]
 ````
 
 ### Usage
-To launch a [Waterworld](https://pettingzoo.farama.org/environments/sisl/waterworld/) environment with agents taking random actions:
+To launch a [Waterworld](https://pettingzoo.farama.org/environments/sisl/waterworld/) environment with random agents:
 
 ```python
 from pettingzoo.sisl import waterworld_v4
@@ -36,11 +36,13 @@ env = waterworld_v4.env(render_mode='human')
 env.reset()
 
 for agent in env.agent_iter():
     observation, reward, termination, truncation, info = env.last()
+
     if termination or truncation:
         action = None
     else:
-        action = env.action_space(agent).sample()
-    env.step(action)
+        action = env.action_space(agent).sample() # this is where you would insert your policy
+
+    env.step(action)
 env.close()
 ```
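The mask lookup added in the `aec.md` hunk (check `info` first, then a dict-style `observation`, else fall back to `None`) can be factored into a small standalone helper. This is a sketch for illustration; `resolve_action_mask` is a hypothetical name, not part of the PettingZoo API:

```python
def resolve_action_mask(observation, info):
    """Return an action mask if the environment provides one, else None.

    Mirrors the lookup order in the docs example: the info dict is checked
    first (the Shimmy/OpenSpiel convention), then a dict-style observation
    (the PettingZoo Classic convention). None means sample() treats every
    action as valid.
    """
    if "action_mask" in info:
        return info["action_mask"]
    if isinstance(observation, dict) and "action_mask" in observation:
        return observation["action_mask"]
    return None


# The two conventions described in the docs, plus the unmasked fallback
shimmy_style = resolve_action_mask([0.0, 1.0], {"action_mask": [1, 0, 1]})
classic_style = resolve_action_mask({"action_mask": [0, 1]}, {})
unmasked = resolve_action_mask([0.0, 1.0], {})
```

The returned value can be passed directly to `env.action_space(agent).sample(mask)` as in the examples above.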
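The parallel loop in the `parallel.md` hunk only touches a few members of the environment (`agents`, `action_space(agent).sample()`, `reset`, `step`, `close`), so its control flow can be exercised without installing PettingZoo by mocking them. `DummyParallelEnv` below is a stand-in written for this sketch, not a real PettingZoo environment:

```python
import random


class DummyParallelEnv:
    """Minimal stand-in mimicking the ParallelEnv members the docs loop uses."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self._steps_left = 0
        self.agents = []

    def reset(self, seed=None):
        random.seed(seed)
        self._steps_left = self.episode_length
        self.agents = ["piston_0", "piston_1"]
        return {agent: 0.0 for agent in self.agents}

    def action_space(self, agent):
        class _Space:  # stand-in for a gymnasium Discrete(3) space
            def sample(self):
                return random.randrange(3)

        return _Space()

    def step(self, actions):
        self._steps_left -= 1
        done = self._steps_left <= 0
        observations = {agent: 0.0 for agent in self.agents}
        rewards = {agent: 1.0 for agent in self.agents}
        terminations = {agent: done for agent in self.agents}
        truncations = {agent: False for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        if done:
            self.agents = []  # a ParallelEnv empties its agent list at episode end
        return observations, rewards, terminations, truncations, infos

    def close(self):
        pass


# The loop shape from the parallel.md hunk, run against the dummy environment;
# it exits when the environment clears its agent list.
parallel_env = DummyParallelEnv()
observations = parallel_env.reset(seed=42)
total_reward = 0.0

while parallel_env.agents:
    # this is where you would insert your policy
    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}

    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
    total_reward += sum(rewards.values())
parallel_env.close()
```

Note the loop condition reads `parallel_env.agents`, matching the variable the environment was bound to; a mismatched name there is exactly the kind of bug the docs example is prone to.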