From 5f7d0be5e8d8f7ca426d63205824945cc8a8bd78 Mon Sep 17 00:00:00 2001
From: elliottower
Date: Fri, 21 Jul 2023 19:27:37 -0400
Subject: [PATCH] Add documentation about AEC game model, link to PettingZoo paper, link to conversion wrappers

---
 docs/api/aec.md      | 42 +++++++++++++++++++++++++++++++++++++++---
 docs/api/parallel.md |  2 +-
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/docs/api/aec.md b/docs/api/aec.md
index dc591fd79..8d4755f6a 100644
--- a/docs/api/aec.md
+++ b/docs/api/aec.md
@@ -6,11 +6,15 @@ title: AEC
 
 # AEC API
 
-By default, PettingZoo models games as [*Agent Environment Cycle*](https://arxiv.org/abs/2009.13051) (AEC) environments. This allows it to support any type of game multi-agent RL can consider.
+By default, PettingZoo models games as [*Agent Environment Cycle*](https://arxiv.org/abs/2009.13051) (AEC) environments. This allows PettingZoo to represent any type of game multi-agent RL can consider.
 
-[PettingZoo Classic](https://pettingzoo.farama.org/environments/classic/) provides standard examples of AEC environments for turn-based games, many of which implement [Illegal Action Masking](#action-masking).
+[PettingZoo Classic](/environments/classic/) provides standard examples of AEC environments for turn-based games, many of which implement [Illegal Action Masking](#action-masking).
 
-We provide a [tutorial](https://pettingzoo.farama.org/content/environment_creation/#example-custom-environment) for creating a simple Rock-Paper-Scissors AEC environment, showing how games with simultaneous actions can also be represented with AEC environments.
+We provide a [tutorial](/content/environment_creation/) for creating a simple Rock-Paper-Scissors AEC environment, showing how games with simultaneous actions can also be represented with AEC environments.
+
+[PettingZoo Wrappers](/api/wrappers/pz_wrappers/) can be used to convert between Parallel and AEC environments, with some restrictions (e.g., an AEC env must only update once at the end of each cycle).
+
+For more information, see [About AEC](#about-aec) or [*PettingZoo: A Standard API for Multi-Agent Reinforcement Learning*](https://arxiv.org/pdf/2009.14471.pdf).
 
 ## Usage
 
@@ -75,6 +79,38 @@ To implement action masking in a custom environment, see [Environment Creation:
 
 For more information on action masking, see [A Closer Look at Invalid Action Masking in Policy Gradient Algorithms](https://arxiv.org/abs/2006.14171) (Huang, 2022)
 
+## About AEC
+The [_Agent Environment Cycle_](https://arxiv.org/abs/2009.13051) (AEC) model was designed as a [Gym](https://github.com/openai/gym)-like API for MARL, supporting all possible use cases and types of environments. This includes environments with:
+- Large number of agents (see [Magent2](https://magent2.farama.org/))
+- Variable number of agents (see [Knights, Archers, Zombies](/environments/butterfly/knights_archers_zombies))
+- Action and observation spaces of any type (e.g., [Box](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.Box), [Discrete](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.Discrete), [MultiDiscrete](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.MultiDiscrete), [MultiBinary](https://gymnasium.farama.org/api/spaces/fundamental/#multibinary), [Text](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.Text))
+- Nested action and observation spaces (e.g., [Dict](https://gymnasium.farama.org/api/spaces/composite/#dict), [Tuple](https://gymnasium.farama.org/api/spaces/composite/#tuple), [Sequence](https://gymnasium.farama.org/api/spaces/composite/#sequence), [Graph](https://gymnasium.farama.org/api/spaces/composite/#graph))
+- Support for action masking (see [Classic](/environments/classic) environments)
+- Action and observation spaces which can change over time, and differ per agent (see [generated_agents](https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/test/example_envs/generated_agents_env_v0.py) and [variable_env_test](https://github.com/Farama-Foundation/PettingZoo/blob/master/test/variable_env_test.py))
+- Changing turn order and evolving environment dynamics (e.g., games with multiple stages, reversing turns)
+
+In an AEC environment, agents act sequentially, receiving separate observations and rewards after each step.
+This is a natural way of representing sequential games such as Chess, and is flexible enough to handle any type of game that multi-agent RL can consider.
+
+```{figure} /_static/img/aec_cycle_figure.png
+    :width: 480px
+    :name: The AEC diagram of Chess
+```
+
+This is in contrast to the [*Partially Observable Stochastic Game*](https://en.wikipedia.org/wiki/Game_theory#Stochastic_outcomes_(and_relation_to_other_fields)) (POSG) model, represented in our [Parallel API](/api/parallel/), where agents act simultaneously and can only receive observations and rewards at the end of a cycle.
+This makes it difficult to represent sequential games such as Chess, and results in race conditions--where agents choose to take actions which are mutually exclusive. This causes environment behavior to differ depending on internal resolution of agent order, resulting in hard-to-detect bugs if even a single race condition is not caught and handled by the environment (e.g., through tie-breaking).
+
+The AEC model is similar to the [*Extensive Form Games*](https://en.wikipedia.org/wiki/Extensive-form_game) (EFGs) model, used in DeepMind's [OpenSpiel](https://github.com/deepmind/open_spiel).
+EFGs represent sequential games as trees, explicitly representing every possible sequence of actions as a root-to-leaf path in the tree.
+A limitation of EFGs is that the formal definition is specific to game theory, and only allows rewards at the end of a game, whereas in RL, learning often requires frequent rewards.
+
+EFGs can be extended to represent stochastic games by adding a player representing the environment (e.g., [chance nodes](https://openspiel.readthedocs.io/en/latest/concepts.html#the-tree-representation) in OpenSpiel), which takes actions according to a given probability distribution. However, this requires users to manually sample and apply chance node actions whenever interacting with the environment, leaving room for user error and potential random seeding issues.
+AEC environments, in contrast, handle environment dynamics internally after each agent step, resulting in a simpler mental model of the environment, and allowing for arbitrary and evolving environment dynamics (as opposed to a static chance distribution).
+
+For more information about the AEC model and PettingZoo's design philosophy, see [*PettingZoo: A Standard API for Multi-Agent
+Reinforcement Learning*](https://arxiv.org/pdf/2009.14471.pdf).
+
+
 ## AECEnv
 
 ```{eval-rst}
diff --git a/docs/api/parallel.md b/docs/api/parallel.md
index 27d0c2a1e..335d61d7a 100644
--- a/docs/api/parallel.md
+++ b/docs/api/parallel.md
@@ -7,7 +7,7 @@ title: Parallel
 
 In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `.parallel_env()`. This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLlib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.
 
-All parallel environments can be converted into AEC environments by splitting a simultaneous turn into sequential turns, with observations only from the previous cycle.
+[PettingZoo Wrappers](/api/wrappers/pz_wrappers/) can be used to convert between Parallel and AEC environments, with some restrictions (e.g., an AEC env must only update once at the end of each cycle).
 
 We provide tutorials for creating two custom Parallel environments: [Rock-Paper-Scissors](https://pettingzoo.farama.org/content/environment_creation/#example-custom-parallel-environment), and a simple [gridworld environment](https://pettingzoo.farama.org/tutorials/environmentcreation/2-environment-logic/)
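
The conversion restriction in the `parallel.md` change (the environment updates only once, at the end of each cycle) can be illustrated with a toy sketch. This is plain Python and does not use PettingZoo's actual wrappers or reproduce their implementation; `ToyParallelRPS` and `ToyParallelToAEC` are hypothetical names made up for this illustration, and the reward scheme is an assumption.

```python
class ToyParallelRPS:
    """Hypothetical parallel env: both agents act at once in Rock-Paper-Scissors."""

    agents = ["player_0", "player_1"]

    def step(self, actions):
        # actions maps agent name -> 0 (rock), 1 (paper), 2 (scissors)
        a, b = actions["player_0"], actions["player_1"]
        outcome = (a - b) % 3  # 0 = tie, 1 = player_0 wins, 2 = player_1 wins
        r0 = {0: 0, 1: 1, 2: -1}[outcome]
        return {"player_0": r0, "player_1": -r0}


class ToyParallelToAEC:
    """Hypothetical adapter: agents step one at a time, and the wrapped
    parallel env is advanced only once, when the cycle is complete."""

    def __init__(self, parallel_env):
        self.env = parallel_env
        self.agents = list(parallel_env.agents)
        self._pending = {}  # actions buffered during the current cycle
        self._index = 0     # position of the agent whose turn it is
        self.rewards = {agent: 0 for agent in self.agents}

    @property
    def agent_selection(self):
        return self.agents[self._index]

    def step(self, action):
        # Buffer the acting agent's choice, then pass the turn on.
        self._pending[self.agent_selection] = action
        self._index = (self._index + 1) % len(self.agents)
        if self._index == 0:
            # Every agent has acted: resolve the simultaneous turn now.
            self.rewards = self.env.step(self._pending)
            self._pending = {}


env = ToyParallelToAEC(ToyParallelRPS())
env.step(1)  # player_0 plays paper; nothing is resolved yet
rewards_mid_cycle = dict(env.rewards)
env.step(0)  # player_1 plays rock; the cycle ends and the env updates
rewards_end_cycle = dict(env.rewards)
```

Because actions are buffered until the last agent in the cycle has stepped, no agent sees the effect of another agent's action within the same cycle, which matches the removed sentence's description of observations coming only from the previous cycle.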