Expand AEC documentation significantly (comparison with EFGs/POSGs, info from PettingZoo paper) #1041

Merged 1 commit on Jul 23, 2023.
docs/api/aec.md (42 changes: 39 additions & 3 deletions)

@@ -6,11 +6,15 @@ title: AEC

# AEC API

-By default, PettingZoo models games as [*Agent Environment Cycle*](https://arxiv.org/abs/2009.13051) (AEC) environments. This allows it to support any type of game multi-agent RL can consider.
+By default, PettingZoo models games as [*Agent Environment Cycle*](https://arxiv.org/abs/2009.13051) (AEC) environments. This allows PettingZoo to represent any type of game multi-agent RL can consider.

-[PettingZoo Classic](https://pettingzoo.farama.org/environments/classic/) provides standard examples of AEC environments for turn-based games, many of which implement [Illegal Action Masking](#action-masking).
+[PettingZoo Classic](/environments/classic/) provides standard examples of AEC environments for turn-based games, many of which implement [Illegal Action Masking](#action-masking).

-We provide a [tutorial](https://pettingzoo.farama.org/content/environment_creation/#example-custom-environment) for creating a simple Rock-Paper-Scissors AEC environment, showing how games with simultaneous actions can also be represented with AEC environments.
+We provide a [tutorial](/content/environment_creation/) for creating a simple Rock-Paper-Scissors AEC environment, showing how games with simultaneous actions can also be represented with AEC environments.

[PettingZoo Wrappers](/api/wrappers/pz_wrappers/) can be used to convert between Parallel and AEC environments, with some restrictions (e.g., an AEC env must only update once at the end of each cycle).
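
As an illustration, a minimal sketch of converting in both directions; Rock-Paper-Scissors (`rps_v2`) is used here as an assumed example, since it supports both APIs:

```python
from pettingzoo.classic import rps_v2
from pettingzoo.utils.conversions import aec_to_parallel, parallel_to_aec

# AEC -> Parallel: only valid for AEC envs that update once per cycle
env = rps_v2.env()
parallel_env = aec_to_parallel(env)

# Parallel -> AEC: always possible
parallel_env = rps_v2.parallel_env()
env = parallel_to_aec(parallel_env)
```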

For more information, see [About AEC](#about-aec) or [*PettingZoo: A Standard API for Multi-Agent Reinforcement Learning*](https://arxiv.org/pdf/2009.14471.pdf).

## Usage

@@ -75,6 +79,38 @@ To implement action masking in a custom environment, see [Environment Creation:
For more information on action masking, see [A Closer Look at Invalid Action Masking in Policy Gradient Algorithms](https://arxiv.org/abs/2006.14171) (Huang, 2022).
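
As a brief illustration, a sketch of this pattern using the Classic Chess environment, where the observation is a dictionary containing an `action_mask` over the legal moves:

```python
from pettingzoo.classic import chess_v6

env = chess_v6.env()
env.reset(seed=42)
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None  # agents that are done must step with None
    else:
        # sample uniformly among the legal actions only
        action = env.action_space(agent).sample(observation["action_mask"])
    env.step(action)
env.close()
```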


## About AEC
The [_Agent Environment Cycle_](https://arxiv.org/abs/2009.13051) (AEC) model was designed as a [Gym](https://github.com/openai/gym)-like API for MARL, supporting all possible use cases and types of environments. This includes environments with:
- A large number of agents (see [MAgent2](https://magent2.farama.org/))
- A variable number of agents (see [Knights, Archers, Zombies](/environments/butterfly/knights_archers_zombies))
- Action and observation spaces of any type (e.g., [Box](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.Box), [Discrete](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.Discrete), [MultiDiscrete](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.MultiDiscrete), [MultiBinary](https://gymnasium.farama.org/api/spaces/fundamental/#multibinary), [Text](https://gymnasium.farama.org/api/spaces/fundamental/#gymnasium.spaces.Text))
- Nested action and observation spaces (e.g., [Dict](https://gymnasium.farama.org/api/spaces/composite/#dict), [Tuple](https://gymnasium.farama.org/api/spaces/composite/#tuple), [Sequence](https://gymnasium.farama.org/api/spaces/composite/#sequence), [Graph](https://gymnasium.farama.org/api/spaces/composite/#graph))
- Support for action masking (see [Classic](/environments/classic) environments)
- Action and observation spaces which can change over time and differ per agent (see [generated_agents](https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/test/example_envs/generated_agents_env_v0.py) and [variable_env_test](https://github.com/Farama-Foundation/PettingZoo/blob/master/test/variable_env_test.py)); a sketch of such spaces follows this list
- Changing turn order and evolving environment dynamics (e.g., games with multiple stages, reversing turns)
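
As a sketch of the per-agent and nested-space points above, a hypothetical set of observation spaces (the agent names, shapes, and sizes here are invented for illustration):

```python
from gymnasium import spaces

# Hypothetical per-agent spaces for a custom AEC environment;
# each agent may have a completely different (and nested) space.
observation_spaces = {
    "player_0": spaces.Dict(
        {
            "observation": spaces.Box(low=0.0, high=1.0, shape=(8, 8, 3)),
            "action_mask": spaces.MultiBinary(64),
        }
    ),
    "player_1": spaces.Tuple((spaces.Discrete(5), spaces.Text(max_length=16))),
}
```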

In an AEC environment, agents act sequentially, receiving separate observations and rewards after each step.
This is a natural way of representing sequential games such as Chess, and is flexible enough to handle any type of game that multi-agent RL can consider.

```{figure} /_static/img/aec_cycle_figure.png
:width: 480px
:name: The AEC diagram of Chess
```

This is in contrast to the [*Partially Observable Stochastic Game*](https://en.wikipedia.org/wiki/Game_theory#Stochastic_outcomes_(and_relation_to_other_fields)) (POSG) model, represented in our [Parallel API](/api/parallel/), where agents act simultaneously and can only receive observations and rewards at the end of a cycle.
This makes it difficult to represent sequential games such as Chess, and results in race conditions, where agents choose mutually exclusive actions. Environment behavior then depends on how agent order is resolved internally, resulting in hard-to-detect bugs if even a single race condition is not caught and handled by the environment (e.g., through tie-breaking).
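
For contrast with the AEC loop above, a minimal sketch of the simultaneous-action pattern of the Parallel API; Pistonball is used here as an assumed example, and recent PettingZoo versions return `(observations, infos)` from `reset`:

```python
from pettingzoo.butterfly import pistonball_v6

parallel_env = pistonball_v6.parallel_env()
observations, infos = parallel_env.reset(seed=42)
while parallel_env.agents:
    # all agents act at once; actions are submitted as a single dict
    actions = {a: parallel_env.action_space(a).sample() for a in parallel_env.agents}
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
parallel_env.close()
```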

The AEC model is similar to the [*Extensive Form Game*](https://en.wikipedia.org/wiki/Extensive-form_game) (EFG) model used in DeepMind's [OpenSpiel](https://github.com/deepmind/open_spiel).
EFGs represent sequential games as trees, explicitly representing every possible sequence of actions as a root-to-leaf path in the tree.
A limitation of EFGs is that the formal definition is specific to game theory and only allows rewards at the end of a game, whereas in RL, learning often requires frequent rewards.

EFGs can be extended to represent stochastic games by adding a player representing the environment (e.g., [chance nodes](https://openspiel.readthedocs.io/en/latest/concepts.html#the-tree-representation) in OpenSpiel), which takes actions according to a given probability distribution. However, this requires users to manually sample and apply chance node actions whenever interacting with the environment, leaving room for user error and potential random seeding issues.
AEC environments, in contrast, handle environment dynamics internally after each agent step, resulting in a simpler mental model of the environment and allowing for arbitrary and evolving environment dynamics (as opposed to a static chance distribution).
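
As a purely hypothetical sketch (no particular environment; the 10% hazard chance and both helpers are invented for illustration), internal dynamics can be resolved inside `step()` with the environment's own seeded RNG, so users never sample chance actions themselves:

```python
import numpy as np
from pettingzoo.utils.env import AECEnv


class HypotheticalEnv(AECEnv):
    """Sketch only: helpers and probabilities are invented."""

    def reset(self, seed=None, options=None):
        self.agents = ["player_0", "player_1"]
        self.agent_selection = self.agents[0]
        self.rng = np.random.default_rng(seed)  # one seeded RNG for every chance event

    def step(self, action):
        self._apply_action(self.agent_selection, action)
        # The environment resolves its own "chance node" internally after each
        # agent step, instead of exposing chance as an extra player (as EFGs do).
        if self.rng.random() < 0.1:
            self._spawn_hazard()
        # Hand control to the next agent in the cycle.
        i = self.agents.index(self.agent_selection)
        self.agent_selection = self.agents[(i + 1) % len(self.agents)]

    def _apply_action(self, agent, action):
        pass  # stub: game-specific state update would go here

    def _spawn_hazard(self):
        pass  # stub: an internal stochastic event
```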

For more information about the AEC model and PettingZoo's design philosophy, see [*PettingZoo: A Standard API for Multi-Agent Reinforcement Learning*](https://arxiv.org/pdf/2009.14471.pdf).


## AECEnv

```{eval-rst}
docs/api/parallel.md (2 changes: 1 addition & 1 deletion)

@@ -7,7 +7,7 @@ title: Parallel

In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `<game>.parallel_env()`. This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLlib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.
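
For example, a small sketch of querying per-agent spaces; Knights, Archers, Zombies is used here as an assumed example:

```python
from pettingzoo.butterfly import knights_archers_zombies_v10

parallel_env = knights_archers_zombies_v10.parallel_env()
parallel_env.reset(seed=42)
for agent in parallel_env.agents:
    # spaces are queried per agent and may differ between agents
    print(agent, parallel_env.observation_space(agent), parallel_env.action_space(agent))
```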

-All parallel environments can be converted into AEC environments by splitting a simultaneous turn into sequential turns, with observations only from the previous cycle.
+[PettingZoo Wrappers](/api/wrappers/pz_wrappers/) can be used to convert between Parallel and AEC environments, with some restrictions (e.g., an AEC env must only update once at the end of each cycle).

We provide tutorials for creating two custom Parallel environments: [Rock-Paper-Scissors](https://pettingzoo.farama.org/content/environment_creation/#example-custom-parallel-environment) and a simple [gridworld environment](https://pettingzoo.farama.org/tutorials/environmentcreation/2-environment-logic/).
