
Commit

Add third party environment: Interactive Connect Four (HuggingFace space) (#1034)
elliottower authored Jul 24, 2023
1 parent 5221958 commit d18ffa5
Showing 3 changed files with 21 additions and 13 deletions.
Binary file added docs/_static/img/aec_cycle_figure.png
11 changes: 7 additions & 4 deletions docs/api/aec.md
@@ -89,23 +89,26 @@ The [_Agent Environment Cycle_](https://arxiv.org/abs/2009.13051) (AEC) model was
- Action and observation spaces which can change over time, and differ per agent (see [generated_agents](https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/test/example_envs/generated_agents_env_v0.py) and [variable_env_test](https://github.com/Farama-Foundation/PettingZoo/blob/master/test/variable_env_test.py))
- Changing turn order and evolving environment dynamics (e.g., games with multiple stages, reversing turns)

In an AEC environment, agents act sequentially, receiving updated observations and rewards before taking an action. The environment updates after each agent's step, making it a natural way of representing sequential games such as Chess. The AEC model is flexible enough to handle any type of game that multi-agent RL can consider.

```{figure} /_static/img/aec_cycle_figure.png
:width: 480px
:name: The AEC diagram of Chess
```
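The sequential pattern above can be sketched with a minimal, self-contained toy environment. This is a hypothetical stand-in that only mimics the shape of an AEC-style loop (`reset` / `agent_iter` / `last` / `step`); it is illustrative and is not PettingZoo's actual `AECEnv` class.

```python
# Toy sketch of the AEC interaction pattern: agents act one at a
# time, and the environment updates after every single step.
# Hypothetical stand-in, not PettingZoo's real AECEnv.
class ToyAECEnv:
    def __init__(self, max_steps=4):
        self.possible_agents = ["player_0", "player_1"]
        self.max_steps = max_steps

    def reset(self):
        self.steps = 0
        self._idx = 0
        self.agents = list(self.possible_agents)

    def agent_iter(self):
        # Yield the agent whose turn it is until the episode ends.
        while self.agents:
            yield self.agents[self._idx]

    def last(self):
        # The agent about to act sees an observation computed *after*
        # the previous agent's step, i.e. an up-to-date view.
        terminated = self.steps >= self.max_steps
        return {"turn": self.steps}, 0, terminated, False, {}

    def step(self, action):
        # Environment state advances after each individual agent step.
        self.steps += 1
        self._idx = (self._idx + 1) % len(self.possible_agents)
        if self.steps >= self.max_steps:
            self.agents = []  # episode over


env = ToyAECEnv()
env.reset()
history = []
for agent in env.agent_iter():
    obs, reward, terminated, truncated, info = env.last()
    action = None if (terminated or truncated) else 0  # policy stub
    history.append((agent, obs["turn"]))
    env.step(action)

print(history)
# -> [('player_0', 0), ('player_1', 1), ('player_0', 2), ('player_1', 3)]
```

Note how the two agents strictly alternate and each one observes the turn counter already advanced by the previous step.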

This is in contrast to the [*Partially Observable Stochastic Game*](https://en.wikipedia.org/wiki/Game_theory#Stochastic_outcomes_(and_relation_to_other_fields)) (POSG) model, represented in our [Parallel API](/api/parallel/), where agents act simultaneously and can only receive observations and rewards at the end of a cycle.
This makes it difficult to represent sequential games, and results in race conditions, where agents choose to take actions which are mutually exclusive. This causes environment behavior to differ depending on the internal resolution of agent order, resulting in hard-to-detect bugs if even a single race condition is not caught and handled by the environment (e.g., through tie-breaking).
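For contrast, a POSG-style parallel step can be sketched the same way: every agent submits an action for the same tick, and the environment must resolve all of them in one simultaneous update. Again a hypothetical toy for illustration, not PettingZoo's actual `ParallelEnv`.

```python
# Toy sketch of a POSG-style parallel step: all agents act at once,
# so any conflicting (mutually exclusive) actions must be resolved
# inside step() itself, e.g. by tie-breaking. Hypothetical stand-in,
# not PettingZoo's real ParallelEnv.
class ToyParallelEnv:
    def __init__(self, max_steps=3):
        self.max_steps = max_steps
        self.agents = []

    def reset(self):
        self.steps = 0
        self.agents = ["player_0", "player_1"]
        return {a: {"turn": self.steps} for a in self.agents}

    def step(self, actions):
        # One simultaneous update: observations and rewards only
        # arrive at the end of the full cycle, and this is where the
        # environment would have to break ties between conflicting
        # actions.
        assert set(actions) == set(self.agents)
        self.steps += 1
        observations = {a: {"turn": self.steps} for a in self.agents}
        rewards = {a: 0 for a in self.agents}
        terminated = self.steps >= self.max_steps
        if terminated:
            self.agents = []
        return observations, rewards, terminated


env = ToyParallelEnv()
observations = env.reset()
cycles = 0
while env.agents:
    actions = {agent: 0 for agent in env.agents}  # policy stub
    observations, rewards, terminated = env.step(actions)
    cycles += 1

print(cycles)  # -> 3
```

Unlike the sequential loop, no agent ever observes the other's action within the same cycle, which is exactly what makes turn-based games awkward to express in this model.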

The AEC model is similar to the [*Extensive Form Games*](https://en.wikipedia.org/wiki/Extensive-form_game) (EFG) model used in DeepMind's [OpenSpiel](https://github.com/deepmind/open_spiel).
EFGs represent sequential games as trees, explicitly representing every possible sequence of actions as a root-to-leaf path in the tree.
A limitation of EFGs is that the formal definition is specific to game theory and only allows rewards at the end of a game, whereas in RL, learning often requires frequent rewards.

EFGs can be extended to represent stochastic games by adding a player representing the environment (e.g., [chance nodes](https://openspiel.readthedocs.io/en/latest/concepts.html#the-tree-representation) in OpenSpiel), which takes actions according to a given probability distribution. However, this requires users to manually sample and apply chance node actions whenever interacting with the environment, leaving room for user error and potential random seeding issues.
AEC environments, in contrast, handle environment dynamics internally after each agent step, resulting in a simpler mental model of the environment and allowing for arbitrary and evolving environment dynamics (as opposed to a static chance distribution). The AEC model also more closely resembles how computer games are implemented in code, and can be thought of as similar to the game loop in game programming.

For more information about the AEC model and PettingZoo's design philosophy, see [*PettingZoo: A Standard API for Multi-Agent
Reinforcement Learning*](https://arxiv.org/pdf/2009.14471.pdf).
23 changes: 14 additions & 9 deletions docs/environments/third_party_envs.md
@@ -45,12 +45,10 @@ Using [Google DeepMind](https://www.deepmind.com/)'s [MuZero](https://en.wikiped

### [CookingZoo](https://github.com/DavidRother/gym-cooking)

[![PettingZoo version dependency](https://img.shields.io/badge/PettingZoo-v1.23.0-blue)]()
[![GitHub stars](https://img.shields.io/github/stars/DavidRother/gym-cooking)]()
[![GitHub last commit](https://img.shields.io/github/last-commit/DavidRother/gym-cooking)]()

CookingZoo: a gym-cooking derivative to simulate a complex cooking environment.

### [Crazy-RL](https://github.com/ffelten/CrazyRL)

@@ -76,6 +74,13 @@ PettingZoo environments for classic game theory problems: [Prisoner's Dilemma](h
Modernized clone of the [Breakout](https://en.wikipedia.org/wiki/Breakout_(video_game)) arcade game, using [Unity](https://unity.com/) game engine and PettingZoo.
* Online playable game (using [Unity WebGL](https://docs.unity3d.com/2020.1/Documentation/Manual/webgl-gettingstarted.html) and [Unity ML-Agents](https://unity.com/products/machine-learning-agents)): [link](https://sethcram.weebly.com/breakout-clone.html), [tutorial](https://www.youtube.com/watch?v=zPFU30tbyKs)

### [Carla Gym](https://github.com/johnMinelli/carla-gym/)

[![PettingZoo version dependency](https://img.shields.io/badge/PettingZoo-v1.23.0-blue)]()
[![GitHub stars](https://img.shields.io/github/stars/johnMinelli/carla-gym)]()

PettingZoo interface for CARLA Autonomous Driving simulator.

### [Fanorona AEC](https://github.com/AbhijeetKrishnan/fanorona-aec)
[![PettingZoo version dependency](https://img.shields.io/badge/PettingZoo-v1.23.1-blue)]()
[![GitHub stars](https://img.shields.io/github/stars/AbhijeetKrishnan/fanorona-aec)]()
@@ -97,12 +102,13 @@ Interactive PettingZoo implementation of the [Gobblet](https://en.wikipedia.org/

Interactive PettingZoo implementation of the [Cathedral](https://en.wikipedia.org/wiki/Cathedral_(board_game)) board game.

### [Interactive Connect Four](https://huggingface.co/spaces/ClementBM/connectfour)
[![PettingZoo version dependency](https://img.shields.io/badge/PettingZoo-v1.22.4-blue)]()
[![HuggingFace likes](https://img.shields.io/badge/stars-_2-blue)]()

Play [Connect Four](https://pettingzoo.farama.org/environments/classic/connect_four/) in real-time against an [RLlib](https://docs.ray.io/en/latest/rllib/index.html) agent trained via self-play and PPO.
* Online game demo (using [Gradio](https://www.gradio.app/) and [HuggingFace Spaces](https://huggingface.co/docs/hub/spaces-overview)): [link](https://huggingface.co/spaces/ClementBM/connectfour), [tutorial](https://clementbm.github.io/project/2023/03/29/reinforcement-learning-connect-four-rllib.html)


___
@@ -191,7 +197,6 @@ PettingZoo environment for online multi-player game [Battlesnake](https://play.b

Environment with a simplified version of the video game *BomberMan*.
### [Galaga AI](https://github.com/SonicKurt/Galaga-AI)

[![PettingZoo version dependency](https://img.shields.io/badge/PettingZoo-v1.15.0-red)]()
