diff --git a/.github/ISSUE_TEMPLATE/question.yml b/.github/ISSUE_TEMPLATE/question.yml index 596c32f95..94c7feeee 100644 --- a/.github/ISSUE_TEMPLATE/question.yml +++ b/.github/ISSUE_TEMPLATE/question.yml @@ -6,13 +6,13 @@ body: - type: markdown attributes: value: > - If you have basic questions about reinforcement learning algorithms, please ask on - [r/reinforcementlearning](https://www.reddit.com/r/reinforcementlearning/) or in the - [RL Discord](https://discord.com/invite/xhfNqQv) (if you're new please use the beginners channel). - Basic questions that are not bugs or feature requests will be closed without reply, because GitHub - issues are not an appropriate venue for these. Advanced/nontrivial questions, especially in areas where + If you have basic questions about reinforcement learning algorithms, please ask on + [r/reinforcementlearning](https://www.reddit.com/r/reinforcementlearning/) or in the + [RL Discord](https://discord.com/invite/xhfNqQv) (if you're new please use the beginners channel). + Basic questions that are not bugs or feature requests will be closed without reply, because GitHub + issues are not an appropriate venue for these. Advanced/nontrivial questions, especially in areas where documentation is lacking, are very much welcome. - + - type: textarea id: question attributes: diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index fbbc609fd..5ec602be9 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -1,5 +1,24 @@ ---- +# See https://pre-commit.com for more information +# See https://pre-commit.com/hooks.html for more hooks repos: + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v4.4.0 + hooks: + - id: check-symlinks + - id: destroyed-symlinks + - id: trailing-whitespace + - id: end-of-file-fixer + - id: check-yaml + - id: check-toml + - id: check-ast + - id: check-added-large-files + - id: check-merge-conflict + - id: check-executables-have-shebangs + - id: check-shebang-scripts-are-executable + - id: detect-private-key + - id: debug-statements + - id: mixed-line-ending + args: [ "--fix=lf" ] - repo: https://github.com/python/black rev: 23.3.0 hooks: @@ -28,15 +47,10 @@ repos: - id: isort args: ["--profile", "black"] - repo: https://github.com/asottile/pyupgrade - rev: v3.3.1 + rev: v3.3.2 hooks: - id: pyupgrade args: ["--py37-plus"] - - repo: https://github.com/pre-commit/pre-commit-hooks - rev: v4.4.0 - hooks: - - id: mixed-line-ending - args: ["--fix=lf"] - repo: https://github.com/pycqa/pydocstyle rev: 6.3.0 hooks: @@ -59,3 +73,5 @@ repos: pass_filenames: false types: [python] additional_dependencies: ["pyright"] + args: + - --project=pyproject.toml diff --git a/CODE_OF_CONDUCT.rst b/CODE_OF_CONDUCT.rst index f91dd916a..069422654 100644 --- a/CODE_OF_CONDUCT.rst +++ b/CODE_OF_CONDUCT.rst @@ -65,4 +65,3 @@ Attribution ----------- This Code of Conduct is adapted from `Python's Code of Conduct `_, which is under a `Creative Commons License `_. - diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index c418e3d53..a316fc683 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -58,7 +58,7 @@ Tutorials are a crucial way to help people learn how to use PettingZoo and we gr - You should make a `.md` file for each tutorial within the above directory. - Each `.md` file should have an "Environment Setup" section and a "Code" section. The title should be of the format `: `. - The Environment Setup section should reference the `requirements.txt` file you created using `literalinclude`. -- The Code section should reference the `.py` file you created using `literalinclude`. +- The Code section should reference the `.py` file you created using `literalinclude`. - `/docs/index.md` should be modified to include every new tutorial. ### Testing your tutorial diff --git a/LICENSE b/LICENSE index f8a7b14c6..66fbb0706 100644 --- a/LICENSE +++ b/LICENSE @@ -1,5 +1,5 @@ This repository is licensed as follows: -All assets in this repository are the copyright of the Farama Foundation, except +All assets in this repository are the copyright of the Farama Foundation, except where prohibited. Contributors to the repository transfer copyright of their work to the Farama Foundation. @@ -7,7 +7,7 @@ Some code in this repository has been taken from other open source projects and was originally released under the MIT or Apache 2.0 licenses, with copyright held by another party. We've attributed these authors and they retain their copyright to the extent required by law. Everything else -is owned by the Farama Foundation. The Secret Code font was also released under +is owned by the Farama Foundation. The Secret Code font was also released under the MIT license by Matthew Welch (http://www.squaregear.net/fonts/). The MIT and Apache 2.0 licenses are included below. diff --git a/README.md b/README.md index 438731200..6335b2602 100644 --- a/README.md +++ b/README.md @@ -20,7 +20,7 @@ PettingZoo includes the following families of environments: To install the base PettingZoo library: `pip install pettingzoo`. -This does not include dependencies for all families of environments (some environments can be problematic to install on certain systems). +This does not include dependencies for all families of environments (some environments can be problematic to install on certain systems). To install the dependencies for one family, use `pip install pettingzoo[atari]`, or use `pip install pettingzoo[all]` to install all dependencies. @@ -28,7 +28,7 @@ We support Python 3.7, 3.8, 3.9 and 3.10 on Linux and macOS. We will accept PRs ## Getting started -For an introduction to PettingZoo, see [Basic Usage](https://pettingzoo.farama.org/content/basic_usage/). To create a new environment, see our [Environment Creation Tutorial](https://pettingzoo.farama.org/tutorials/environmentcreation/1-project-structure/) and [Custom Environment Examples](https://pettingzoo.farama.org/content/environment_creation/). +For an introduction to PettingZoo, see [Basic Usage](https://pettingzoo.farama.org/content/basic_usage/). To create a new environment, see our [Environment Creation Tutorial](https://pettingzoo.farama.org/tutorials/environmentcreation/1-project-structure/) and [Custom Environment Examples](https://pettingzoo.farama.org/content/environment_creation/). For examples of training RL models using PettingZoo see our tutorials: * [CleanRL: Implementing PPO](https://pettingzoo.farama.org/tutorials/cleanrl/implementing_PPO/):train multiple PPO agents in the [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment. * [Tianshou: Training Agents](https://pettingzoo.farama.org/tutorials/tianshou/intermediate/): train DQN agents in the [Tic-Tac-Toe](https://pettingzoo.farama.org/environments/classic/tictactoe/) environment. diff --git a/docs/_static/img/doc_icon.svg b/docs/_static/img/doc_icon.svg index 817b0b0c7..450aba762 100644 --- a/docs/_static/img/doc_icon.svg +++ b/docs/_static/img/doc_icon.svg @@ -1 +1 @@ - \ No newline at end of file + diff --git a/docs/_static/img/environment_icon.svg b/docs/_static/img/environment_icon.svg index d9e4da411..c80db8008 100644 --- a/docs/_static/img/environment_icon.svg +++ b/docs/_static/img/environment_icon.svg @@ -1 +1 @@ - \ No newline at end of file + diff --git a/docs/_static/img/github_icon.svg b/docs/_static/img/github_icon.svg index b6f78a535..2b6e248df 100644 --- a/docs/_static/img/github_icon.svg +++ b/docs/_static/img/github_icon.svg @@ -1 +1 @@ - \ No newline at end of file + diff --git a/docs/_static/img/menu_icon.svg b/docs/_static/img/menu_icon.svg index 8ba493287..93aa25470 100644 --- a/docs/_static/img/menu_icon.svg +++ b/docs/_static/img/menu_icon.svg @@ -1 +1 @@ - \ No newline at end of file + diff --git a/docs/_static/img/tutorials/rllib-stack.svg b/docs/_static/img/tutorials/rllib-stack.svg index d126b7561..6cd1765b9 100644 --- a/docs/_static/img/tutorials/rllib-stack.svg +++ b/docs/_static/img/tutorials/rllib-stack.svg @@ -1 +1 @@ - \ No newline at end of file + diff --git a/docs/api/aec.md b/docs/api/aec.md index 0e8b9cd4c..dc591fd79 100644 --- a/docs/api/aec.md +++ b/docs/api/aec.md @@ -24,20 +24,20 @@ env.reset(seed=42) for agent in env.agent_iter(): observation, reward, termination, truncation, info = env.last() - + if termination or truncation: action = None - else: + else: action = env.action_space(agent).sample() # this is where you would insert your policy - - env.step(action) + + env.step(action) env.close() ``` ### Action Masking -AEC environments often include action masks, in order to mark valid/invalid actions for the agent. +AEC environments often include action masks, in order to mark valid/invalid actions for the agent. -To sample actions using action masking: +To sample actions using action masking: ```python from pettingzoo.classic import chess_v6 @@ -49,17 +49,17 @@ for agent in env.agent_iter(): if termination or truncation: action = None - else: + else: # invalid action masking is optional and environment-dependent if "action_mask" in info: mask = info["action_mask"] elif isinstance(observation, dict) and "action_mask" in observation: mask = observation["action_mask"] else: - mask = None + mask = None action = env.action_space(agent).sample(mask) # this is where you would insert your policy - - env.step(action) + + env.step(action) env.close() ``` @@ -68,7 +68,7 @@ Note: action masking is optional, and can be implemented using either `observati * [PettingZoo Classic](https://pettingzoo.farama.org/environments/classic/) environments store action masks in the `observation` dict: * `mask = observation["action_mask"]` * [Shimmy](https://shimmy.farama.org/)'s [OpenSpiel environments](https://shimmy.farama.org/environments/open_spiel/) stores action masks in the `info` dict: - * `mask = info["action_mask"]` + * `mask = info["action_mask"]` To implement action masking in a custom environment, see [Environment Creation: Action Masking](https://pettingzoo.farama.org/tutorials/environmentcreation/3-action-masking/) @@ -158,4 +158,3 @@ For more information on action masking, see [A Closer Look at Invalid Action Mas .. automethod:: AECEnv.close ``` - diff --git a/docs/api/parallel.md b/docs/api/parallel.md index dfdd4cacc..7352ea7fe 100644 --- a/docs/api/parallel.md +++ b/docs/api/parallel.md @@ -22,8 +22,8 @@ observations = parallel_env.reset(seed=42) while env.agents: # this is where you would insert your policy - actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents} - + actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents} + observations, rewards, terminations, truncations, infos = parallel_env.step(actions) env.close() ``` diff --git a/docs/api/wrappers.md b/docs/api/wrappers.md index 63668e80e..26f2d7291 100644 --- a/docs/api/wrappers.md +++ b/docs/api/wrappers.md @@ -6,7 +6,7 @@ title: Wrapper ## Using Wrappers -A wrapper is an environment transformation that takes in an environment as input, and outputs a new environment that is similar to the input environment, but with some transformation or validation applied. +A wrapper is an environment transformation that takes in an environment as input, and outputs a new environment that is similar to the input environment, but with some transformation or validation applied. The following wrappers can be used with PettingZoo environments: @@ -16,7 +16,7 @@ The following wrappers can be used with PettingZoo environments: [Supersuit Wrappers](/api/wrappers/supersuit_wrappers/) include commonly used pre-processing functions such as frame-stacking and color reduction, compatible with both PettingZoo and Gymnasium. -[Shimmy Compatibility Wrappers](/api/wrappers/shimmy_wrappers/) allow commonly used external reinforcement learning environments to be used with PettingZoo and Gymnasium. +[Shimmy Compatibility Wrappers](/api/wrappers/shimmy_wrappers/) allow commonly used external reinforcement learning environments to be used with PettingZoo and Gymnasium. ```{toctree} @@ -24,4 +24,4 @@ The following wrappers can be used with PettingZoo environments: wrappers/pz_wrappers wrappers/supersuit_wrappers wrappers/shimmy_wrappers -``` \ No newline at end of file +``` diff --git a/docs/api/wrappers/pz_wrappers.md b/docs/api/wrappers/pz_wrappers.md index 77e14c8ea..516b5fc6d 100644 --- a/docs/api/wrappers/pz_wrappers.md +++ b/docs/api/wrappers/pz_wrappers.md @@ -4,7 +4,7 @@ title: PettingZoo Wrappers # PettingZoo Wrappers -PettingZoo includes the following types of wrappers: +PettingZoo includes the following types of wrappers: * [Conversion Wrappers](#conversion-wrappers): wrappers for converting environments between the [AEC](/api/aec/) and [Parallel](/api/parallel/) APIs * [Utility Wrappers](#utility-wrappers): a set of wrappers which provide convenient reusable logic, such as enforcing turn order or clipping out-of-bounds actions. @@ -105,4 +105,4 @@ Note: Most AEC environments include TerminateIllegalWrapper in their initializat .. autoclass:: ClipOutOfBoundsWrapper .. autoclass:: OrderEnforcingWrapper -``` \ No newline at end of file +``` diff --git a/docs/api/wrappers/supersuit_wrappers.md b/docs/api/wrappers/supersuit_wrappers.md index 92e76d8da..0e2d027a4 100644 --- a/docs/api/wrappers/supersuit_wrappers.md +++ b/docs/api/wrappers/supersuit_wrappers.md @@ -4,7 +4,7 @@ title: Supersuit Wrappers # Supersuit Wrappers -The [SuperSuit](https://github.com/Farama-Foundation/SuperSuit) companion package (`pip install supersuit`) includes a collection of pre-processing functions which can applied to both [AEC](/api/aec/) and [Parallel](/api/parallel/) environments. +The [SuperSuit](https://github.com/Farama-Foundation/SuperSuit) companion package (`pip install supersuit`) includes a collection of pre-processing functions which can applied to both [AEC](/api/aec/) and [Parallel](/api/parallel/) environments. To convert [space invaders](https://pettingzoo.farama.org/environments/atari/space_invaders/) to a greyscale observation space and stack the last 4 frames: diff --git a/docs/content/basic_usage.md b/docs/content/basic_usage.md index 3a022f9e3..8a2aae6f6 100644 --- a/docs/content/basic_usage.md +++ b/docs/content/basic_usage.md @@ -7,7 +7,7 @@ title: API To install the base PettingZoo library: `pip install pettingzoo`. -This does not include dependencies for all families of environments (some environments can be problematic to install on certain systems). +This does not include dependencies for all families of environments (some environments can be problematic to install on certain systems). To install the dependencies for one family, use `pip install pettingzoo[atari]`, or use `pip install pettingzoo[all]` to install all dependencies. diff --git a/docs/environments/atari.md b/docs/environments/atari.md index 55351200f..4360743ea 100644 --- a/docs/environments/atari.md +++ b/docs/environments/atari.md @@ -66,8 +66,8 @@ for agent in env.agent_iter(): action = None else: action = env.action_space(agent).sample() # this is where you would insert your policy - - env.step(action) + + env.step(action) env.close() ``` diff --git a/docs/environments/butterfly.md b/docs/environments/butterfly.md index 443daf850..04ed71880 100644 --- a/docs/environments/butterfly.md +++ b/docs/environments/butterfly.md @@ -16,7 +16,7 @@ butterfly/pistonball :file: butterfly/list.html ``` -Butterfly environments are challenging scenarios created by Farama, using Pygame with visual Atari spaces. +Butterfly environments are challenging scenarios created by Farama, using Pygame with visual Atari spaces. All environments require a high degree of coordination and require learning of emergent behaviors to achieve an optimal policy. As such, these environments are currently very challenging to learn. @@ -25,7 +25,7 @@ Environments are highly configurable via arguments specified in their respective [Knights Archers Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/), [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/). -### Installation +### Installation The unique dependencies for this set of environments can be installed via: ````bash @@ -43,7 +43,7 @@ observations = env.reset() while env.agents: # this is where you would insert your policy - actions = {agent: env.action_space(agent).sample() for agent in env.agents} + actions = {agent: env.action_space(agent).sample() for agent in env.agents} observations, rewards, terminations, truncations, infos = env.step(actions) env.close() @@ -63,7 +63,7 @@ manual_policy = knights_archers_zombies_v10.ManualPolicy(env) for agent in env.agent_iter(): clock.tick(env.metadata["render_fps"]) observation, reward, termination, truncation, info = env.last() - + if agent == manual_policy.agent: # get user input (controls are WASD and space) action = manual_policy(observation, agent) @@ -71,7 +71,6 @@ for agent in env.agent_iter(): # this is where you would insert your policy (for non-player agents) action = env.action_space(agent).sample() - env.step(action) + env.step(action) env.close() ``` - diff --git a/docs/environments/classic.md b/docs/environments/classic.md index 4905e9264..112c6146f 100644 --- a/docs/environments/classic.md +++ b/docs/environments/classic.md @@ -23,7 +23,7 @@ classic/tictactoe :file: classic/list.html ``` -Classic environments represent implementations of popular turn-based human games and are mostly competitive. +Classic environments represent implementations of popular turn-based human games and are mostly competitive. ### Installation @@ -45,14 +45,14 @@ env.reset(seed=42) for agent in env.agent_iter(): observation, reward, termination, truncation, info = env.last() - + if termination or truncation: break - + mask = observation["action_mask"] action = env.action_space(agent).sample(mask) # this is where you would insert your policy - - env.step(action) + + env.step(action) env.close() ``` diff --git a/docs/environments/mpe.md b/docs/environments/mpe.md index 0daf085f0..85bc45f55 100644 --- a/docs/environments/mpe.md +++ b/docs/environments/mpe.md @@ -43,13 +43,13 @@ env = simple_tag_v3.env(render_mode='human') env.reset() for agent in env.agent_iter(): observation, reward, termination, truncation, info = env.last() - + if termination or truncation: action = None else: action = env.action_space(agent).sample() # this is where you would insert your policy - - env.step(action) + + env.step(action) env.close() ``` diff --git a/docs/environments/sisl.md b/docs/environments/sisl.md index f73d610bb..4280a0fea 100644 --- a/docs/environments/sisl.md +++ b/docs/environments/sisl.md @@ -36,13 +36,13 @@ env = waterworld_v4.env(render_mode='human') env.reset() for agent in env.agent_iter(): observation, reward, termination, truncation, info = env.last() - + if termination or truncation: action = None else: action = env.action_space(agent).sample() # this is where you would insert your policy - - env.step(action) + + env.step(action) env.close() ``` diff --git a/docs/environments/third_party_envs.md b/docs/environments/third_party_envs.md index 3748a8e90..0d7712aaf 100644 --- a/docs/environments/third_party_envs.md +++ b/docs/environments/third_party_envs.md @@ -17,7 +17,7 @@ lastpage: [![PettingZoo version dependency](https://img.shields.io/badge/PettingZoo-v1.22.2-blue)]() [![GitHub stars](https://img.shields.io/github/stars/LucasAlegre/sumo-rl)]() -PettingZoo (and Gymnasium) wrappers for the widely used [SUMO](https://github.com/eclipse/sumo) traffic simulation. +PettingZoo (and Gymnasium) wrappers for the widely used [SUMO](https://github.com/eclipse/sumo) traffic simulation. ### [POGEMA](https://github.com/AIRI-Institute/pogema) @@ -49,7 +49,7 @@ Using [Google DeepMind](https://www.deepmind.com/)'s [MuZero](https://en.wikiped [![GitHub stars](https://img.shields.io/github/stars/DavidRother/gym-cooking)]() [![GitHub last commit](https://img.shields.io/github/last-commit/DavidRother/gym-cooking)]() -Fork of the game *Too Many Cooks*. +Fork of the game *Too Many Cooks*. ### [Crazy-RL](https://github.com/ffelten/CrazyRL) @@ -57,7 +57,7 @@ Fork of the game *Too Many Cooks*. [![PettingZoo version dependency](https://img.shields.io/badge/PettingZoo-v1.22.3-blue)]() [![GitHub stars](https://img.shields.io/github/stars/ffelten/CrazyRL)]() -A library for doing reinforcement learning using [Crazyflie](https://www.bitcraze.io/products/crazyflie-2-1/) drones. +A library for doing reinforcement learning using [Crazyflie](https://www.bitcraze.io/products/crazyflie-2-1/) drones. ### [PettingZoo Dilemma Envs](https://github.com/tianyu-z/pettingzoo_dilemma_envs) @@ -137,8 +137,8 @@ Environments for [Kaggle](https://www.kaggle.com/) machine learning challenges. [![PettingZoo version dependency](https://img.shields.io/badge/PettingZoo-v1.18.0-red)]() [![GitHub stars](https://img.shields.io/github/stars/cogment/cogment-verse)]() [![GitHub last commit](https://img.shields.io/github/last-commit/cogment/cogment-verse)]() - -Library of Environments, Human Actor UIs and Agent implementation for Human In the Loop Learning & Reinforcement Learning. + +Library of Environments, Human Actor UIs and Agent implementation for Human In the Loop Learning & Reinforcement Learning. ### [Stone Ground Hearth Battles](https://github.com/JDBumgardner/stone_ground_hearth_battles) @@ -156,7 +156,7 @@ Simulator and environments for [Blizzard](https://www.blizzard.com/en-us/)'s pop [![GitHub stars](https://img.shields.io/github/stars/cage-challenge/CybORG)]() [![GitHub last commit](https://img.shields.io/github/last-commit/cage-challenge/CybORG)]() -A cyber-security research environment for training and development of security human and autonomous agents. +A cyber-security research environment for training and development of security human and autonomous agents. ### [conflict_rez](https://github.com/XuShenLZ/conflict_rez) @@ -183,7 +183,7 @@ PettingZoo environment for online multi-player game [Battlesnake](https://play.b [![GitHub stars](https://img.shields.io/github/stars/NaIwo/BomberManAI)]() [![GitHub last commit](https://img.shields.io/github/last-commit/NaIwo/BomberManAI)]() -Environment with a simplified version of the video game *BomberMan*. +Environment with a simplified version of the video game *BomberMan*. ### [Fanorona AEC](https://github.com/AbhijeetKrishnan/fanorona-aec) @@ -191,7 +191,7 @@ Environment with a simplified version of the video game *BomberMan*. [![GitHub stars](https://img.shields.io/github/stars/AbhijeetKrishnan/fanorona-aec)]() [![GitHub last commit](https://img.shields.io/github/last-commit/AbhijeetKrishnan/fanorona-aec)]() -Implementation of the board game *Fanorona*. +Implementation of the board game *Fanorona*. ### [Galaga AI](https://github.com/SonicKurt/Galaga-AI) @@ -209,7 +209,7 @@ Implementation of the [Galaga](https://en.wikipedia.org/wiki/Galaga) arcade game [![GitHub stars](https://img.shields.io/github/stars/michaelfeil/skyjo_rl)]() [![GitHub last commit](https://img.shields.io/github/last-commit/michaelfeil/skyjo_rl)]() -Implementation of the board game *SkyJo*. +Implementation of the board game *SkyJo*. ### [Mu Torere](https://github.com/Aroksak/MuTorere) @@ -218,7 +218,7 @@ Implementation of the board game *SkyJo*. [![GitHub stars](https://img.shields.io/github/stars/Aroksak/MuTorere)]() [![GitHub last commit](https://img.shields.io/github/last-commit/DaBultz/pz-battlesnake)]() -Implementation of the board game *Mū tōrere* from New Zealand. +Implementation of the board game *Mū tōrere* from New Zealand. ___ diff --git a/docs/release_notes/index.md b/docs/release_notes/index.md index aa442ee02..2e85cd3ee 100644 --- a/docs/release_notes/index.md +++ b/docs/release_notes/index.md @@ -5,4 +5,4 @@ :github: https://github.com/Farama-Foundation/PettingZoo/releases :pypi: https://pypi.org/project/pettingzoo/ :changelog-url: -``` \ No newline at end of file +``` diff --git a/docs/tutorials/cleanrl/index.md b/docs/tutorials/cleanrl/index.md index 9503cc664..15d9ca61b 100644 --- a/docs/tutorials/cleanrl/index.md +++ b/docs/tutorials/cleanrl/index.md @@ -4,17 +4,17 @@ title: "CleanRL" # CleanRL Tutorial -This tutorial shows how to use [CleanRL](https://github.com/vwxyzjn/cleanrl) to implement a model and train it on a PettingZoo environment. +This tutorial shows how to use [CleanRL](https://github.com/vwxyzjn/cleanrl) to implement a model and train it on a PettingZoo environment. * [Implementing PPO](/tutorials/cleanrl/implementing_PPO.md): _Implement and train a PPO model_ ## CleanRL Overview -[CleanRL](https://github.com/vwxyzjn/cleanrl) is a lightweight, highly-modularized reinforcement learning library, providing high-quality single-file implementations with research-friendly features. +[CleanRL](https://github.com/vwxyzjn/cleanrl) is a lightweight, highly-modularized reinforcement learning library, providing high-quality single-file implementations with research-friendly features. -See the [documentation](https://docs.cleanrl.dev/) for more information. +See the [documentation](https://docs.cleanrl.dev/) for more information. ## Official examples using PettingZoo: @@ -23,7 +23,7 @@ See the [documentation](https://docs.cleanrl.dev/) for more information. ## WandB Integration -A key feature is its tight integration with [Weights & Biases](https://wandb.ai/) (WandB): for experiment tracking, hyperparameter tuning, and benchmarking. +A key feature is its tight integration with [Weights & Biases](https://wandb.ai/) (WandB): for experiment tracking, hyperparameter tuning, and benchmarking. The [Open RL Benchmark](https://github.com/openrlbenchmark/openrlbenchmark) allows users to view public leaderboards for many tasks, including videos of agents' performance across training timesteps. diff --git a/docs/tutorials/environmentcreation/1-project-structure.md b/docs/tutorials/environmentcreation/1-project-structure.md index 32c39f406..0b22a8f59 100644 --- a/docs/tutorials/environmentcreation/1-project-structure.md +++ b/docs/tutorials/environmentcreation/1-project-structure.md @@ -29,11 +29,11 @@ Environment repositories are usually laid out using the following structure: - `/requirements.txt` is a file used to keep track of your environment dependencies. At the very least, `pettingzoo` should be in there. **Please version control all your dependencies via `==`**. ### Advanced: Additional (optional) files -The above file structure is minimal. A more deployment-ready environment would include -- `/docs/` for documentation, -- `/setup.py` for packaging, -- `/custom-environment/__init__.py` for depreciation handling, and -- Github actions for continuous integration of environment tests. +The above file structure is minimal. A more deployment-ready environment would include +- `/docs/` for documentation, +- `/setup.py` for packaging, +- `/custom-environment/__init__.py` for depreciation handling, and +- Github actions for continuous integration of environment tests. Implementing these are outside the scope of this tutorial. diff --git a/docs/tutorials/environmentcreation/2-environment-logic.md b/docs/tutorials/environmentcreation/2-environment-logic.md index 29b66d283..f917b6cd8 100644 --- a/docs/tutorials/environmentcreation/2-environment-logic.md +++ b/docs/tutorials/environmentcreation/2-environment-logic.md @@ -11,7 +11,7 @@ Now that we have a basic understanding of the structure of environment repositor For this tutorial, we will be creating a two-player game consisting of a prisoner, trying to escape, and a guard, trying to catch the prisoner. This game will be played on a 7x7 grid, where: - The prisoner starts in the top left corner, - the guard starts in the bottom right corner, -- the escape door is randomly placed in the middle of the grid, and +- the escape door is randomly placed in the middle of the grid, and - Both the prisoner and the guard can move in any of the four cardinal directions (up, down, left, right). ## Code diff --git a/docs/tutorials/environmentcreation/4-testing-your-environment.md b/docs/tutorials/environmentcreation/4-testing-your-environment.md index b453ff49e..a9dc6c65e 100644 --- a/docs/tutorials/environmentcreation/4-testing-your-environment.md +++ b/docs/tutorials/environmentcreation/4-testing-your-environment.md @@ -10,7 +10,7 @@ Now that our environment is complete, we can test it to make sure it works as in ## Code -Note: This code can be added to the bottom of the same file, without using any imports, but it is best practice to keep tests in a separate file, and use modular imports, as shown below.. +Note: This code can be added to the bottom of the same file, without using any imports, but it is best practice to keep tests in a separate file, and use modular imports, as shown below.. Relative importing is used for simplicity, and assumes your custom environment is in the same directory. If your test is in another location (e.g., a root-level `/test/` directory), it is recommended to import using absolute path. diff --git a/docs/tutorials/environmentcreation/index.md b/docs/tutorials/environmentcreation/index.md index 69b049471..36d0a1799 100644 --- a/docs/tutorials/environmentcreation/index.md +++ b/docs/tutorials/environmentcreation/index.md @@ -4,7 +4,7 @@ title: "Environment Creation" # Environment Creation Tutorial -These tutorials walk you though creating a custom environment from scratch, and are recommended as a starting point for anyone new to PettingZoo. +These tutorials walk you though creating a custom environment from scratch, and are recommended as a starting point for anyone new to PettingZoo. 1. [Project Structure](/tutorials/environmentcreation/1-project-structure.md) diff --git a/docs/tutorials/langchain/langchain.md b/docs/tutorials/langchain/langchain.md index 87b78c8be..8955c966e 100644 --- a/docs/tutorials/langchain/langchain.md +++ b/docs/tutorials/langchain/langchain.md @@ -57,7 +57,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 1 Observation: 3 @@ -65,7 +65,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 1 Observation: 1 @@ -73,7 +73,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 2 Observation: 1 @@ -81,7 +81,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 1 Observation: 1 @@ -89,7 +89,7 @@ Reward: 1 Termination: False Truncation: False Return: 1 - + Action: 0 Observation: 2 @@ -97,7 +97,7 @@ Reward: -1 Termination: False Truncation: False Return: -1 - + Action: 0 Observation: 0 @@ -105,7 +105,7 @@ Reward: 0 Termination: False Truncation: True Return: 1 - + Action: None Observation: 0 @@ -113,7 +113,7 @@ Reward: 0 Termination: False Truncation: True Return: -1 - + Action: None ``` @@ -149,17 +149,17 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 0 - | | - X | - | - + | | + X | - | - _____|_____|_____ - | | - - | - | - + | | + - | - | - _____|_____|_____ - | | - - | - | - - | | + | | + - | - | - + | | Observation: {'observation': array([[[0, 1], [0, 0], @@ -176,17 +176,17 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 1 - | | - X | - | - + | | + X | - | - _____|_____|_____ - | | - O | - | - + | | + O | - | - _____|_____|_____ - | | - - | - | - - | | + | | + - | - | - + | | Observation: {'observation': array([[[1, 0], [0, 1], @@ -203,17 +203,17 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 2 - | | - X | - | - + | | + X | - | - _____|_____|_____ - | | - O | - | - + | | + O | - | - _____|_____|_____ - | | - X | - | - - | | + | | + X | - | - + | | Observation: {'observation': array([[[0, 1], [1, 0], @@ -230,17 +230,17 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 3 - | | - X | O | - + | | + X | O | - _____|_____|_____ - | | - O | - | - + | | + O | - | - _____|_____|_____ - | | - X | - | - - | | + | | + X | - | - + | | Observation: {'observation': array([[[1, 0], [0, 1], @@ -257,17 +257,17 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 4 - | | - X | O | - + | | + X | O | - _____|_____|_____ - | | - O | X | - + | | + O | X | - _____|_____|_____ - | | - X | - | - - | | + | | + X | - | - + | | Observation: {'observation': array([[[0, 1], [1, 0], @@ -284,17 +284,17 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 5 - | | - X | O | - + | | + X | O | - _____|_____|_____ - | | - O | X | - + | | + O | X | - _____|_____|_____ - | | - X | O | - - | | + | | + X | O | - + | | Observation: {'observation': array([[[1, 0], [0, 1], @@ -311,17 +311,17 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 6 - | | - X | O | X + | | + X | O | X _____|_____|_____ - | | - O | X | - + | | + O | X | - _____|_____|_____ - | | - X | O | - - | | + | | + X | O | - + | | Observation: {'observation': array([[[0, 1], [1, 0], @@ -338,7 +338,7 @@ Reward: -1 Termination: True Truncation: False Return: -1 - + Action: None Observation: {'observation': array([[[1, 0], @@ -356,7 +356,7 @@ Reward: 1 Termination: True Truncation: False Return: 1 - + Action: None ``` @@ -368,7 +368,7 @@ Here is an example of a Texas Hold'em No Limit game that uses the `ActionMaskAge :language: python ``` -```text +```text Observation: {'observation': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0., 0., @@ -377,7 +377,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 1 Observation: {'observation': array([0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., @@ -388,7 +388,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 1 Observation: {'observation': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., @@ -399,7 +399,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 1 Observation: {'observation': array([0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., @@ -410,7 +410,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 0 Observation: {'observation': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., @@ -421,7 +421,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 2 Observation: {'observation': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., @@ -432,7 +432,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 2 Observation: {'observation': array([0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., @@ -443,7 +443,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 3 Observation: {'observation': array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., @@ -455,7 +455,7 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 4 Observation: {'observation': array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., @@ -468,9 +468,9 @@ Reward: 0 Termination: False Truncation: False Return: 0 - + Action: 4 -[WARNING]: Illegal move made, game terminating with current player losing. +[WARNING]: Illegal move made, game terminating with current player losing. obs['action_mask'] contains a mask of all legal moves that can be chosen. Observation: {'observation': array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., @@ -483,7 +483,7 @@ Reward: -1.0 Termination: True Truncation: True Return: -1.0 - + Action: None Observation: {'observation': array([ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., @@ -496,7 +496,7 @@ Reward: 0 Termination: True Truncation: True Return: 0 - + Action: None Observation: {'observation': array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., @@ -509,7 +509,7 @@ Reward: 0 Termination: True Truncation: True Return: 0 - + Action: None Observation: {'observation': array([ 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., @@ -522,7 +522,7 @@ Reward: 0 Termination: True Truncation: True Return: 0 - + Action: None ``` diff --git a/docs/tutorials/rllib/index.md b/docs/tutorials/rllib/index.md index 0ba9f5430..dc95d2b52 100644 --- a/docs/tutorials/rllib/index.md +++ b/docs/tutorials/rllib/index.md @@ -25,8 +25,8 @@ See the [documentation](https://docs.ray.io/en/latest/rllib/index.html) for more * [simple multi-agent: rock-paper-scissors](https://github.com/ray-project/ray/blob/master/rllib/examples/rock_paper_scissors_multiagent.py) * [multi-agent parameter sharing: waterworld](https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_parameter_sharing.py) * [multi-agent independent learning: waterworld](https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_independent_learning.py) - * [multi-agent leela chess zero](https://github.com/ray-project/ray/blob/master/rllib/examples/multi-agent-leela-chess-zero.py) - * [PR: connect four self-play with pettingzoo](https://github.com/ray-project/ray/pull/33481) + * [multi-agent leela chess zero](https://github.com/ray-project/ray/blob/master/rllib/examples/multi-agent-leela-chess-zero.py) + * [PR: connect four self-play with pettingzoo](https://github.com/ray-project/ray/pull/33481) [//]: # (TODO: test waterworld, leela chess zero, add PR to pettingzoo if it isn't merged) diff --git a/docs/tutorials/tianshou/advanced.md b/docs/tutorials/tianshou/advanced.md index 53cc91329..d6452ff21 100644 --- a/docs/tutorials/tianshou/advanced.md +++ b/docs/tutorials/tianshou/advanced.md @@ -6,7 +6,7 @@ title: "Tianshou: CLI and Logging" This tutorial is a full example using Tianshou to train a [Deep Q-Network](https://tianshou.readthedocs.io/en/master/tutorials/dqn.html) (DQN) on the [Tic-Tac-Toe](https://pettingzoo.farama.org/environments/classic/tictactoe/) environment. -It extends the code from [Training Agents](https://pettingzoo.farama.org/tutorials/tianshou/intermediate/) to add CLI (using [argparse](https://docs.python.org/3/library/argparse.html)) and logging (using Tianshou's [Logger](https://tianshou.readthedocs.io/en/master/tutorials/logger.html)). +It extends the code from [Training Agents](https://pettingzoo.farama.org/tutorials/tianshou/intermediate/) to add CLI (using [argparse](https://docs.python.org/3/library/argparse.html)) and logging (using Tianshou's [Logger](https://tianshou.readthedocs.io/en/master/tutorials/logger.html)). ## Environment Setup diff --git a/docs/tutorials/tianshou/beginner.md b/docs/tutorials/tianshou/beginner.md index 0b994eba3..7f06839a3 100644 --- a/docs/tutorials/tianshou/beginner.md +++ b/docs/tutorials/tianshou/beginner.md @@ -4,7 +4,7 @@ title: "Tianshou: Basic API Usage" # Tianshou: Basic API Usage -This tutorial is a simple example of how to use [Tianshou](https://github.com/thu-ml/tianshou) with a PettingZoo environment. +This tutorial is a simple example of how to use [Tianshou](https://github.com/thu-ml/tianshou) with a PettingZoo environment. It demonstrates a game betwenen two [random policy](https://tianshou.readthedocs.io/en/master/_modules/tianshou/policy/random.html) agents in the [rock-paper-scissors](https://pettingzoo.farama.org/environments/classic/rps/) environment. diff --git a/docs/tutorials/tianshou/index.md b/docs/tutorials/tianshou/index.md index 00a3dac95..eef3a7d0c 100644 --- a/docs/tutorials/tianshou/index.md +++ b/docs/tutorials/tianshou/index.md @@ -4,9 +4,9 @@ title: "Tianshou" # Tianshou Tutorial -These tutorials provide an introduction to using [Tianshou](https://github.com/thu-ml/tianshou) with PettingZoo. +These tutorials provide an introduction to using [Tianshou](https://github.com/thu-ml/tianshou) with PettingZoo. -* [Basic API Usage](/tutorials/tianshou/beginner/): _View a game between random agents_ +* [Basic API Usage](/tutorials/tianshou/beginner/): _View a game between random agents_ * [Training Agents](/tutorials/tianshou/intermediate): _Train a DQN agent_ @@ -14,8 +14,8 @@ These tutorials provide an introduction to using [Tianshou](https://github.com/t ## Tianshou Overview -[Tianshou](https://github.com/thu-ml/tianshou) is a lightweight reinforcement learning platform providing fast-speed, modularized framework and pythonic API for building the deep reinforcement learning agent with the least number of lines of code. -It uses pure [PyTorch](https://pytorch.org/) and is written in only ~4000 lines of code. +[Tianshou](https://github.com/thu-ml/tianshou) is a lightweight reinforcement learning platform providing fast-speed, modularized framework and pythonic API for building the deep reinforcement learning agent with the least number of lines of code. +It uses pure [PyTorch](https://pytorch.org/) and is written in only ~4000 lines of code. It boasts a large number of algorithms and high quality software engineering standards: thorough testing, type hints, and comprehensive documentation. diff --git a/docs/tutorials/tianshou/intermediate.md b/docs/tutorials/tianshou/intermediate.md index 8fc78cd20..eaca626b3 100644 --- a/docs/tutorials/tianshou/intermediate.md +++ b/docs/tutorials/tianshou/intermediate.md @@ -4,7 +4,7 @@ title: "Tianshou: Training Agents" # Tianshou: Training Agents -This tutorial shows how to use [Tianshou](https://github.com/thu-ml/tianshou) to train a [Deep Q-Network](https://tianshou.readthedocs.io/en/master/tutorials/dqn.html) (DQN) agent to play vs a [random policy](https://tianshou.readthedocs.io/en/master/_modules/tianshou/policy/random.html) agent in the [Tic-Tac-Toe](https://pettingzoo.farama.org/environments/classic/tictactoe/) environment. +This tutorial shows how to use [Tianshou](https://github.com/thu-ml/tianshou) to train a [Deep Q-Network](https://tianshou.readthedocs.io/en/master/tutorials/dqn.html) (DQN) agent to play vs a [random policy](https://tianshou.readthedocs.io/en/master/_modules/tianshou/policy/random.html) agent in the [Tic-Tac-Toe](https://pettingzoo.farama.org/environments/classic/tictactoe/) environment. ## Environment Setup To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts. diff --git a/pettingzoo/sisl/multiwalker/multiwalker.py b/pettingzoo/sisl/multiwalker/multiwalker.py old mode 100755 new mode 100644 diff --git a/pettingzoo/sisl/pursuit/pursuit.py b/pettingzoo/sisl/pursuit/pursuit.py old mode 100755 new mode 100644 diff --git a/pettingzoo/sisl/pursuit/pursuit_base.py b/pettingzoo/sisl/pursuit/pursuit_base.py old mode 100755 new mode 100644 diff --git a/pettingzoo/sisl/waterworld/waterworld.py b/pettingzoo/sisl/waterworld/waterworld.py old mode 100755 new mode 100644 diff --git a/pettingzoo/sisl/waterworld/waterworld_base.py b/pettingzoo/sisl/waterworld/waterworld_base.py old mode 100755 new mode 100644 diff --git a/pettingzoo/utils/agent_selector.py b/pettingzoo/utils/agent_selector.py old mode 100755 new mode 100644 diff --git a/tutorials/LangChain/requirements.txt b/tutorials/LangChain/requirements.txt index 3d97c2c4d..8749f3968 100644 --- a/tutorials/LangChain/requirements.txt +++ b/tutorials/LangChain/requirements.txt @@ -1,4 +1,4 @@ pettingzoo[classic] langchain openai -tenacity \ No newline at end of file +tenacity