Add action masking documentation, update usage scripts #953

55 changes: 49 additions & 6 deletions docs/api/aec.md
@@ -2,25 +2,68 @@

By default, PettingZoo models games as [*Agent Environment Cycle*](https://arxiv.org/abs/2009.13051) (AEC) environments. This allows it to support any type of game multi-agent RL can consider.

## Usage

AEC environments can be interacted with as follows:

```python
from pettingzoo.classic import rps_v2

env = rps_v2.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

### Action Masking
AEC environments often provide action masks, which mark valid and invalid actions for the agent.

To sample actions using action masking:
```python
from pettingzoo.classic import chess_v5

env = chess_v5.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        # invalid action masking is optional and environment-dependent
        if "action_mask" in info:
            mask = info["action_mask"]
        elif isinstance(observation, dict) and "action_mask" in observation:
            mask = observation["action_mask"]
        else:
            mask = None
        action = env.action_space(agent).sample(mask)  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

Note: action masking is optional, and can be implemented using either `observation` or `info`.

* [PettingZoo Classic](https://pettingzoo.farama.org/environments/classic/) environments store action masks in the `observation` dict:
* `mask = observation["action_mask"]`
* [Shimmy](https://shimmy.farama.org/)'s [OpenSpiel environments](https://shimmy.farama.org/environments/open_spiel/) store action masks in the `info` dict:
* `mask = info["action_mask"]`

To implement action masking in a custom environment, see [Environment Creation: Action Masking](https://pettingzoo.farama.org/tutorials/environmentcreation/3-action-masking/).
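In a custom environment, the Classic convention is for `observe()` to return a dict holding both the state and the mask. A rough sketch of that shape follows; `num_actions`, `_get_board_obs`, and `_legal_actions` are hypothetical hooks, not PettingZoo API, and real Classic environments return the mask as a NumPy `int8` array rather than a plain list:

```python
class MaskedObsMixin:
    """Illustrative sketch: build a Classic-style dict observation with an action mask."""

    def observe(self, agent):
        obs = self._get_board_obs(agent)      # environment-specific state (hypothetical hook)
        mask = [0] * self.num_actions         # 0 marks an invalid action
        for a in self._legal_actions(agent):  # environment-specific rules (hypothetical hook)
            mask[a] = 1                       # 1 marks a valid action
        return {"observation": obs, "action_mask": mask}
```

An agent consuming this observation would then sample with `env.action_space(agent).sample(observation["action_mask"])`, as in the example above.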

For more information on action masking, see [A Closer Look at Invalid Action Masking in Policy Gradient Algorithms](https://arxiv.org/abs/2006.14171) (Huang, 2022).
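Conceptually, sampling a discrete space with a mask is just a uniform draw over the actions whose mask entry is 1. A pure-Python sketch of that idea (not the actual Gymnasium implementation, which operates on NumPy arrays):

```python
import random

def masked_sample(n, mask, rng=random):
    """Uniformly sample an action from range(n) whose mask entry is 1.

    mask is a length-n sequence of 0/1 values; None means all actions are valid.
    """
    if mask is None:
        return rng.randrange(n)
    valid = [a for a in range(n) if mask[a] == 1]
    if not valid:
        raise ValueError("mask excludes every action")
    return rng.choice(valid)
```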


## AECEnv

11 changes: 7 additions & 4 deletions docs/api/parallel.md
@@ -2,17 +2,20 @@

In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `<game>.parallel_env()`. This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLLib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.

## Usage

Parallel environments can be interacted with as follows:

```python
from pettingzoo.butterfly import pistonball_v6

parallel_env = pistonball_v6.parallel_env(render_mode="human")
observations = parallel_env.reset(seed=42)

while parallel_env.agents:
    # this is where you would insert your policy
    actions = {agent: parallel_env.action_space(agent).sample() for agent in parallel_env.agents}

    # execute the actions in the environment
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
parallel_env.close()
```

13 changes: 8 additions & 5 deletions docs/environments/atari.md
@@ -52,19 +52,22 @@ Install ROMs using [AutoROM](https://github.com/Farama-Foundation/AutoROM), or s

### Usage

To launch a [Space Invaders](https://pettingzoo.farama.org/environments/atari/space_invaders/) environment with random agents:
```python
from pettingzoo.atari import space_invaders_v2

env = space_invaders_v2.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

24 changes: 15 additions & 9 deletions docs/environments/butterfly.md
@@ -34,38 +34,44 @@ pip install pettingzoo[butterfly]

### Usage

To launch a [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment with random agents:
```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.parallel_env(render_mode="human")
observations = env.reset(seed=42)

while env.agents:
    # this is where you would insert your policy
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}

    # execute the actions in the environment
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```

To launch a [Knights Archers Zombies](https://pettingzoo.farama.org/environments/butterfly/knights_archers_zombies/) environment with interactive user input (see [manual_policy.py](https://github.com/Farama-Foundation/PettingZoo/blob/master/pettingzoo/butterfly/knights_archers_zombies/manual_policy.py)):
```python
import pygame
from pettingzoo.butterfly import knights_archers_zombies_v10

env = knights_archers_zombies_v10.env(render_mode="human")
env.reset(seed=42)

clock = pygame.time.Clock()
manual_policy = knights_archers_zombies_v10.ManualPolicy(env)

for agent in env.agent_iter():
    clock.tick(env.metadata["render_fps"])

    observation, reward, termination, truncation, info = env.last()

    if agent == manual_policy.agent:
        # get user input (controls are WASD and space)
        action = manual_policy(observation, agent)
    else:
        # this is where you would insert your policy (for non-player agents)
        action = env.action_space(agent).sample()

    env.step(action)  # execute the action in the environment
```

16 changes: 10 additions & 6 deletions docs/environments/classic.md
@@ -36,19 +36,23 @@ pip install pettingzoo[classic]

### Usage

To launch a [Texas Holdem](https://pettingzoo.farama.org/environments/classic/texas_holdem/) environment with random agents:
```python
from pettingzoo.classic import texas_holdem_v4

env = texas_holdem_v4.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        mask = observation["action_mask"]
        action = env.action_space(agent).sample(mask)  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

10 changes: 6 additions & 4 deletions docs/environments/mpe.md
@@ -34,20 +34,22 @@ pip install pettingzoo[mpe]
````

### Usage
To launch a [Simple Tag](https://pettingzoo.farama.org/environments/mpe/simple_tag/) environment with random agents:

```python
from pettingzoo.mpe import simple_tag_v2

env = simple_tag_v2.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

8 changes: 5 additions & 3 deletions docs/environments/sisl.md
@@ -27,7 +27,7 @@ pip install pettingzoo[sisl]
````

### Usage
To launch a [Waterworld](https://pettingzoo.farama.org/environments/sisl/waterworld/) environment with random agents:

```python
from pettingzoo.sisl import waterworld_v4

env = waterworld_v4.env(render_mode="human")
env.reset(seed=42)

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()

    if termination or truncation:
        action = None
    else:
        action = env.action_space(agent).sample()  # this is where you would insert your policy

    env.step(action)  # execute the action in the environment
env.close()
```

19 changes: 12 additions & 7 deletions docs/index.md
@@ -69,11 +69,16 @@ Contribute to the Docs <https://github.com/Farama-Foundation/PettingZoo/tree/mas
Environments can be interacted with in a manner very similar to [Gymnasium](https://gymnasium.farama.org):

```python
from pettingzoo.butterfly import knights_archers_zombies_v10

env = knights_archers_zombies_v10.env()
env.reset()

for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    action = env.action_space(agent).sample()  # this is where you would insert your policy
    env.step(action)  # execute the action in the environment
env.close()
```

For detailed usage information, see [AEC API](https://pettingzoo.farama.org/api/aec/) and [Parallel API](https://pettingzoo.farama.org/api/parallel/).