diff --git a/README.md b/README.md index 53788009a..401ee05a3 100644 --- a/README.md +++ b/README.md @@ -33,14 +33,14 @@ Get started with PettingZoo by following [the PettingZoo tutorial](https://petti PettingZoo model environments as [*Agent Environment Cycle* (AEC) games](https://arxiv.org/pdf/2009.14471.pdf), in order to be able to cleanly support all types of multi-agent RL environments under one API and to minimize the potential for certain classes of common bugs. -Using environments in PettingZoo is very similar to Gym, i.e. you initialize an environment via: +Using environments in PettingZoo is very similar to Gymnasium, i.e. you initialize an environment via: ```python from pettingzoo.butterfly import pistonball_v6 env = pistonball_v6.env() ``` -Environments can be interacted with in a manner very similar to Gym: +Environments can be interacted with in a manner very similar to Gymnasium: ```python env.reset() diff --git a/docs/api/core.md b/docs/api/core.md index 95741e805..80cdaaa79 100644 --- a/docs/api/core.md +++ b/docs/api/core.md @@ -1,83 +1,91 @@ # Core API +## AECEnv + ```{eval-rst} .. currentmodule:: pettingzoo.utils.env .. autoclass:: AECEnv - .. py:attribute:: agents +``` - A list of the names of all current agents, typically integers. These may be changed as an environment progresses (i.e. agents can be added or removed). +### Attributes - :type: list[AgentID] - .. py:attribute:: num_agents +```{eval-rst} - The length of the agents list. +.. autoattribute:: AECEnv.agents - :type: int + A list of the names of all current agents, typically integers. These may be changed as an environment progresses (i.e. agents can be added or removed). - .. py:attribute:: possible_agents + :type: List[AgentID] - A list of all possible_agents the environment could generate. Equivalent to the list of agents in the observation and action spaces. This cannot be changed through play or resetting. +.. autoattribute:: AECEnv.num_agents - :type: list[AgentID] + The length of the agents list. - .. py:attribute:: max_num_agents +.. autoattribute:: AECEnv.possible_agents - The length of the possible_agents list. + A list of all possible_agents the environment could generate. Equivalent to the list of agents in the observation and action spaces. This cannot be changed through play or resetting. - :type: int + :type: List[AgentID] - .. py:attribute:: agent_selection +.. autoattribute:: AECEnv.max_num_agents - An attribute of the environment corresponding to the currently selected agent that an action can be taken for. + The length of the possible_agents list. - :type: AgentID +.. autoattribute:: AECEnv.agent_selection - .. py:attribute:: dones + An attribute of the environment corresponding to the currently selected agent that an action can be taken for. - A dict of the done state of every current agent at the time called, keyed by name. `last()` accesses this attribute. Note that agents can be added or removed from this dict. The returned dict looks like:: + :type: AgentID - dones = {0:[first agent done state], 1:[second agent done state] ... n-1:[nth agent done state]} +.. autoattribute:: AECEnv.dones - :type: Dict[AgentID, bool] + A dict of the done state of every current agent at the time called, keyed by name. `last()` accesses this attribute. Note that agents can be added or removed from this dict. The returned dict looks like:: - .. py:attribute:: rewards + dones = {0:[first agent done state], 1:[second agent done state] ... 
n-1:[nth agent done state]} - A dict of the rewards of every current agent at the time called, keyed by name. Rewards the instantaneous reward generated after the last step. Note that agents can be added or removed from this attribute. `last()` does not directly access this attribute, rather the returned reward is stored in an internal variable. The rewards structure looks like:: + :type: Dict[AgentID, bool] - {0:[first agent reward], 1:[second agent reward] ... n-1:[nth agent reward]} +.. autoattribute:: AECEnv.rewards - :type: Dict[AgentID, float] + A dict of the rewards of every current agent at the time called, keyed by name. Rewards the instantaneous reward generated after the last step. Note that agents can be added or removed from this attribute. `last()` does not directly access this attribute, rather the returned reward is stored in an internal variable. The rewards structure looks like:: - .. py:attribute:: infos + {0:[first agent reward], 1:[second agent reward] ... n-1:[nth agent reward]} - A dict of info for each current agent, keyed by name. Each agent's info is also a dict. Note that agents can be added or removed from this attribute. `last()` accesses this attribute. The returned dict looks like:: + :type: Dict[AgentID, float] - infos = {0:[first agent info], 1:[second agent info] ... n-1:[nth agent info]} +.. autoattribute:: AECEnv.infos - :type: Dict[AgentID, Dict[str, Any]] + A dict of info for each current agent, keyed by name. Each agent's info is also a dict. Note that agents can be added or removed from this attribute. `last()` accesses this attribute. The returned dict looks like:: - .. py:attribute:: observation_spaces + infos = {0:[first agent info], 1:[second agent info] ... n-1:[nth agent info]} - A dict of the observation spaces of every agent, keyed by name. This cannot be changed through play or resetting. + :type: Dict[AgentID, Dict[str, Any]] - :type: Dict[AgentID, gym.spaces.Space] +.. autoattribute:: AECEnv.observation_spaces - .. py:attribute:: action_spaces + A dict of the observation spaces of every agent, keyed by name. This cannot be changed through play or resetting. - A dict of the action spaces of every agent, keyed by name. This cannot be changed through play or resetting. + :type: Dict[AgentID, gymnasium.spaces.Space] - :type: Dict[AgentID, gym.spaces.Space] +.. autoattribute:: AECEnv.action_spaces - .. automethod:: step - .. automethod:: reset - .. automethod:: observe - .. automethod:: render - .. automethod:: seed - .. automethod:: close + A dict of the action spaces of every agent, keyed by name. This cannot be changed through play or resetting. + :type: Dict[AgentID, gymnasium.spaces.Space] +``` + +### Methods + +```{eval-rst} +.. automethod:: AECEnv.step +.. automethod:: AECEnv.reset +.. automethod:: AECEnv.observe +.. automethod:: AECEnv.render +.. automethod:: AECEnv.seed +.. automethod:: AECEnv.close ``` diff --git a/docs/api/parallel.md b/docs/api/parallel.md index d01c1bc56..552db0dfd 100644 --- a/docs/api/parallel.md +++ b/docs/api/parallel.md @@ -2,7 +2,7 @@ In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `.parallel_env()`. 
This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLLib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents. -### Example Usage +## Example Usage Environments can be interacted with as follows: @@ -15,6 +15,8 @@ for step in range(max_cycles): observations, rewards, terminations, truncations, infos = parallel_env.step(actions) ``` +## ParallelEnv + ```{eval-rst} .. currentmodule:: pettingzoo.utils.env diff --git a/docs/api/pz_wrappers.md b/docs/api/pz_wrappers.md index 419f538d2..a9145fa68 100644 --- a/docs/api/pz_wrappers.md +++ b/docs/api/pz_wrappers.md @@ -33,12 +33,17 @@ env = from_parallel(env) We wanted our pettingzoo environments to be both easy to use and easy to implement. To combine these, we have a set of simple wrappers which provide input validation and other convenient reusable logic. -* `BaseWrapper`: All AECEnv wrappers should inherit from this base class -* `TerminateIllegalWrapper`: Handles illegal move logic for classic games -* `CaptureStdoutWrapper`: Takes an environment which prints to terminal, and gives it an `ansi` render mode where it captures the terminal output and returns it as a string instead. -* `AssertOutOfBoundsWrapper`: Asserts if the action given to step is outside of the action space. Applied in PettingZoo environments with discrete action spaces. -* `ClipOutOfBoundsWrapper`: Clips the input action to fit in the continuous action space (emitting a warning if it does so). Applied to continuous environments in pettingzoo. -* `OrderEnforcingWrapper`: Gives a sensible error message if function calls or attribute access are in a disallowed order, for example if step() is called before reset(), or the .dones attribute is accessed before reset(), or if seed() is called and then step() is used before reset() is called again (reset must be called after seed()). Applied to all PettingZoo environments. +```{eval-rst} +.. currentmodule:: pettingzoo.utils.wrappers + +.. autoclass:: BaseWrapper +.. autoclass:: TerminateIllegalWrapper +.. autoclass:: CaptureStdoutWrapper +.. autoclass:: AssertOutOfBoundsWrapper +.. autoclass:: ClipOutOfBoundsWrapper +.. autoclass:: OrderEnforcingWrapper + +``` You can apply these wrappers to your environment in a similar manner to the below example: diff --git a/docs/api/supersuit_wrappers.md b/docs/api/supersuit_wrappers.md index d8a65992e..c52c74e7e 100644 --- a/docs/api/supersuit_wrappers.md +++ b/docs/api/supersuit_wrappers.md @@ -7,7 +7,7 @@ title: Supersuit Wrappers PettingZoo include wrappers via the SuperSuit companion package (`pip install supersuit`). These can be applied to both AECEnv and ParallelEnv environments. Using it to convert space invaders to have a grey scale observation space and stack the last 4 frames looks like: ``` python -import gym +import gymnasium as gym from supersuit import color_reduction_v0, frame_stack_v1 env = gym.make('SpaceInvaders-v0') @@ -28,50 +28,96 @@ env = frame_stack_v1(color_reduction_v0(env, 'full'), 4) Supersuit includes the following wrappers: -* `clip_reward_v0(env, lower_bound=-1, upper_bound=1)` clips rewards to between lower_bound and upper_bound. This is a popular way of handling rewards with significant variance of magnitude, especially in Atari environments. -* `clip_actions_v0(env)` clips Box actions to be within the high and low bounds of the action space. 
This is a standard transformation applied to environments with continuous action spaces to keep the action passed to the environment within the specified bounds. +```{eval-rst} +.. py:function:: clip_reward_v0(env, lower_bound=-1, upper_bound=1) -* `color_reduction_v0(env, mode='full')` simplifies color information in graphical ((x,y,3) shaped) environments. `mode='full'` fully greyscales of the observation. This can be computationally intensive. Arguments of 'R', 'G' or 'B' just take the corresponding R, G or B color channel from observation. This is much faster and is generally sufficient. + Clips rewards to between lower_bound and upper_bound. This is a popular way of handling rewards with significant variance of magnitude, especially in Atari environments. -* `dtype_v0(env, dtype)` recasts your observation as a certain dtype. Many graphical games return `uint8` observations, while neural networks generally want `float16` or `float32`. `dtype` can be anything NumPy would except as a dtype argument (e.g. np.dtype classes or strings). +.. py:function:: clip_actions_v0(env) -* `flatten_v0(env)` flattens observations into a 1D array. + Clips Box actions to be within the high and low bounds of the action space. This is a standard transformation applied to environments with continuous action spaces to keep the action passed to the environment within the specified bounds. -* `frame_skip_v0(env, num_frames)` skips `num_frames` number of frames by reapplying old actions over and over. Observations skipped over are ignored. Rewards skipped over are accumulated. Like Gymnasium Atari's frameskip parameter, `num_frames` can also be a tuple `(min_skip, max_skip)`, which indicates a range of possible skip lengths which are randomly chosen from (in single agent environments only). +.. py:function:: color_reduction_v0(env, mode='full') -* `delay_observations_v0(env, delay)` Delays observation by `delay` frames. Before `delay` frames have been executed, the observation is all zeros. Along with frame_skip, this is the preferred way to implement reaction time for high FPS games. + Simplifies color information in graphical ((x,y,3) shaped) environments. `mode='full'` fully greyscales the observation. This can be computationally intensive. Arguments of 'R', 'G' or 'B' just take the corresponding R, G or B color channel from observation. This is much faster and is generally sufficient. -* `sticky_actions_v0(env, repeat_action_probability)` assigns a probability of an old action "sticking" to the environment and not updating as requested. This is to prevent agents from learning predefined action patterns in highly deterministic games like Atari. Note that the stickiness is cumulative, so an action has a repeat_action_probability^2 chance of an action sticking for two turns in a row, etc. This is the recommended way of adding randomness to Atari by *"Machado et al. (2018), "Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents"* +.. py:function:: dtype_v0(env, dtype) -* `frame_stack_v1(env, num_frames=4)` stacks the most recent frames. For vector games observed via plain vectors (1D arrays), the output is just concatenated to a longer 1D array. 2D or 3D arrays are stacked to be taller 3D arrays. At the start of the game, frames that don't yet exist are filled with 0s. `num_frames=1` is analogous to not using this function. + Recasts your observation as a certain dtype.
Many graphical games return `uint8` observations, while neural networks generally want `float16` or `float32`. `dtype` can be anything NumPy would accept as a dtype argument (e.g. np.dtype classes or strings). -* `max_observation_v0(env, memory)` the resulting observation becomes the max over `memory` number of prior frames. This is important for Atari environments, as many games have elements that are intermitently flashed on the instead of being constant, due to the peculiarities of the console and CRT TVs. The OpenAI baselines MaxAndSkip Atari wrapper is equivalent to doing `memory=2` and then a `frame_skip` of 4. +.. py:function:: flatten_v0(env) -* `normalize_obs_v0(env, env_min=0, env_max=1)` linearly scales observations to the range `env_min` (default 0) to `env_max` (default 1), given the known minimum and maximum observation values defined in the observation space. Only works on Box observations with float32 or float64 dtypes and finite bounds. If you wish to normalize another type, you can first apply the dtype wrapper to convert your type to float32 or float64. + Flattens observations into a 1D array. -* `reshape_v0(env, shape)` reshapes observations into given shape. +.. py:function:: frame_skip_v0(env, num_frames) -* `resize_v1(env, x_size, y_size, linear_interp=False)` Performs interpolation to up-size or down-size observation image using area interpolation by default. Linear interpolation is also available by setting `linear_interp=True` (it's faster and better for up-sizing). This wrapper is only available for 2D or 3D observations, and only makes sense if the observation is an image. + Skips `num_frames` number of frames by reapplying old actions over and over. Observations skipped over are ignored. Rewards skipped over are accumulated. Like Gymnasium Atari's frameskip parameter, `num_frames` can also be a tuple `(min_skip, max_skip)`, which indicates a range of possible skip lengths which are randomly chosen from (in single agent environments only). -* `nan_noop_v0(env)` If an action is a NaN value for a step, the following wrapper will trigger a warning and perform a no operation action in its place. The noop action is accepted as an argument in the `step(action, no_op_action)` function. +.. py:function:: delay_observations_v0(env, delay) -* `nan_zeros_v0(env)` If an action is a NaN value for a step, the following wrapper will trigger a warning and perform a zeros action in its place. + Delays observation by `delay` frames. Before `delay` frames have been executed, the observation is all zeros. Along with frame_skip, this is the preferred way to implement reaction time for high FPS games. -* `nan_random_v0(env)` If an action is a NaN value for a step, the following wrapper will trigger a warning and perform a random action in its place. The random action will be retrieved from the action mask. +.. py:function:: sticky_actions_v0(env, repeat_action_probability) -* `scale_actions_v0(env, scale)` Scales the high and low bounds of the action space by the `scale` argument in __init__(). Additionally, scales any actions by the same value when step() is called. + Assigns a probability of an old action "sticking" to the environment and not updating as requested. This is to prevent agents from learning predefined action patterns in highly deterministic games like Atari. Note that the stickiness is cumulative, so an action has a repeat_action_probability^2 chance of an action sticking for two turns in a row, etc.
This is the way of adding randomness to Atari recommended by *Machado et al. (2018), "Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents"*. +.. py:function:: frame_stack_v1(env, num_frames=4) + Stacks the most recent frames. For vector games observed via plain vectors (1D arrays), the output is just concatenated to a longer 1D array. 2D or 3D arrays are stacked to be taller 3D arrays. At the start of the game, frames that don't yet exist are filled with 0s. `num_frames=1` is analogous to not using this function. + +.. py:function:: max_observation_v0(env, memory) + + The resulting observation becomes the max over `memory` number of prior frames. This is important for Atari environments, as many games have elements that are intermittently flashed on the screen instead of being constant, due to the peculiarities of the console and CRT TVs. The OpenAI baselines MaxAndSkip Atari wrapper is equivalent to doing `memory=2` and then a `frame_skip` of 4. + +.. py:function:: normalize_obs_v0(env, env_min=0, env_max=1) + + Linearly scales observations to the range `env_min` (default 0) to `env_max` (default 1), given the known minimum and maximum observation values defined in the observation space. Only works on Box observations with float32 or float64 dtypes and finite bounds. If you wish to normalize another type, you can first apply the dtype wrapper to convert your type to float32 or float64. + +.. py:function:: reshape_v0(env, shape) + + Reshapes observations into the given shape. + +.. py:function:: resize_v1(env, x_size, y_size, linear_interp=False) + + Performs interpolation to up-size or down-size observation image using area interpolation by default. Linear interpolation is also available by setting `linear_interp=True` (it's faster and better for up-sizing). This wrapper is only available for 2D or 3D observations, and only makes sense if the observation is an image. + +.. py:function:: nan_noop_v0(env) + + If an action is a NaN value for a step, the following wrapper will trigger a warning and perform a no operation action in its place. The noop action is accepted as an argument in the `step(action, no_op_action)` function. + +.. py:function:: nan_zeros_v0(env) + + If an action is a NaN value for a step, the following wrapper will trigger a warning and perform a zeros action in its place. + +.. py:function:: nan_random_v0(env) + + If an action is a NaN value for a step, the following wrapper will trigger a warning and perform a random action in its place. The random action will be retrieved from the action mask. + +.. py:function:: scale_actions_v0(env, scale) + + Scales the high and low bounds of the action space by the `scale` argument in __init__(). Additionally, scales any actions by the same value when step() is called. + +``` ## Included Multi-Agent Only Functions -* `agent_indicator_v0(env, type_only=False)` Adds an indicator of the agent ID to the observation, only supports discrete and 1D, 2D, and 3D box. For 1d spaces, the agent ID is converted to a 1-hot vector and appended to the observation (increasing the size of the observation space as necessary). 2d and 3d spaces are treated as images (with channels last) and the ID is converted to *n* additional channels with the channel that represents the ID as all 1s and the other channel as all 0s (a sort of one hot encoding). This allows MADRL methods like parameter sharing to learn policies for heterogeneous agents since the policy can tell what agent it's acting on.
Set the `type_only` parameter to parse the name of the agent as `_` and have the appended 1-hot vector only identify the type, rather than the specific agent name. This would, for example give all agents on the red team in the [MAgent battle environment](https://pettingzoo.farama.org/environments/magent/battle) the same agent indicator. This is useful for games where there are many agents in an environment but few types of agents. Agent indication for MADRL was first introduced in *Cooperative Multi-Agent Control Using Deep Reinforcement Learning.* +```{eval-rst} +.. py:function:: agent_indicator_v0(env, type_only=False) -* `black_death_v2(env)` Instead of removing dead actions, observations and rewards are 0 and actions are ignored. This can simplify handling agent death mechanics. The name "black death" does not come from the plague, but from the fact that you see a black image (an image filled with zeros) when you die. + Adds an indicator of the agent ID to the observation, only supports discrete and 1D, 2D, and 3D box. For 1d spaces, the agent ID is converted to a 1-hot vector and appended to the observation (increasing the size of the observation space as necessary). 2d and 3d spaces are treated as images (with channels last) and the ID is converted to *n* additional channels with the channel that represents the ID as all 1s and the other channel as all 0s (a sort of one hot encoding). This allows MADRL methods like parameter sharing to learn policies for heterogeneous agents since the policy can tell what agent it's acting on. Set the `type_only` parameter to parse the name of the agent as `_` and have the appended 1-hot vector only identify the type, rather than the specific agent name. This would, for example, give all agents on the red team in the [MAgent battle environment](https://pettingzoo.farama.org/environments/magent/battle) the same agent indicator. This is useful for games where there are many agents in an environment but few types of agents. Agent indication for MADRL was first introduced in *Cooperative Multi-Agent Control Using Deep Reinforcement Learning.* -* `pad_action_space_v0(env)` pads the action spaces of all agents to be be the same as the biggest, per the algorithm posed in *Parameter Sharing is Surprisingly Useful for Deep Reinforcement Learning*. This enables MARL methods that require homogeneous action spaces for all agents to work with environments with heterogeneous action spaces. Discrete actions inside the padded region will be set to zero, and Box actions will be cropped down to the original space. +.. py:function:: black_death_v2(env) -* `pad_observations_v0(env)` pads observations to be of the shape of the largest observation of any agent with 0s, per the algorithm posed in *Parameter Sharing is Surprisingly Useful for Deep Reinforcement Learning*. This enables MARL methods that require homogeneous observations from all agents to work in environments with heterogeneous observations. This currently supports Discrete and Box observation spaces. + Instead of removing dead actions, observations and rewards are 0 and actions are ignored. This can simplify handling agent death mechanics. The name "black death" does not come from the plague, but from the fact that you see a black image (an image filled with zeros) when you die. +.. py:function:: pad_action_space_v0(env) + + Pads the action spaces of all agents to be the same as the biggest, per the algorithm posed in *Parameter Sharing is Surprisingly Useful for Deep Reinforcement Learning*.
This enables MARL methods that require homogeneous action spaces for all agents to work with environments with heterogeneous action spaces. Discrete actions inside the padded region will be set to zero, and Box actions will be cropped down to the original space. + +.. py:function:: pad_observations_v0(env) + + Pads observations to be of the shape of the largest observation of any agent with 0s, per the algorithm posed in *Parameter Sharing is Surprisingly Useful for Deep Reinforcement Learning*. This enables MARL methods that require homogeneous observations from all agents to work in environments with heterogeneous observations. This currently supports Discrete and Box observation spaces. +``` [//]: # (## Environment Vectorization) diff --git a/docs/content/basic_usage.md b/docs/content/basic_usage.md index d0e00ddef..945e6a547 100644 --- a/docs/content/basic_usage.md +++ b/docs/content/basic_usage.md @@ -5,7 +5,7 @@ title: API ## Initializing Environments -Using environments in PettingZoo is very similar to using them in OpenAI's Gym. You initialize an environment via: +Using environments in PettingZoo is very similar to using them in Gymnasium. You initialize an environment via: ``` python from pettingzoo.butterfly import pistonball_v6 diff --git a/docs/index.md b/docs/index.md index f7f4210b2..201b334f8 100644 --- a/docs/index.md +++ b/docs/index.md @@ -65,7 +65,7 @@ Donate :name: warlods ``` -**Environments can be interacted with in a manner very similar to Gym:** +**Environments can be interacted with in a manner very similar to Gymnasium:** ```python from pettingzoo.butterfly import knights_archers_zombies_v10 diff --git a/pettingzoo/utils/wrappers/assert_out_of_bounds.py b/pettingzoo/utils/wrappers/assert_out_of_bounds.py index d11c36e5f..125a03533 100644 --- a/pettingzoo/utils/wrappers/assert_out_of_bounds.py +++ b/pettingzoo/utils/wrappers/assert_out_of_bounds.py @@ -4,10 +4,7 @@ class AssertOutOfBoundsWrapper(BaseWrapper): - """This wrapper crashes for out of bounds actions. - - Should be used for Discrete spaces - """ + """Asserts if the action given to step is outside of the action space. Applied in PettingZoo environments with discrete action spaces.""" def __init__(self, env): super().__init__(env) diff --git a/pettingzoo/utils/wrappers/base.py b/pettingzoo/utils/wrappers/base.py index 52a281ab6..fc06da718 100644 --- a/pettingzoo/utils/wrappers/base.py +++ b/pettingzoo/utils/wrappers/base.py @@ -6,7 +6,7 @@ class BaseWrapper(AECEnv): """Creates a wrapper around `env` parameter. - Extend this class to create a useful wrapper. 
+ All AECEnv wrappers should inherit from this base class. """ def __init__(self, env): diff --git a/pettingzoo/utils/wrappers/capture_stdout.py b/pettingzoo/utils/wrappers/capture_stdout.py index 95a601500..47d843be4 100644 --- a/pettingzoo/utils/wrappers/capture_stdout.py +++ b/pettingzoo/utils/wrappers/capture_stdout.py @@ -3,6 +3,8 @@ class CaptureStdoutWrapper(BaseWrapper): + """Takes an environment which prints to terminal, and gives it an `ansi` render mode where it captures the terminal output and returns it as a string instead.""" + def __init__(self, env): assert ( env.render_mode == "human" diff --git a/pettingzoo/utils/wrappers/clip_out_of_bounds.py b/pettingzoo/utils/wrappers/clip_out_of_bounds.py index f9bacf506..2fb953117 100644 --- a/pettingzoo/utils/wrappers/clip_out_of_bounds.py +++ b/pettingzoo/utils/wrappers/clip_out_of_bounds.py @@ -6,7 +6,10 @@ class ClipOutOfBoundsWrapper(BaseWrapper): - """This wrapper crops out of bounds actions for Box spaces.""" + """Clips the input action to fit in the continuous action space (emitting a warning if it does so). + + Applied to continuous environments in PettingZoo. + """ def __init__(self, env): super().__init__(env) diff --git a/pettingzoo/utils/wrappers/order_enforcing.py b/pettingzoo/utils/wrappers/order_enforcing.py index 6ef519b91..23a9bf99e 100644 --- a/pettingzoo/utils/wrappers/order_enforcing.py +++ b/pettingzoo/utils/wrappers/order_enforcing.py @@ -4,7 +4,7 @@ class OrderEnforcingWrapper(BaseWrapper): - """Check all call orders. + """Checks if function calls or attribute access are in a disallowed order. * error on getting rewards, terminations, truncations, infos, agent_selection before reset * error on calling step, observe before reset diff --git a/pettingzoo/utils/wrappers/terminate_illegal.py b/pettingzoo/utils/wrappers/terminate_illegal.py index 4479b7d56..289de8918 100644 --- a/pettingzoo/utils/wrappers/terminate_illegal.py +++ b/pettingzoo/utils/wrappers/terminate_illegal.py @@ -5,8 +5,8 @@ class TerminateIllegalWrapper(BaseWrapper): """This wrapper terminates the game with the current player losing in case of illegal values. - Parameters: - - illegal_reward: number that is the value of the player making an illegal move. + Args: + illegal_reward: number that is the reward given to the player making an illegal move. """ def __init__(self, env, illegal_reward):
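For the `AECEnv` attributes and methods documented in `docs/api/core.md` above, a minimal interaction loop is sketched below. It assumes the API in which `last()` returns per-agent termination/truncation flags (matching the parallel example in `docs/api/parallel.md`); on releases that still expose the `dones` dict, adapt accordingly.

```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.env()
env.reset(seed=42)

for agent in env.agent_iter():
    # last() returns data for the currently selected agent (env.agent_selection)
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None  # agents that are done must be stepped with a None action
    else:
        action = env.action_space(agent).sample()  # replace with a trained policy
    env.step(action)
env.close()
```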
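The parallel snippet in `docs/api/parallel.md` omits how `actions` is built and when the loop ends; one possible completion is sketched below. The exact return value of `parallel_env.reset()` (observations only, or observations plus infos) depends on the PettingZoo version, so treat that line as an assumption.

```python
from pettingzoo.butterfly import pistonball_v6

parallel_env = pistonball_v6.parallel_env()
observations = parallel_env.reset(seed=42)  # newer releases may also return infos

while parallel_env.agents:
    # one action per live agent, keyed by agent name
    actions = {
        agent: parallel_env.action_space(agent).sample()
        for agent in parallel_env.agents
    }
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
parallel_env.close()
```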
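The wrapper classes now documented via autoclass in `docs/api/pz_wrappers.md` compose by nesting. The sketch below assumes they are importable from `pettingzoo.utils.wrappers` (the module targeted by the `currentmodule` directive) and that the environment exposes a `raw_env()` constructor that is not already wrapped.

```python
from pettingzoo.butterfly import knights_archers_zombies_v10
from pettingzoo.utils.wrappers import AssertOutOfBoundsWrapper, OrderEnforcingWrapper

env = knights_archers_zombies_v10.raw_env()
env = AssertOutOfBoundsWrapper(env)  # discrete action spaces only
env = OrderEnforcingWrapper(env)     # clear errors on out-of-order calls, e.g. step() before reset()
```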
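Mirroring the Gymnasium snippet in `docs/api/supersuit_wrappers.md`, the same SuperSuit wrappers can be applied directly to a PettingZoo environment; a sketch:

```python
from pettingzoo.butterfly import pistonball_v6
from supersuit import color_reduction_v0, frame_stack_v1

env = pistonball_v6.env()
# greyscale the (x, y, 3) image observations, then stack the last 4 frames
env = frame_stack_v1(color_reduction_v0(env, 'full'), 4)
```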