Docs Update 2 #817

Merged: 8 commits, Oct 8, 2022
README.md (4 changes: 2 additions & 2 deletions)

@@ -33,14 +33,14 @@ Get started with PettingZoo by following [the PettingZoo tutorial](https://petti

PettingZoo models environments as [*Agent Environment Cycle* (AEC) games](https://arxiv.org/pdf/2009.14471.pdf), in order to cleanly support all types of multi-agent RL environments under one API and to minimize the potential for certain classes of common bugs.

- Using environments in PettingZoo is very similar to Gym, i.e. you initialize an environment via:
+ Using environments in PettingZoo is very similar to Gymnasium, i.e. you initialize an environment via:

```python
from pettingzoo.butterfly import pistonball_v6
env = pistonball_v6.env()
```

- Environments can be interacted with in a manner very similar to Gym:
+ Environments can be interacted with in a manner very similar to Gymnasium:

```python
env.reset()
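The diff view truncates the rest of this example. For orientation, a full AEC interaction loop from this era of the API looks roughly like the sketch below; `policy` is a hypothetical stand-in for your own agent logic, and depending on the exact PettingZoo version `last()` returns either a single done flag (as documented in core.md below) or separate termination/truncation flags:

```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.env()
env.reset()
for agent in env.agent_iter():
    # last() yields the observation, reward, done flag, and info dict
    # for the currently selected agent
    observation, reward, done, info = env.last()
    # dead agents must be stepped with a None action
    action = None if done else policy(observation, agent)
    env.step(action)
env.close()
```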
docs/api/core.md (88 changes: 49 additions & 39 deletions)

@@ -5,79 +5,89 @@

.. autoclass:: AECEnv

- .. py:attribute:: agents
+ ```
+
+ ## Attributes
+
+ ```{eval-rst}
+
+ .. autoattribute:: AECEnv.agents

  A list of the names of all current agents, typically integers. These may be changed as an environment progresses (i.e. agents can be added or removed).

- :type: list[AgentID]
+ :type: List[AgentID]

- .. py:attribute:: num_agents
+ .. autoattribute:: AECEnv.num_agents

  The length of the agents list.

  :type: int

- .. py:attribute:: possible_agents
+ .. autoattribute:: AECEnv.possible_agents

  A list of all possible_agents the environment could generate. Equivalent to the list of agents in the observation and action spaces. This cannot be changed through play or resetting.

- :type: list[AgentID]
+ :type: List[AgentID]

- .. py:attribute:: max_num_agents
+ .. autoattribute:: AECEnv.max_num_agents

  The length of the possible_agents list.

  :type: int

- .. py:attribute:: agent_selection
+ .. autoattribute:: AECEnv.agent_selection

  An attribute of the environment corresponding to the currently selected agent that an action can be taken for.

  :type: AgentID

- .. py:attribute:: dones
+ .. autoattribute:: AECEnv.dones

  A dict of the done state of every current agent at the time called, keyed by name. `last()` accesses this attribute. Note that agents can be added or removed from this dict. The returned dict looks like::

    dones = {0:[first agent done state], 1:[second agent done state] ... n-1:[nth agent done state]}

  :type: Dict[AgentID, bool]

- .. py:attribute:: rewards
+ .. autoattribute:: AECEnv.rewards

  A dict of the rewards of every current agent at the time called, keyed by name. Contains the instantaneous reward generated after the last step. Note that agents can be added or removed from this attribute. `last()` does not directly access this attribute; rather, the returned reward is stored in an internal variable. The rewards structure looks like::

    {0:[first agent reward], 1:[second agent reward] ... n-1:[nth agent reward]}

  :type: Dict[AgentID, float]

- .. py:attribute:: infos
+ .. autoattribute:: AECEnv.infos

  A dict of info for each current agent, keyed by name. Each agent's info is also a dict. Note that agents can be added or removed from this attribute. `last()` accesses this attribute. The returned dict looks like::

    infos = {0:[first agent info], 1:[second agent info] ... n-1:[nth agent info]}

  :type: Dict[AgentID, Dict[str, Any]]

- .. py:attribute:: observation_spaces
+ .. autoattribute:: AECEnv.observation_spaces

  A dict of the observation spaces of every agent, keyed by name. This cannot be changed through play or resetting.

- :type: Dict[AgentID, gym.spaces.Space]
+ :type: Dict[AgentID, gymnasium.spaces.Space]

- .. py:attribute:: action_spaces
+ .. autoattribute:: AECEnv.action_spaces

  A dict of the action spaces of every agent, keyed by name. This cannot be changed through play or resetting.

- :type: Dict[AgentID, gym.spaces.Space]
+ :type: Dict[AgentID, gymnasium.spaces.Space]
```
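As a rough illustration of these attributes (not part of the diff itself), a minimal sketch, assuming the pistonball environment from the README:

```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.env()
env.reset()

print(env.agents)           # e.g. ['piston_0', 'piston_1', ...]
print(env.num_agents)       # same as len(env.agents)
print(env.possible_agents)  # fixed list; unchanged by play or resetting
print(env.agent_selection)  # the agent an action can currently be taken for
print(env.rewards)          # {agent_name: instantaneous reward, ...}
print(env.observation_spaces[env.agent_selection])  # a gymnasium.spaces.Space
```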

- .. automethod:: step
- .. automethod:: reset
- .. automethod:: observe
- .. automethod:: render
- .. automethod:: seed
- .. automethod:: close
+ ## Methods
+
+ ```{eval-rst}
+ .. automethod:: AECEnv.step
+ .. automethod:: AECEnv.reset
+ .. automethod:: AECEnv.observe
+ .. automethod:: AECEnv.render
+ .. automethod:: AECEnv.seed
+ .. automethod:: AECEnv.close
+
+ ```
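A hedged sketch of how these methods fit together, continuing the pistonball example above (random actions stand in for a real policy, and rendering behavior depends on how the environment was configured):

```python
env.reset()                       # must precede the first step()
agent = env.agent_selection
observation = env.observe(agent)  # observation for one specific agent
env.step(env.action_spaces[agent].sample())  # act for the selected agent
env.render()                      # draw the environment, where rendering is supported
env.close()                       # release rendering and other resources
```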

docs/api/parallel.md (4 changes: 3 additions & 1 deletion)

@@ -2,7 +2,7 @@

In addition to the main API, we have a secondary parallel API for environments where all agents have simultaneous actions and observations. An environment with parallel API support can be created via `<game>.parallel_env()`. This API is based around the paradigm of *Partially Observable Stochastic Games* (POSGs) and the details are similar to [RLlib's MultiAgent environment specification](https://docs.ray.io/en/latest/rllib-env.html#multi-agent-and-hierarchical), except we allow for different observation and action spaces between the agents.

- ### Example Usage
+ ## Example Usage

Environments can be interacted with as follows:

@@ -15,6 +15,8 @@ for step in range(max_cycles):
observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
```
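Only the tail of this example is visible in the diff; a complete loop under the terminations/truncations signature shown above might look roughly like the following sketch (the exact return value of `reset()` varies across PettingZoo versions):

```python
from pettingzoo.butterfly import pistonball_v6

parallel_env = pistonball_v6.parallel_env()
observations = parallel_env.reset()
max_cycles = 500
for step in range(max_cycles):
    # one simultaneous action per live agent, sampled at random here
    actions = {agent: parallel_env.action_spaces[agent].sample()
               for agent in parallel_env.agents}
    observations, rewards, terminations, truncations, infos = parallel_env.step(actions)
    if not parallel_env.agents:  # all agents terminated or truncated
        break
```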

+ ## ParallelEnv

```{eval-rst}
.. currentmodule:: pettingzoo.utils.env

docs/api/pz_wrappers.md (17 changes: 11 additions & 6 deletions)

@@ -33,12 +33,17 @@ env = from_parallel(env)

We wanted our PettingZoo environments to be both easy to use and easy to implement. To combine these goals, we have a set of simple wrappers which provide input validation and other convenient reusable logic.

- * `BaseWrapper`: All AECEnv wrappers should inherit from this base class.
- * `TerminateIllegalWrapper`: Handles illegal move logic for classic games.
- * `CaptureStdoutWrapper`: Takes an environment which prints to terminal, and gives it an `ansi` render mode where it captures the terminal output and returns it as a string instead.
- * `AssertOutOfBoundsWrapper`: Asserts if the action given to step is outside of the action space. Applied in PettingZoo environments with discrete action spaces.
- * `ClipOutOfBoundsWrapper`: Clips the input action to fit in the continuous action space (emitting a warning if it does so). Applied to continuous environments in PettingZoo.
- * `OrderEnforcingWrapper`: Gives a sensible error message if function calls or attribute access happen in a disallowed order: for example, if step() is called before reset(), if the .dones attribute is accessed before reset(), or if seed() is called and then step() is used before reset() is called again (reset must be called after seed()). Applied to all PettingZoo environments.
+ ```{eval-rst}
+ .. currentmodule:: pettingzoo.utils.wrappers
+
+ .. autoclass:: BaseWrapper
+ .. autoclass:: TerminateIllegalWrapper
+ .. autoclass:: CaptureStdoutWrapper
+ .. autoclass:: AssertOutOfBoundsWrapper
+ .. autoclass:: ClipOutOfBoundsWrapper
+ .. autoclass:: OrderEnforcingWrapper
+
+ ```

You can apply these wrappers to your environment in a similar manner to the below example:
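The example itself is hidden by the truncated diff; a minimal sketch of the usual pattern, assuming the conventional `raw_env()` entry point that PettingZoo environments expose, might be:

```python
from pettingzoo.butterfly import pistonball_v6
from pettingzoo.utils import wrappers

# start from the unwrapped environment and stack wrappers on top
env = pistonball_v6.raw_env()
env = wrappers.ClipOutOfBoundsWrapper(env)  # pistonball has continuous actions
env = wrappers.OrderEnforcingWrapper(env)
```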
