Merge pull request #98 from Replicable-MARL/sy_dev
Sy dev
Theohhhu authored Apr 25, 2023
2 parents 392cf4b + d6be00b commit 12f0ce7
Showing 66 changed files with 585 additions and 105 deletions.
22 changes: 14 additions & 8 deletions README.md
@@ -1,6 +1,8 @@
<div align="center">
<img src=docs/source/images/logo1.png width=65% />
</div>
[comment]: <> (<div align="center">)

[comment]: <> (<img src=docs/source/images/logo1.png width=65% />)

[comment]: <> (</div>)

<h1 align="center"> MARLlib: A Scalable and Efficient Multi-agent Reinforcement Learning Library </h1>

@@ -10,10 +12,9 @@
[![GitHub issues](https://img.shields.io/github/issues/Replicable-MARL/MARLlib)](https://github.com/Replicable-MARL/MARLlib/issues)
[![PyPI version](https://badge.fury.io/py/marllib.svg)](https://badge.fury.io/py/marllib)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Replicable-MARL/MARLlib/blob/sy_dev/marllib.ipynb)
[![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html)
[![Organization](https://img.shields.io/badge/Organization-ReLER_RL-blue.svg)](https://github.com/Replicable-MARL/MARLlib)
[![Organization](https://img.shields.io/badge/Organization-PKU_MARL-blue.svg)](https://github.com/Replicable-MARL/MARLlib)

[![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html)

> __News__:
> We are excited to announce that a major update has just been released. For detailed version information, please refer to the [version info](https://github.com/Replicable-MARL/MARLlib/releases/tag/1.0.2).
@@ -55,7 +56,7 @@ Here we provide a table for the comparison of MARLlib and existing work.
| [MAPPO Benchmark](https://github.com/marlbenchmark/on-policy) | 4 cooperative | 1 | share + separate | MLP + GRU | :x: |
| [MAlib](https://github.com/sjtu-marl/malib) | 4 self-play | 10 | share + group + separate | MLP + LSTM | [![Documentation Status](https://readthedocs.org/projects/malib/badge/?version=latest)](https://malib.readthedocs.io/en/latest/?badge=latest)
| [EPyMARL](https://github.com/uoe-agents/epymarl)| 4 cooperative | 9 | share + separate | GRU | :x: |
| **[MARLlib](https://github.com/Replicable-MARL/MARLlib)** | 11 **no task mode restriction** | 18 | share + group + separate + **customizable** | MLP + CNN + GRU + LSTM | [![Documentation Status](https://readthedocs.org/projects/marllib/badge/?version=latest)](https://marllib.readthedocs.io/en/latest/) |
| **[MARLlib](https://github.com/Replicable-MARL/MARLlib)** | 12 **no task mode restriction** | 18 | share + group + separate + **customizable** | MLP + CNN + GRU + LSTM | [![Documentation Status](https://readthedocs.org/projects/marllib/badge/?version=latest)](https://marllib.readthedocs.io/en/latest/) |

| Library | Github Stars | Documentation | Issues Open | Activity | Last Update
|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|
@@ -108,7 +109,7 @@ First, install MARLlib dependencies to guarantee basic usage.
following [this guide](https://marllib.readthedocs.io/en/latest/handbook/env.html), finally install patches for RLlib.

```bash
$ conda create -n marllib python=3.8
$ conda create -n marllib python=3.8 # or 3.9
$ conda activate marllib
$ git clone https://github.com/Replicable-MARL/MARLlib.git && cd MARLlib
$ pip install -r requirements.txt
@@ -185,6 +186,7 @@ Most of the popular environments in MARL research are supported by MARLlib:
| **[GRF](https://github.com/google-research/football)** | collaborative + mixed | Full | Discrete | 2D |
| **[Hanabi](https://github.com/deepmind/hanabi-learning-environment)** | cooperative | Partial | Discrete | 1D |
| **[MATE](https://github.com/XuehaiPan/mate)** | cooperative + mixed | Partial | Both | 1D |
| **[GoBigger](https://github.com/opendilab/GoBigger)** | cooperative + mixed | Both | Continuous | 1D |

Each environment comes with a readme file that serves as the instruction for that task, covering environment settings, installation, and important notes.
@@ -320,7 +322,11 @@ More tutorial documentations are available [here](https://marllib.readthedocs.io

## Awesome List

A collection of research and review papers of multi-agent reinforcement learning (MARL) is available [here](https://marllib.readthedocs.io/en/latest/resources/awesome.html). The papers have been organized based on their publication date and their evaluation of the corresponding environments.
A collection of research and review papers of multi-agent reinforcement learning (MARL) is available. The papers have been organized based on their publication date and their evaluation of the corresponding environments.

Algorithms: [![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/resources/awesome.html)
Environments: [![Awesome](https://awesome.re/badge.svg)](https://marllib.readthedocs.io/en/latest/handbook/env.html)


## Community

5 changes: 3 additions & 2 deletions ROADMAP.md
@@ -11,7 +11,8 @@ This list describes the planned features including breaking changes.
- [ ] manual training, refer to issue: https://github.com/Replicable-MARL/MARLlib/issues/86#issuecomment-1468188682
- [ ] new environments
- [x] MATE: https://github.com/UnrealTracking/mate
- [ ] Go-Bigger: https://github.com/opendilab/GoBigger
- [x] Go-Bigger: https://github.com/opendilab/GoBigger
- [ ] Voltage Control: https://github.com/Future-Power-Networks/MAPDN
- [ ] Overcooked: https://github.com/HumanCompatibleAI/overcooked_ai
- [ ] Support Transformer architecture
- [ ] CloseAirCombat: https://github.com/liuqh16/CloseAirCombat
- [ ] Support Transformers
50 changes: 49 additions & 1 deletion docs/source/handbook/env.rst
@@ -594,4 +594,52 @@ Installation

.. code-block:: shell
pip3 install git+https://github.com/XuehaiPan/mate.git#egg=mate
pip3 install git+https://github.com/XuehaiPan/mate.git#egg=mate
.. _GoBigger:

GoBigger
==============
.. only:: html

.. figure:: images/env_gobigger.gif
:width: 320
:align: center


GoBigger is a game engine that offers an efficient and easy-to-use platform for agar-like game development, providing a variety of interfaces specifically designed for game AI development. Its mechanics are similar to those of Agar.io, the popular massively multiplayer online action game created by Brazilian developer Matheus Valadares. The objective in GoBigger is to navigate one or more circular balls across a map, consuming Food Balls and smaller balls to grow while avoiding larger balls that can consume them. Each player starts with a single ball and can split it into two once it reaches a certain size, gaining control over multiple balls.

Official Link: https://github.com/opendilab/GoBigger
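
The MARLlib wrapper introduced in this commit builds the environment through GoBigger's
``create_env_custom`` interface. A minimal native-API sketch (the parameters mirror the
wrapper's defaults; ``type='st'`` selects the standard game mode):

.. code-block:: python

    from gobigger.envs import create_env_custom

    # 1 team of 2 players, episodes capped at 1600 frames
    env = create_env_custom(type='st', cfg=dict(team_num=1,
                                                player_num_per_team=2,
                                                frame_limit=1600))
    obs = env.reset()
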

.. list-table::
:widths: 25 25
:header-rows: 0

* - ``Original Learning Mode``
- Cooperative + Mixed
* - ``MARLlib Learning Mode``
- Cooperative + Mixed
* - ``Observability``
- Partial + Full
* - ``Action Space``
- Continuous
* - ``Observation Space Dim``
- 1D
* - ``Action Mask``
- No
* - ``Global State``
- No
* - ``Global State Space Dim``
- /
* - ``Reward``
- Dense
* - ``Agent-Env Interact Mode``
- Simultaneous


Installation
-----------------

.. code-block:: shell
conda install -c opendilab gobigger
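
A minimal training sketch, assuming MARLlib's standard quick-start interface
(``marl.make_env`` / ``marl.algos`` / ``marl.build_model``, which is not part of this patch)
and the ``st_t1p2`` map configured in ``gobigger.yaml``:

.. code-block:: python

    from marllib import marl

    # create the GoBigger environment (1 team, 2 players per team)
    env = marl.make_env(environment_name="gobigger", map_name="st_t1p2")
    # pick an on-policy algorithm; the "common" hyperparameter source is an assumption here
    mappo = marl.algos.mappo(hyperparam_source="common")
    # build a simple MLP model and start training
    model = marl.build_model(env, mappo, {"core_arch": "mlp", "encode_layer": "128-128"})
    mappo.fit(env, model, stop={"timesteps_total": 1000000}, share_policy="group")
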
Binary file added docs/source/images/env_gobigger.gif
6 changes: 6 additions & 0 deletions marllib/envs/base_env/__init__.py
@@ -88,3 +88,9 @@
except Exception as e:
ENV_REGISTRY["mate"] = str(e)

try:
from marllib.envs.base_env.gobigger import RLlibGoBigger
ENV_REGISTRY["gobigger"] = RLlibGoBigger
except Exception as e:
ENV_REGISTRY["gobigger"] = str(e)
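
Because the registry stores the import error string when the optional dependency is missing,
availability can be checked at runtime. A small sketch (the `ENV_REGISTRY` import path is
taken from this file):

```python
from marllib.envs.base_env import ENV_REGISTRY

# On a failed import the registry holds the error message instead of the env class.
entry = ENV_REGISTRY["gobigger"]
if isinstance(entry, str):
    print("GoBigger unavailable:", entry)
else:
    print("GoBigger registered as", entry.__name__)
```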

33 changes: 33 additions & 0 deletions marllib/envs/base_env/config/gobigger.yaml
@@ -0,0 +1,33 @@
# MIT License

# Copyright (c) 2023 Replicable-MARL

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

env: gobigger

env_args:
map_name: "st_t1p2" # st(andard)_t(eam)1_p(layer)2 -> 1 team, 2 players per team
#num_teams: 1
#num_agents: 2
frame_limit: 1600
mask_flag: False
global_state_flag: False
opp_action_in_cc: True
fixed_batch_timesteps: 3200 # optional; all scenarios will use this batch size (only valid for on-policy algorithms)
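
The map name encodes the scenario layout; a short sketch mirroring how the wrapper in
`gobigger.py` (added below) decodes it:

```python
# Decode "st_t1p2": st(andard) game type, 1 team, 2 players per team.
map_name = "st_t1p2"
num_players_per_team = int(map_name.split("p")[-1][0])  # -> 2
num_teams = int(map_name.split("_t")[1][0])             # -> 1
num_agents = num_teams * num_players_per_team            # -> 2
```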
202 changes: 202 additions & 0 deletions marllib/envs/base_env/gobigger.py
@@ -0,0 +1,202 @@
# MIT License

# Copyright (c) 2023 Replicable-MARL

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import copy

from gobigger.envs import create_env_custom
from gym.spaces import Dict as GymDict, Box
from ray.rllib.env.multi_agent_env import MultiAgentEnv
import numpy as np


policy_mapping_dict = {
"all_scenario": {
"description": "mixed scenarios to t>2 (num_teams > 1)",
"team_prefix": ("team0_", "team1_"),
"all_agents_one_policy": True,
"one_agent_one_policy": True,
},
}


class RLlibGoBigger(MultiAgentEnv):

def __init__(self, env_config):

map_name = env_config["map_name"]

env_config.pop("map_name", None)
self.num_agents_per_team = int(map_name.split("p")[-1][0])
self.num_teams = int(map_name.split("_t")[1][0])
if self.num_teams == 1:
policy_mapping_dict["all_scenario"]["team_prefix"] = ("team0_",)
self.num_agents = self.num_agents_per_team * self.num_teams
self.max_steps = env_config["frame_limit"]
self.env = create_env_custom(type='st', cfg=dict(
team_num=self.num_teams,
player_num_per_team=self.num_agents_per_team,
frame_limit=self.max_steps
))

self.action_space = Box(low=-1,
high=1,
shape=(2,),
dtype=float)

self.rectangle_dim = 4
self.food_dim = self.num_agents * 100
self.thorns_dim = self.num_agents * 6
self.clone_dim = self.num_agents * 10
self.team_name_dim = 1
self.score_dim = 1

self.obs_dim = self.rectangle_dim + self.food_dim + self.thorns_dim + \
self.clone_dim + self.team_name_dim + self.score_dim

self.observation_space = GymDict({"obs": Box(
low=-1e6,
high=1e6,
shape=(self.obs_dim,),
dtype=float)})

self.agents = []
for team_index in range(self.num_teams):
for agent_index in range(self.num_agents_per_team):
self.agents.append("team{}_{}".format(team_index, agent_index))

env_config["map_name"] = map_name
self.env_config = env_config

def reset(self):
original_obs = self.env.reset()
obs = {}
for agent_index, agent_name in enumerate(self.agents):

rectangle = list(original_obs[1][agent_index]["rectangle"])

overlap_dict = original_obs[1][agent_index]["overlap"]

food = overlap_dict["food"]
if 4 * len(food) > self.food_dim:
food = food[:self.food_dim // 4]
else:
padding = [0] * (self.food_dim - 4 * len(food))
food.append(padding)
food = [item for sublist in food for item in sublist]

thorns = overlap_dict["thorns"]
if 6 * len(thorns) > self.thorns_dim:
thorns = thorns[:self.thorns_dim // 6]
else:
padding = [0] * (self.thorns_dim - 6 * len(thorns))
thorns.append(padding)
thorns = [item for sublist in thorns for item in sublist]

clone = overlap_dict["clone"]
if 10 * len(clone) > self.clone_dim:
clone = clone[:self.clone_dim // 10]
else:
padding = [0] * (self.clone_dim - 10 * len(clone))
clone.append(padding)
clone = [item for sublist in clone for item in sublist]

team = original_obs[1][agent_index]["team_name"]
score = original_obs[1][agent_index]["score"]

all_elements = rectangle + food + thorns + clone + [team] + [score]
all_elements = np.array(all_elements, dtype=float)

obs[agent_name] = {
"obs": all_elements
}

return obs

def step(self, action_dict):
actions = {}
for i, agent_name in enumerate(self.agents):
actions[i] = list(action_dict[agent_name])
actions[i].append(-1)

original_obs, team_rewards, done, info = self.env.step(actions)

rewards = {}
obs = {}
infos = {}

for agent_index, agent_name in enumerate(self.agents):

rectangle = list(original_obs[1][agent_index]["rectangle"])

overlap_dict = original_obs[1][agent_index]["overlap"]

food = overlap_dict["food"]
if 4 * len(food) > self.food_dim:
food = food[:self.food_dim // 4]
else:
padding = [0] * (self.food_dim - 4 * len(food))
food.append(padding)
food = [item for sublist in food for item in sublist]

thorns = overlap_dict["thorns"]
if 6 * len(thorns) > self.thorns_dim:
thorns = thorns[:self.thorns_dim // 6]
else:
padding = [0] * (self.thorns_dim - 6 * len(thorns))
thorns.append(padding)
thorns = [item for sublist in thorns for item in sublist]

clone = overlap_dict["clone"]
if 10 * len(clone) > self.clone_dim:
clone = clone[:self.clone_dim // 10]
else:
padding = [0] * (self.clone_dim - 10 * len(clone))
clone.append(padding)
clone = [item for sublist in clone for item in sublist]

team = original_obs[1][agent_index]["team_name"]
score = original_obs[1][agent_index]["score"]

all_elements = rectangle + food + thorns + clone + [team] + [score]
all_elements = np.array(all_elements, dtype=float)

obs[agent_name] = {
"obs": all_elements
}

rewards[agent_name] = team_rewards[team]

dones = {"__all__": done}
return obs, rewards, dones, infos

def get_env_info(self):
env_info = {
"space_obs": self.observation_space,
"space_act": self.action_space,
"num_agents": self.num_agents,
"episode_limit": self.max_steps,
"policy_mapping_info": policy_mapping_dict
}
return env_info

def close(self):
self.env.close()
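
A minimal smoke-test sketch for the wrapper above (the `env_config` keys mirror
`gobigger.yaml`; GoBigger itself must be installed):

```python
from marllib.envs.base_env.gobigger import RLlibGoBigger

# Minimal rollout; config keys follow gobigger.yaml.
env = RLlibGoBigger({"map_name": "st_t1p2", "frame_limit": 1600})
obs = env.reset()
actions = {agent: env.action_space.sample() for agent in obs}
obs, rewards, dones, infos = env.step(actions)
print(env.get_env_info()["num_agents"], rewards)
env.close()
```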
