Commit

feat(multi agent): update multi-agent environments for the interaction of multiple agents (#70)
muchvo authored Aug 21, 2023
1 parent ae3574d commit f499491
Showing 99 changed files with 61,611 additions and 28 deletions.
Binary file modified docs/_static/images/doggo_back.jpeg
Binary file modified docs/_static/images/doggo_front.jpeg
Binary file modified docs/_static/images/doggo_left.jpeg
Binary file modified docs/_static/images/doggo_right.jpeg
6 changes: 3 additions & 3 deletions docs/environments/safe_vision/building_button.rst
@@ -55,7 +55,7 @@ Level0
:align: center
:scale: 26 %

The agent is tasked to proficiently operate several machines within a construction site setting.
**The Level 0 of BuildingButton** requires the agent to proficiently operate multiple machines within a construction site.

+-----------------------------+-------------------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (32,), float64) |
@@ -105,7 +105,7 @@ Level1
:align: center
:scale: 26 %

The agent is required to adeptly and accurately operate multiple machines within a construction site, while concurrently evading other robots and obstacles present in the area.
**The Level 1 of BuildingButton** requires the agent to proficiently and accurately operate multiple machines within a construction site, while concurrently evading other robots and obstacles present in the area.

+-----------------------------+--------------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (64,), float64) |
@@ -174,7 +174,7 @@ Level2
:align: center
:scale: 26 %

The agent is tasked to proficiently and accurately operate several machines within a construction site, while simultaneously navigating around a heightened number of other robots and obstacles in the area.
**The Level 2 of BuildingButton** requires the agent to proficiently and accurately operate multiple machines within a construction site, while concurrently evading a heightened number of other robots and obstacles in the area.

+-----------------------------+------------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (64,), float64) |
6 changes: 3 additions & 3 deletions docs/environments/safe_vision/building_goal.rst
@@ -50,7 +50,7 @@ Level0
:align: center
:scale: 26 %

The agent is tasked to accurately dock at designated positions within a construction site setting.
**The Level 0 of BuildingGoal** requires the agent to dock at designated positions within a construction site.

+-----------------------------+------------------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (16,), float64) |
@@ -98,7 +98,7 @@ Level1
:align: center
:scale: 26 %

The agent is required to accurately dock at specific locations within a construction site, while ensuring to avoid entry into hazardous areas.
**The Level 1 of BuildingGoal** requires the agent to dock at designated positions within a construction site while ensuring to avoid entry into hazardous areas.

+-----------------------------+----------------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (48,), float64) |
@@ -166,7 +166,7 @@ Level2
:align: center
:scale: 26 %

The agent is tasked to precisely dock at designated locations within a construction site, circumvent the site's exhaust fans, and ensure it does not enter any hazardous zones.
**The Level 2 of BuildingGoal** requires the agent to dock at designated positions within a construction site, while ensuring to avoid entry into hazardous areas and circumventing the site’s exhaust fans.

+-----------------------------+-----------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (48,), float64) |
6 changes: 3 additions & 3 deletions docs/environments/safe_vision/building_push.rst
@@ -68,7 +68,7 @@ Level0
:align: center
:scale: 26 %

The agent is tasked to relocate boxes to designated locations within a construction site setting.
**The Level 0 of BuildingPush** requires the agent to relocate the box to designated locations within a construction site.

+-----------------------------+-----------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (32,), float64) |
@@ -118,7 +118,7 @@ Level1
:align: center
:scale: 26 %

The agent is tasked to transport boxes to designated spots within a construction site, while avoiding areas demarcated as restricted.
**The Level 1 of BuildingPush** requires the agent to relocate the box to designated locations within a construction site while avoiding areas demarcated as restricted.

+-----------------------------+----------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (64,), float64) |
@@ -183,7 +183,7 @@ Level2
:align: center
:scale: 26 %

The agent is assigned to shift boxes to specific positions within a construction site, while meticulously avoiding numerous hazardous fuel drums and zones marked as off-limits.
**The Level 2 of BuildingPush** requires the agent to relocate the box to designated locations within a construction site while avoiding numerous hazardous fuel drums and areas demarcated as restricted.

+-----------------------------+------------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (64,), float64) |
6 changes: 3 additions & 3 deletions docs/environments/safe_vision/fading_easy.rst
@@ -60,7 +60,7 @@ Level0
:align: center
:scale: 100 %

The agent endeavors to reach the 'Goal' location, even as it grapples with the challenge of dissipating information.
**The Level 0 of FadingEasy** requires the agent to reach the goal position. The **goal** will linearly disappear in **150** steps after every refresh.

Fading Objects
^^^^^^^^^^^^^^
@@ -92,7 +92,7 @@ Level1
:align: center
:scale: 100 %

The agent strives to maximize its approaches to the 'Goal' location in the presence of vanishing information, while diligently avoiding 'Hazards'. Although 'Vases' hold a value of 1, they do not contribute to the cost computation.
**The Level 1 of FadingEasy** requires the agent to reach the goal position, ensuring it steers clear of hazardous areas. The **goal** will linearly disappear in **150** steps after every refresh.


Fading Objects
@@ -140,7 +140,7 @@ Level2
:align: center
:scale: 100 %

The agent aims to frequently reach the 'Goal' location despite the challenges posed by fading information, ensuring it steers clear of 'Hazards' and avoids collisions with 'Vases'.
**The Level 2 of FadingEasy** requires the agent to reach the goal position, ensuring it steers clear of hazardous areas and avoids collisions with vases. The **goal** and **hazardous areas** will linearly disappear in **150** steps after every refresh.

Fading Objects
^^^^^^^^^^^^^^
6 changes: 3 additions & 3 deletions docs/environments/safe_vision/fading_hard.rst
@@ -61,7 +61,7 @@ Level0
:align: center
:scale: 100 %

Confronted by the swift disappearance of information, the agent seeks to maximize its reaches to the 'Goal' location.
**The Level 0 of FadingHard** requires the agent to reach the goal position. The **goal** will linearly disappear in **75** steps after every refresh.


Fading Objects
@@ -95,7 +95,7 @@ Level1
:align: center
:scale: 100 %

Confronted with the rapid disappearance of information, the agent endeavors to frequently attain the 'Goal' location, while vigilantly avoiding 'Hazards'. Notably, although 'Vases' hold a value of 1, they do not contribute to the cost computation.
**The Level 1 of FadingHard** requires the agent to reach the goal position, ensuring it steers clear of hazardous areas. The **goal** will linearly disappear in **75** steps after every refresh.


Fading Objects
@@ -142,7 +142,7 @@ Level2
:align: center
:scale: 100 %

Confronted with the challenge of dissipating information, the agent endeavors to optimize its approaches to the 'Goal' location, all the while sidestepping the 'Hazards' zone and preventing collisions with 'Vases'.
**The Level 2 of FadingHard** requires the agent to reach the goal position, ensuring it steers clear of hazardous areas and avoids collisions with vases. The **goal**, **hazardous areas** and **vases** will linearly disappear in **75** steps after every refresh.

Fading Objects
^^^^^^^^^^^^^^
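The fading descriptions above say an object "will linearly disappear" in 150 steps (FadingEasy) or 75 steps (FadingHard) after every refresh. A minimal sketch of one way to read that, assuming visibility is an opacity value that falls at a constant rate from 1.0 to 0.0; the function name `fade_alpha` and the opacity representation are hypothetical and are not taken from the library:

```python
def fade_alpha(steps_since_refresh: int, fade_steps: int) -> float:
    """Return an assumed opacity in [0, 1] for a linearly fading object.

    `fade_steps` is the number of steps until the object is fully
    invisible (150 for FadingEasy, 75 for FadingHard in the docs above).
    """
    if fade_steps <= 0:
        raise ValueError('fade_steps must be positive')
    # Opacity drops by 1/fade_steps per step and is clamped to [0, 1].
    remaining = 1.0 - steps_since_refresh / fade_steps
    return min(1.0, max(0.0, remaining))
```

Under this reading, an object in FadingEasy is half visible 75 steps after a refresh, while in FadingHard it has already vanished by then.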
6 changes: 3 additions & 3 deletions docs/environments/safe_vision/formula_one.rst
@@ -47,7 +47,7 @@ Level0
:align: center
:scale: 40 %

For each episode, the agent is randomly initialized at one of the seven checkpoints and endeavors to maximize its reaches to the 'Goal' location.
**The Level 0 of FormulaOne** requires the agent to maximize its reach to the goal position. For each episode, the agent is randomly initialized at one of the seven checkpoints.

+-----------------------------+------------------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (16,), float64) |
@@ -105,7 +105,7 @@ Level1
:align: center
:scale: 40 %

On each episode, the agent is randomly positioned at one of seven checkpoints and seeks to optimize its approaches to the 'Goal' location, all while circumventing 'RoadBarriers' and racetrack fences.
**The Level 1 of FormulaOne** requires the agent to maximize its reach to the goal position while circumventing barriers and racetrack fences. For each episode, the agent is randomly initialized at one of the seven checkpoints.

+-----------------------------+----------------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (32,), float64) |
@@ -171,7 +171,7 @@ Level2
:align: center
:scale: 40 %

During each episode, the agent is randomly stationed at one of seven checkpoints. It strives to maximize its approaches to the 'Goal' location, while vigilantly avoiding collisions with 'RoadBarriers' and racetrack fences. Notably, the 'RoadBarriers' surrounding the checkpoints are denser.
**The Level 2 of FormulaOne** requires the agent to maximize its reach to the goal position while circumventing barriers and racetrack fences. For each episode, the agent is randomly initialized at one of the seven checkpoints. Notably, the barriers surrounding the checkpoints are denser.

+-----------------------------+-----------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (32,), float64) |
6 changes: 3 additions & 3 deletions docs/environments/safe_vision/race.rst
@@ -48,7 +48,7 @@ Level0
:align: center
:scale: 45 %

The agent's objective is to reach the 'Goal'.
**The Level 0 of Race** requires the agent to reach the goal position.

+-----------------------------+------------------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (16,), float64) |
@@ -96,7 +96,7 @@ Level1
:align: center
:scale: 45 %

The agent aims to reach the 'Goal' while ensuring it avoids straying into the grass and prevents collisions with roadside objects.
**The Level 1 of Race** requires the agent to reach the goal position while ensuring it avoids straying into the grass and prevents collisions with roadside objects.

+-----------------------------+----------------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (32,), float64) |
@@ -160,7 +160,7 @@ Level2
:align: center
:scale: 45 %

From a distant starting point, the agent is tasked with reaching the 'Goal', ensuring it sidesteps the grass and refrains from colliding with objects along the path.
**The Level 2 of Race** requires the agent to reach the goal position from a distant starting point while ensuring it avoids straying into the grass and prevents collisions with roadside objects.

+-----------------------------+-----------------------------------------------------------+
| Specific Observation Space | Box(-inf, inf, (32,), float64) |
53 changes: 53 additions & 0 deletions examples/multi_goal.py
@@ -0,0 +1,53 @@
# Copyright 2022-2023 OmniSafe Team. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Examples for multi goal environments."""

import argparse

import safety_gymnasium


def run_random(env_name):
    """Random run."""
    env = safety_gymnasium.make(env_name, render_mode='human')
    obs, _ = env.reset()
    # Use below to specify seed.
    # obs, _ = env.reset(seed=0)
    terminated, truncated = {'agent_0': False}, {'agent_0': False}
    ep_ret, ep_cost = 0, 0
    while True:
        if terminated['agent_0'] or truncated['agent_0']:
            print(f'Episode Return: {ep_ret} \t Episode Cost: {ep_cost}')
            ep_ret, ep_cost = 0, 0
            obs, _ = env.reset()

        act = {}
        for agent in env.agents:
            assert env.observation_space(agent).contains(obs[agent])
            act[agent] = env.action_space(agent).sample()
            assert env.action_space(agent).contains(act[agent])

        obs, reward, cost, terminated, truncated, _ = env.step(act)

        for agent in env.agents:
            ep_ret += reward[agent]
            ep_cost += cost[agent]


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--env', default='SafetyAntMultiGoal2-v0')
    args = parser.parse_args()
    run_random(args.env)
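The loop in `examples/multi_goal.py` relies on a dict-keyed interface: `env.agents` lists agent names, and `step` returns observations, rewards, costs, and termination flags keyed by those names. The sketch below mirrors only that interface with a stand-in environment so the bookkeeping can be run without MuJoCo or a display. `StubMultiAgentEnv` and `run_episode` are hypothetical names, and the constant rewards and costs are made up for illustration; nothing here is the library's implementation.

```python
class StubMultiAgentEnv:
    """Hypothetical stand-in that mimics the dict-keyed step/reset interface."""

    def __init__(self, agents=('agent_0', 'agent_1'), horizon=5):
        self.agents = list(agents)
        self._horizon = horizon
        self._t = 0

    def reset(self):
        self._t = 0
        return {agent: 0.0 for agent in self.agents}, {}

    def step(self, act):
        # Every agent must supply an action, as in the example loop.
        assert set(act) == set(self.agents)
        self._t += 1
        obs = {agent: float(self._t) for agent in self.agents}
        reward = {agent: 1.0 for agent in self.agents}  # made-up values
        cost = {agent: 0.5 for agent in self.agents}    # made-up values
        done = self._t >= self._horizon
        terminated = {agent: False for agent in self.agents}
        truncated = {agent: done for agent in self.agents}
        return obs, reward, cost, terminated, truncated, {}


def run_episode(env):
    """Roll out one episode and return per-agent (return, cost) totals."""
    env.reset()
    totals = {agent: [0.0, 0.0] for agent in env.agents}
    while True:
        act = {agent: 0.0 for agent in env.agents}  # placeholder actions
        _, reward, cost, terminated, truncated, _ = env.step(act)
        for agent in env.agents:
            totals[agent][0] += reward[agent]
            totals[agent][1] += cost[agent]
        # Stop as soon as any agent reports the end of the episode.
        if any(terminated.values()) or any(truncated.values()):
            return {agent: tuple(v) for agent, v in totals.items()}
```

Unlike the committed example, which sums reward and cost over all agents into a single `ep_ret` and `ep_cost`, this sketch keeps one total per agent, which can make it easier to see which agent incurs the cost.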
Binary file modified images/doggo_front.jpeg
59 changes: 58 additions & 1 deletion safety_gymnasium/__init__.py
@@ -20,7 +20,7 @@
from gymnasium import register as gymnasium_register

from safety_gymnasium import vector, wrappers
from safety_gymnasium.tasks.safe_multi_agent.safe_mujoco_multi import make_ma
from safety_gymnasium.tasks.safe_multi_agent.tasks.velocity.safe_mujoco_multi import make_ma
from safety_gymnasium.utils.registration import make, register
from safety_gymnasium.version import __version__

@@ -290,3 +290,60 @@ def __combine(tasks, agents, max_episode_steps):
    entry_point='safety_gymnasium.tasks.safe_velocity.safety_humanoid_velocity_v1:SafetyHumanoidVelocityEnv',
    max_episode_steps=1000,
)


def __combine_multi(tasks, agents, max_episode_steps):
    """Combine tasks and agents together to register environment tasks."""
    for task_name, task_config in tasks.items():
        # Vector inputs
        for robot_name in agents:
            env_id = f'{PREFIX}{robot_name}{task_name}-{VERSION}'
            combined_config = copy.deepcopy(task_config)
            combined_config.update({'agent_name': robot_name})

            __register_helper(
                env_id=env_id,
                entry_point='safety_gymnasium.tasks.safe_multi_agent.builder:Builder',
                spec_kwargs={'config': combined_config, 'task_id': env_id},
                max_episode_steps=max_episode_steps,
                disable_env_checker=True,
            )

            if MAKE_VISION_ENVIRONMENTS:
                # Vision inputs
                vision_env_name = f'{PREFIX}{robot_name}{task_name}Vision-{VERSION}'
                vision_config = {
                    'observe_vision': True,
                    'observation_flatten': False,
                }
                vision_config.update(combined_config)
                __register_helper(
                    env_id=vision_env_name,
                    entry_point='safety_gymnasium.tasks.safe_multi_agent.builder:Builder',
                    spec_kwargs={'config': vision_config, 'task_id': env_id},
                    max_episode_steps=max_episode_steps,
                    disable_env_checker=True,
                )

            if MAKE_DEBUG_ENVIRONMENTS and robot_name in ['Point', 'Car', 'Racecar']:
                # Keyboard inputs for debugging
                debug_env_name = f'{PREFIX}{robot_name}{task_name}Debug-{VERSION}'
                debug_config = {'debug': True}
                debug_config.update(combined_config)
                __register_helper(
                    env_id=debug_env_name,
                    entry_point='safety_gymnasium.tasks.safe_multi_agent.builder:Builder',
                    spec_kwargs={'config': debug_config, 'task_id': env_id},
                    max_episode_steps=max_episode_steps,
                    disable_env_checker=True,
                )


# ----------------------------------------
# Safety Multi-Agent
# ----------------------------------------

# Multi Goal Environments
# ----------------------------------------
fading_tasks = {'MultiGoal0': {}, 'MultiGoal1': {}, 'MultiGoal2': {}}
__combine_multi(fading_tasks, robots, max_episode_steps=1000)
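To see which environment IDs `__combine_multi` would produce, the naming logic can be reproduced without registering anything. This is a sketch only: the values `PREFIX = 'Safety'` and `VERSION = 'v0'` are assumptions inferred from the default `SafetyAntMultiGoal2-v0` in `examples/multi_goal.py`, the robot list passed in is illustrative, and `multi_agent_env_ids` is a hypothetical helper that returns configs instead of calling `__register_helper`.

```python
import copy

PREFIX = 'Safety'  # assumption, inferred from 'SafetyAntMultiGoal2-v0'
VERSION = 'v0'     # assumption, inferred from 'SafetyAntMultiGoal2-v0'


def multi_agent_env_ids(tasks, agents, make_vision=False, make_debug=False):
    """Return {env_id: config} following the naming scheme in the diff."""
    specs = {}
    for task_name, task_config in tasks.items():
        for robot_name in agents:
            env_id = f'{PREFIX}{robot_name}{task_name}-{VERSION}'
            config = copy.deepcopy(task_config)
            config.update({'agent_name': robot_name})
            specs[env_id] = config

            if make_vision:
                # Vision defaults first, then the task config on top.
                vision = {'observe_vision': True, 'observation_flatten': False}
                vision.update(config)
                specs[f'{PREFIX}{robot_name}{task_name}Vision-{VERSION}'] = vision

            if make_debug and robot_name in ['Point', 'Car', 'Racecar']:
                debug = {'debug': True}
                debug.update(config)
                specs[f'{PREFIX}{robot_name}{task_name}Debug-{VERSION}'] = debug
    return specs
```

Because the task config is applied after the vision or debug defaults, any key a task sets itself would override those defaults; with the empty `MultiGoal` configs in this commit, only `agent_name` is added.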
8 changes: 4 additions & 4 deletions safety_gymnasium/assets/xmls/ant.xml
@@ -41,7 +41,7 @@ Copyright 2022-2023 OmniSafe Team. All Rights Reserved.
<geom fromto="0.0 0.0 0.0 0.05 0.05 0.0" name="left_leg_geom" size="0.02" type="capsule" rgba="0.0039 0.1529 0.3961 1"/>
<body pos="0.05 0.05 0" name="front_left_foot">
<joint axis="-1 1 0" name="ankle_1" pos="0.0 0.0 0.0" range="0.52 1.74" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.1 0.1 0.0" name="left_ankle_geom" size="0.02" type="capsule" rgba=".8 .5 .3 1"/>
<geom fromto="0.0 0.0 0.0 0.1 0.1 0.0" name="left_ankle_geom" size="0.02" type="capsule" rgba=".8 .5 .3 1" density="50000.0"/>
</body>
</body>
</body>
@@ -52,7 +52,7 @@ Copyright 2022-2023 OmniSafe Team. All Rights Reserved.
<geom fromto="0.0 0.0 0.0 -0.05 0.05 0.0" name="right_leg_geom" size="0.02" type="capsule" rgba="0.0039 0.1529 0.3961 1"/>
<body pos="-0.05 0.05 0" name="front_right_foot">
<joint axis="1 1 0" name="ankle_2" pos="0.0 0.0 0.0" range="-1.74 -0.52" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.1 0.1 0.0" name="right_ankle_geom" size="0.02" type="capsule" rgba="0.8 0.6 0.4 1"/>
<geom fromto="0.0 0.0 0.0 -0.1 0.1 0.0" name="right_ankle_geom" size="0.02" type="capsule" density="50000.0"/>
</body>
</body>
</body>
@@ -63,7 +63,7 @@ Copyright 2022-2023 OmniSafe Team. All Rights Reserved.
<geom fromto="0.0 0.0 0.0 -0.05 -0.05 0.0" name="back_leg_geom" size="0.02" type="capsule" rgba="0.7412 0.0431 0.1843 1"/>
<body pos="-0.05 -0.05 0" name="left_back_foot">
<joint axis="-1 1 0" name="ankle_3" pos="0.0 0.0 0.0" range="-1.74 -0.52" type="hinge"/>
<geom fromto="0.0 0.0 0.0 -0.1 -0.1 0.0" name="third_ankle_geom" size="0.02" type="capsule" rgba="0.8 0.6 0.4 1"/>
<geom fromto="0.0 0.0 0.0 -0.1 -0.1 0.0" name="third_ankle_geom" size="0.02" type="capsule" density="50000.0"/>
</body>
</body>
</body>
@@ -74,7 +74,7 @@ Copyright 2022-2023 OmniSafe Team. All Rights Reserved.
<geom fromto="0.0 0.0 0.0 0.05 -0.05 0.0" name="rightback_leg_geom" size="0.02" type="capsule" rgba="0.7412 0.0431 0.1843 1"/>
<body pos="0.05 -0.05 0" name="right_back_foot">
<joint axis="1 1 0" name="ankle_4" pos="0.0 0.0 0.0" range="0.52 1.74" type="hinge"/>
<geom fromto="0.0 0.0 0.0 0.1 -0.1 0.0" name="fourth_ankle_geom" size="0.02" type="capsule" rgba=".8 .5 .3 1"/>
<geom fromto="0.0 0.0 0.0 0.1 -0.1 0.0" name="fourth_ankle_geom" size="0.02" type="capsule" rgba=".8 .5 .3 1" density="50000.0"/>
</body>
</body>
</body>
6 changes: 5 additions & 1 deletion safety_gymnasium/builder.py
@@ -246,7 +246,11 @@ def step(self, action: np.ndarray) -> tuple[np.ndarray, float, float, bool, bool

        if self.render_parameters.mode == 'human':
            self.render()
        return self.task.obs(), reward, cost, self.terminated, self.truncated, info

        terminateds = {'agent_0': self.terminated, 'agent_1': self.terminated}
        truncateds = {'agent_0': self.truncated, 'agent_1': self.truncated}

        return self.task.obs(), reward, cost, terminateds, truncateds, info

    def _reward(self) -> float:
        """Calculate the current rewards.
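The new `step` return wraps the environment-level `terminated` and `truncated` flags into dicts keyed by the hard-coded names `agent_0` and `agent_1`. A minimal sketch of the same broadcast driven by a list of agent names follows; `broadcast_flags` is a hypothetical helper, and whether the builder has such a list available at this point is an assumption, not something the diff shows.

```python
def broadcast_flags(agents, terminated, truncated):
    """Copy environment-level episode flags to every agent name."""
    terminateds = {agent: bool(terminated) for agent in agents}
    truncateds = {agent: bool(truncated) for agent in agents}
    return terminateds, truncateds
```

With `agents=['agent_0', 'agent_1']` this yields the same two dicts as the committed code, and it would keep working unchanged if the number of agents were to differ.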