feat(multi agent): update multi-agent environments for the interaction of multiple agents (#70)
PKU-Alignment · Aug 21, 2023 · commit f499491
1 parent ae3574d

Showing 99 changed files with 61,611 additions and 28 deletions.
diff --git a/docs/_static/images/doggo_back.jpeg b/docs/_static/images/doggo_back.jpeg
diff --git a/docs/_static/images/doggo_front.jpeg b/docs/_static/images/doggo_front.jpeg
diff --git a/docs/_static/images/doggo_left.jpeg b/docs/_static/images/doggo_left.jpeg
diff --git a/docs/_static/images/doggo_right.jpeg b/docs/_static/images/doggo_right.jpeg
diff --git a/docs/environments/safe_vision/building_button.rst b/docs/environments/safe_vision/building_button.rst
@@ -55,7 +55,7 @@ Level0
     :align: center
     :scale: 26 %
 
-The agent is tasked to proficiently operate several machines within a construction site setting.
+**The Level 0 of BuildingButton** requires the agent to proficiently operate multiple machines within a construction site.
 
 +-----------------------------+-------------------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (32,), float64)                                    |
@@ -105,7 +105,7 @@ Level1
     :align: center
     :scale: 26 %
 
-The agent is required to adeptly and accurately operate multiple machines within a construction site, while concurrently evading other robots and obstacles present in the area.
+**The Level 1 of BuildingButton** requires the agent to proficiently and accurately operate multiple machines within a construction site, while concurrently evading other robots and obstacles present in the area.
 
 +-----------------------------+--------------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (64,), float64)                               |
@@ -174,7 +174,7 @@ Level2
     :align: center
     :scale: 26 %
 
-The agent is tasked to proficiently and accurately operate several machines within a construction site, while simultaneously navigating around a heightened number of other robots and obstacles in the area.
+**The Level 2 of BuildingButton** requires the agent to proficiently and accurately operate multiple machines within a construction site, while concurrently evading a heightened number of other robots and obstacles in the area.
 
 +-----------------------------+------------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (64,), float64)                             |

diff --git a/docs/environments/safe_vision/building_goal.rst b/docs/environments/safe_vision/building_goal.rst
@@ -50,7 +50,7 @@ Level0
     :align: center
     :scale: 26 %
 
-The agent is tasked to accurately dock at designated positions within a construction site setting.
+**The Level 0 of BuildingGoal** requires the agent to dock at designated positions within a construction site.
 
 +-----------------------------+------------------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (16,), float64)                                   |
@@ -98,7 +98,7 @@ Level1
     :align: center
     :scale: 26 %
 
-The agent is required to accurately dock at specific locations within a construction site, while ensuring to avoid entry into hazardous areas.
+**The Level 1 of BuildingGoal** requires the agent to dock at designated positions within a construction site while avoiding entry into hazardous areas.
 
 +-----------------------------+----------------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (48,), float64)                                 |
@@ -166,7 +166,7 @@ Level2
     :align: center
     :scale: 26 %
 
-The agent is tasked to precisely dock at designated locations within a construction site, circumvent the site's exhaust fans, and ensure it does not enter any hazardous zones.
+**The Level 2 of BuildingGoal** requires the agent to dock at designated positions within a construction site while avoiding entry into hazardous areas and circumventing the site’s exhaust fans.
 
 +-----------------------------+-----------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (48,), float64)                            |

diff --git a/docs/environments/safe_vision/building_push.rst b/docs/environments/safe_vision/building_push.rst
@@ -68,7 +68,7 @@ Level0
     :align: center
     :scale: 26 %
 
-The agent is tasked to relocate boxes to designated locations within a construction site setting.
+**The Level 0 of BuildingPush** requires the agent to relocate the box to designated locations within a construction site.
 
 +-----------------------------+-----------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (32,), float64)                            |
@@ -118,7 +118,7 @@ Level1
     :align: center
     :scale: 26 %
 
-The agent is tasked to transport boxes to designated spots within a construction site, while avoiding areas demarcated as restricted.
+**The Level 1 of BuildingPush** requires the agent to relocate the box to designated locations within a construction site while avoiding areas demarcated as restricted.
 
 +-----------------------------+----------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (64,), float64)                           |
@@ -183,7 +183,7 @@ Level2
     :align: center
     :scale: 26 %
 
-The agent is assigned to shift boxes to specific positions within a construction site, while meticulously avoiding numerous hazardous fuel drums and zones marked as off-limits.
+**The Level 2 of BuildingPush** requires the agent to relocate the box to designated locations within a construction site while avoiding numerous hazardous fuel drums and areas demarcated as restricted.
 
 +-----------------------------+------------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (64,), float64)                             |

diff --git a/docs/environments/safe_vision/fading_easy.rst b/docs/environments/safe_vision/fading_easy.rst
@@ -60,7 +60,7 @@ Level0
     :align: center
     :scale: 100 %
 
-The agent endeavors to reach the 'Goal' location, even as it grapples with the challenge of dissipating information.
+**The Level 0 of FadingEasy** requires the agent to reach the goal position. The **goal** will linearly disappear in **150** steps after every refresh.
 
 Fading Objects
 ^^^^^^^^^^^^^^
@@ -92,7 +92,7 @@ Level1
     :align: center
     :scale: 100 %
 
-The agent strives to maximize its approaches to the 'Goal' location in the presence of vanishing information, while diligently avoiding 'Hazards'. Although 'Vases' hold a value of 1, they do not contribute to the cost computation.
+**The Level 1 of FadingEasy** requires the agent to reach the goal position, ensuring it steers clear of hazardous areas. The **goal** will linearly disappear in **150** steps after every refresh.
 
 
 Fading Objects
@@ -140,7 +140,7 @@ Level2
     :align: center
     :scale: 100 %
 
-The agent aims to frequently reach the 'Goal' location despite the challenges posed by fading information, ensuring it steers clear of 'Hazards' and avoids collisions with 'Vases'.
+**The Level 2 of FadingEasy** requires the agent to reach the goal position, ensuring it steers clear of hazardous areas and avoids collisions with vases. The **goal** and **hazardous areas** will linearly disappear in **150** steps after every refresh.
 
 Fading Objects
 ^^^^^^^^^^^^^^

diff --git a/docs/environments/safe_vision/fading_hard.rst b/docs/environments/safe_vision/fading_hard.rst
@@ -61,7 +61,7 @@ Level0
     :align: center
     :scale: 100 %
 
-Confronted by the swift disappearance of information, the agent seeks to maximize its reaches to the 'Goal' location.
+**The Level 0 of FadingHard** requires the agent to reach the goal position. The **goal** will linearly disappear in **75** steps after every refresh.
 
 
 Fading Objects
@@ -95,7 +95,7 @@ Level1
     :align: center
     :scale: 100 %
 
-Confronted with the rapid disappearance of information, the agent endeavors to frequently attain the 'Goal' location, while vigilantly avoiding 'Hazards'. Notably, although 'Vases' hold a value of 1, they do not contribute to the cost computation.
+**The Level 1 of FadingHard** requires the agent to reach the goal position, ensuring it steers clear of hazardous areas. The **goal** will linearly disappear in **75** steps after every refresh.
 
 
 Fading Objects
@@ -142,7 +142,7 @@ Level2
     :align: center
     :scale: 100 %
 
-Confronted with the challenge of dissipating information, the agent endeavors to optimize its approaches to the 'Goal' location, all the while sidestepping the 'Hazards' zone and preventing collisions with 'Vases'.
+**The Level 2 of FadingHard** requires the agent to reach the goal position, ensuring it steers clear of hazardous areas and avoids collisions with vases. The **goal**, **hazardous areas** and **vases** will linearly disappear in **75** steps after every refresh.
 
 Fading Objects
 ^^^^^^^^^^^^^^
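
Note on "linearly disappear": the docs above state only that objects fade linearly over 150 steps (FadingEasy) or 75 steps (FadingHard) after every refresh. One plausible reading is an opacity that drops from 1 to 0 at a constant rate. The sketch below illustrates that reading only; `fade_alpha` and its parameters are hypothetical names, not taken from the Safety-Gymnasium source.

```python
def fade_alpha(steps_since_refresh: int, fade_steps: int) -> float:
    """Assumed linear fade: 1.0 right after a refresh, 0.0 once fade_steps have elapsed."""
    if fade_steps <= 0:
        raise ValueError('fade_steps must be positive')
    remaining = 1.0 - steps_since_refresh / fade_steps
    # Clamp so the value stays in [0, 1] before a refresh and after the fade completes.
    return min(1.0, max(0.0, remaining))


if __name__ == '__main__':
    for name, fade_steps in (('FadingEasy', 150), ('FadingHard', 75)):
        print(name, [round(fade_alpha(t, fade_steps), 2) for t in (0, 50, 100, 150)])
```

Under this assumption, the object is half faded after 75 steps in FadingEasy, while in FadingHard it is already fully invisible at that point.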

diff --git a/docs/environments/safe_vision/formula_one.rst b/docs/environments/safe_vision/formula_one.rst
@@ -47,7 +47,7 @@ Level0
     :align: center
     :scale: 40 %
 
-For each episode, the agent is randomly initialized at one of the seven checkpoints and endeavors to maximize its reaches to the 'Goal' location.
+**The Level 0 of FormulaOne** requires the agent to maximize its reach to the goal position. For each episode, the agent is randomly initialized at one of the seven checkpoints.
 
 +-----------------------------+------------------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (16,), float64)                                   |
@@ -105,7 +105,7 @@ Level1
     :align: center
     :scale: 40 %
 
-On each episode, the agent is randomly positioned at one of seven checkpoints and seeks to optimize its approaches to the 'Goal' location, all while circumventing 'RoadBarriers' and racetrack fences.
+**The Level 1 of FormulaOne** requires the agent to maximize its reach to the goal position while circumventing barriers and racetrack fences. For each episode, the agent is randomly initialized at one of the seven checkpoints.
 
 +-----------------------------+----------------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (32,), float64)                                 |
@@ -171,7 +171,7 @@ Level2
     :align: center
     :scale: 40 %
 
-During each episode, the agent is randomly stationed at one of seven checkpoints. It strives to maximize its approaches to the 'Goal' location, while vigilantly avoiding collisions with 'RoadBarriers' and racetrack fences. Notably, the 'RoadBarriers' surrounding the checkpoints are denser.
+**The Level 2 of FormulaOne** requires the agent to maximize its reach to the goal position while circumventing barriers and racetrack fences. For each episode, the agent is randomly initialized at one of the seven checkpoints. Notably, the barriers surrounding the checkpoints are denser.
 
 +-----------------------------+-----------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (32,), float64)                            |

diff --git a/docs/environments/safe_vision/race.rst b/docs/environments/safe_vision/race.rst
@@ -48,7 +48,7 @@ Level0
     :align: center
     :scale: 45 %
 
-The agent's objective is to reach the 'Goal'.
+**The Level 0 of Race** requires the agent to reach the goal position.
 
 +-----------------------------+------------------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (16,), float64)                                   |
@@ -96,7 +96,7 @@ Level1
     :align: center
     :scale: 45 %
 
-The agent aims to reach the 'Goal' while ensuring it avoids straying into the grass and prevents collisions with roadside objects.
+**The Level 1 of Race** requires the agent to reach the goal position while ensuring it avoids straying into the grass and prevents collisions with roadside objects.
 
 +-----------------------------+----------------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (32,), float64)                                 |
@@ -160,7 +160,7 @@ Level2
     :align: center
     :scale: 45 %
 
-From a distant starting point, the agent is tasked with reaching the 'Goal', ensuring it sidesteps the grass and refrains from colliding with objects along the path.
+**The Level 2 of Race** requires the agent to reach the goal position from a distant starting point while ensuring it avoids straying into the grass and prevents collisions with roadside objects.
 
 +-----------------------------+-----------------------------------------------------------+
 | Specific Observation Space  | Box(-inf, inf, (32,), float64)                            |

diff --git a/examples/multi_goal.py b/examples/multi_goal.py
@@ -0,0 +1,53 @@
+# Copyright 2022-2023 OmniSafe Team. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Examples for multi goal environments."""
+
+import argparse
+
+import safety_gymnasium
+
+
+def run_random(env_name):
+    """Random run."""
+    env = safety_gymnasium.make(env_name, render_mode='human')
+    obs, _ = env.reset()
+    # Use below to specify seed.
+    # obs, _ = env.reset(seed=0)
+    terminated, truncated = {'agent_0': False}, {'agent_0': False}
+    ep_ret, ep_cost = 0, 0
+    while True:
+        if terminated['agent_0'] or truncated['agent_0']:
+            print(f'Episode Return: {ep_ret} \t Episode Cost: {ep_cost}')
+            ep_ret, ep_cost = 0, 0
+            obs, _ = env.reset()
+
+        act = {}
+        for agent in env.agents:
+            assert env.observation_space(agent).contains(obs[agent])
+            act[agent] = env.action_space(agent).sample()
+            assert env.action_space(agent).contains(act[agent])
+
+        obs, reward, cost, terminated, truncated, _ = env.step(act)
+
+        for agent in env.agents:
+            ep_ret += reward[agent]
+            ep_cost += cost[agent]
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--env', default='SafetyAntMultiGoal2-v0')
+    args = parser.parse_args()
+    run_random(args.env)
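
Note on the example above: it checks only `agent_0` for episode end and folds every agent's reward and cost into a single scalar. If per-agent totals are wanted instead, the bookkeeping could look like the following sketch. The helper names `accumulate` and `episode_done` are hypothetical; the only thing assumed about the environment is that `step` returns dicts keyed by agent name, as the example shows.

```python
from typing import Dict


def accumulate(totals: Dict[str, float], per_agent: Dict[str, float]) -> Dict[str, float]:
    """Add one step's per-agent values onto the running per-agent totals."""
    for agent, value in per_agent.items():
        totals[agent] = totals.get(agent, 0.0) + value
    return totals


def episode_done(terminated: Dict[str, bool], truncated: Dict[str, bool]) -> bool:
    """True as soon as any agent reports termination or truncation."""
    return any(terminated.values()) or any(truncated.values())
```

Inside the loop this would be used as `accumulate(ep_ret, reward)` and `accumulate(ep_cost, cost)`, with `ep_ret` and `ep_cost` initialised to empty dicts at each reset.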
diff --git a/images/doggo_front.jpeg b/images/doggo_front.jpeg
diff --git a/safety_gymnasium/__init__.py b/safety_gymnasium/__init__.py
@@ -20,7 +20,7 @@
 from gymnasium import register as gymnasium_register
 
 from safety_gymnasium import vector, wrappers
-from safety_gymnasium.tasks.safe_multi_agent.safe_mujoco_multi import make_ma
+from safety_gymnasium.tasks.safe_multi_agent.tasks.velocity.safe_mujoco_multi import make_ma
 from safety_gymnasium.utils.registration import make, register
 from safety_gymnasium.version import __version__
 
@@ -290,3 +290,60 @@ def __combine(tasks, agents, max_episode_steps):
     entry_point='safety_gymnasium.tasks.safe_velocity.safety_humanoid_velocity_v1:SafetyHumanoidVelocityEnv',
     max_episode_steps=1000,
 )
+
+
+def __combine_multi(tasks, agents, max_episode_steps):
+    """Combine tasks and agents together to register environment tasks."""
+    for task_name, task_config in tasks.items():
+        # Vector inputs
+        for robot_name in agents:
+            env_id = f'{PREFIX}{robot_name}{task_name}-{VERSION}'
+            combined_config = copy.deepcopy(task_config)
+            combined_config.update({'agent_name': robot_name})
+
+            __register_helper(
+                env_id=env_id,
+                entry_point='safety_gymnasium.tasks.safe_multi_agent.builder:Builder',
+                spec_kwargs={'config': combined_config, 'task_id': env_id},
+                max_episode_steps=max_episode_steps,
+                disable_env_checker=True,
+            )
+
+            if MAKE_VISION_ENVIRONMENTS:
+                # Vision inputs
+                vision_env_name = f'{PREFIX}{robot_name}{task_name}Vision-{VERSION}'
+                vision_config = {
+                    'observe_vision': True,
+                    'observation_flatten': False,
+                }
+                vision_config.update(combined_config)
+                __register_helper(
+                    env_id=vision_env_name,
+                    entry_point='safety_gymnasium.tasks.safe_multi_agent.builder:Builder',
+                    spec_kwargs={'config': vision_config, 'task_id': env_id},
+                    max_episode_steps=max_episode_steps,
+                    disable_env_checker=True,
+                )
+
+            if MAKE_DEBUG_ENVIRONMENTS and robot_name in ['Point', 'Car', 'Racecar']:
+                # Keyboard inputs for debugging
+                debug_env_name = f'{PREFIX}{robot_name}{task_name}Debug-{VERSION}'
+                debug_config = {'debug': True}
+                debug_config.update(combined_config)
+                __register_helper(
+                    env_id=debug_env_name,
+                    entry_point='safety_gymnasium.tasks.safe_multi_agent.builder:Builder',
+                    spec_kwargs={'config': debug_config, 'task_id': env_id},
+                    max_episode_steps=max_episode_steps,
+                    disable_env_checker=True,
+                )
+
+
+# ----------------------------------------
+# Safety Multi-Agent
+# ----------------------------------------
+
+# Multi Goal Environments
+# ----------------------------------------
+multi_goal_tasks = {'MultiGoal0': {}, 'MultiGoal1': {}, 'MultiGoal2': {}}
+__combine_multi(multi_goal_tasks, robots, max_episode_steps=1000)
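
Note on the resulting environment IDs: `__combine_multi` builds each ID as `{PREFIX}{robot_name}{task_name}-{VERSION}`, plus a `Vision` variant when vision environments are enabled. The sketch below only reproduces that naming scheme. It assumes `PREFIX = 'Safety'` and `VERSION = 'v0'`, inferred from the default `SafetyAntMultiGoal2-v0` in `examples/multi_goal.py`; the robot list passed in is an illustrative subset, and `multi_goal_env_ids` is a hypothetical helper that registers nothing.

```python
PREFIX = 'Safety'  # assumption, inferred from 'SafetyAntMultiGoal2-v0'
VERSION = 'v0'     # assumption, inferred from 'SafetyAntMultiGoal2-v0'


def multi_goal_env_ids(tasks, agents, vision=False):
    """List the IDs the naming scheme would produce, without registering anything."""
    ids = []
    for task_name in tasks:
        for robot_name in agents:
            ids.append(f'{PREFIX}{robot_name}{task_name}-{VERSION}')
            if vision:
                ids.append(f'{PREFIX}{robot_name}{task_name}Vision-{VERSION}')
    return ids


if __name__ == '__main__':
    tasks = ['MultiGoal0', 'MultiGoal1', 'MultiGoal2']
    for env_id in multi_goal_env_ids(tasks, ['Point', 'Ant']):
        print(env_id)
```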
diff --git a/safety_gymnasium/assets/xmls/ant.xml b/safety_gymnasium/assets/xmls/ant.xml
@@ -41,7 +41,7 @@ Copyright 2022-2023 OmniSafe Team. All Rights Reserved.
           <geom fromto="0.0 0.0 0.0 0.05 0.05 0.0" name="left_leg_geom" size="0.02" type="capsule" rgba="0.0039 0.1529 0.3961 1"/>
           <body pos="0.05 0.05 0" name="front_left_foot">
             <joint axis="-1 1 0" name="ankle_1" pos="0.0 0.0 0.0" range="0.52 1.74" type="hinge"/>
-            <geom fromto="0.0 0.0 0.0 0.1 0.1 0.0" name="left_ankle_geom" size="0.02" type="capsule" rgba=".8 .5 .3 1"/>
+            <geom fromto="0.0 0.0 0.0 0.1 0.1 0.0" name="left_ankle_geom" size="0.02" type="capsule" rgba=".8 .5 .3 1" density="50000.0"/>
           </body>
         </body>
       </body>
@@ -52,7 +52,7 @@ Copyright 2022-2023 OmniSafe Team. All Rights Reserved.
           <geom fromto="0.0 0.0 0.0 -0.05 0.05 0.0" name="right_leg_geom" size="0.02" type="capsule" rgba="0.0039 0.1529 0.3961 1"/>
           <body pos="-0.05 0.05 0" name="front_right_foot">
             <joint axis="1 1 0" name="ankle_2" pos="0.0 0.0 0.0" range="-1.74 -0.52" type="hinge"/>
-            <geom fromto="0.0 0.0 0.0 -0.1 0.1 0.0" name="right_ankle_geom" size="0.02" type="capsule" rgba="0.8 0.6 0.4 1"/>
+            <geom fromto="0.0 0.0 0.0 -0.1 0.1 0.0" name="right_ankle_geom" size="0.02" type="capsule" density="50000.0"/>
           </body>
         </body>
       </body>
@@ -63,7 +63,7 @@ Copyright 2022-2023 OmniSafe Team. All Rights Reserved.
           <geom fromto="0.0 0.0 0.0 -0.05 -0.05 0.0" name="back_leg_geom" size="0.02" type="capsule" rgba="0.7412 0.0431 0.1843 1"/>
           <body pos="-0.05 -0.05 0" name="left_back_foot">
             <joint axis="-1 1 0" name="ankle_3" pos="0.0 0.0 0.0" range="-1.74 -0.52" type="hinge"/>
-            <geom fromto="0.0 0.0 0.0 -0.1 -0.1 0.0" name="third_ankle_geom" size="0.02" type="capsule" rgba="0.8 0.6 0.4 1"/>
+            <geom fromto="0.0 0.0 0.0 -0.1 -0.1 0.0" name="third_ankle_geom" size="0.02" type="capsule" density="50000.0"/>
           </body>
         </body>
       </body>
@@ -74,7 +74,7 @@ Copyright 2022-2023 OmniSafe Team. All Rights Reserved.
           <geom fromto="0.0 0.0 0.0 0.05 -0.05 0.0" name="rightback_leg_geom" size="0.02" type="capsule" rgba="0.7412 0.0431 0.1843 1"/>
           <body pos="0.05 -0.05 0" name="right_back_foot">
             <joint axis="1 1 0" name="ankle_4" pos="0.0 0.0 0.0" range="0.52 1.74" type="hinge"/>
-            <geom fromto="0.0 0.0 0.0 0.1 -0.1 0.0" name="fourth_ankle_geom" size="0.02" type="capsule" rgba=".8 .5 .3 1"/>
+            <geom fromto="0.0 0.0 0.0 0.1 -0.1 0.0" name="fourth_ankle_geom" size="0.02" type="capsule" rgba=".8 .5 .3 1" density="50000.0"/>
           </body>
         </body>
       </body>

diff --git a/safety_gymnasium/builder.py b/safety_gymnasium/builder.py
@@ -246,7 +246,11 @@ def step(self, action: np.ndarray) -> tuple[np.ndarray, float, float, bool, bool
 
         if self.render_parameters.mode == 'human':
             self.render()
-        return self.task.obs(), reward, cost, self.terminated, self.truncated, info
+
+        terminateds = {'agent_0': self.terminated, 'agent_1': self.terminated}
+        truncateds = {'agent_0': self.truncated, 'agent_1': self.truncated}
+
+        return self.task.obs(), reward, cost, terminateds, truncateds, info
 
     def _reward(self) -> float:
         """Calculate the current rewards.