
mujoco-v5 initial commit #104

Closed

Conversation

@Kallinteris-Andreas (Collaborator) commented Jan 24, 2023

Continued from #91.

Description

Adds the v5 versions of the MuJoCo environments.

Changelog

  • the minimum mujoco version is now 2.3.3.
  • added support for fully custom/third-party mujoco models via the xml_file argument (previously only a few model changes could be made).
  • added the default_camera_config argument, a dictionary for setting the mj_camera properties, primarily useful for custom environments.
  • added env.observation_structure, a dictionary indicating the composition of the observation space (e.g. qpos, qvel), useful for building tooling and wrappers for the MuJoCo environments.
  • reset() now returns a populated info dictionary (previously an empty dictionary was returned).
  • added the frame_skip argument (a usage sketch of the new arguments follows this list).

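A minimal sketch of the new top-level API, assuming the v5 environments are registered (e.g. Ant-v5); the argument values are illustrative and the model path is hypothetical:

```python
import gymnasium as gym

env = gym.make(
    "Ant-v5",
    xml_file="./my_custom_ant.xml",           # hypothetical third-party model
    default_camera_config={"distance": 4.0},  # illustrative camera setting
    frame_skip=5,                             # illustrative value
)

# reset() now returns a populated info dictionary
observation, info = env.reset(seed=0)

# observation_structure describes how the observation vector is composed,
# e.g. how many elements come from qpos vs. qvel
print(env.unwrapped.observation_structure)
```
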
Ant

  • fixed bug: healthy_reward was given on every step (even when the Ant was unhealthy); now it is only given when the Ant is healthy. The info key "reward_survive" is updated with this change.
  • the reward function now always includes contact_cost; previously it was only included when use_contact_forces=True (it can be zeroed out with contact_cost_weight=0).
  • excluded the worldbody's cfrc_ext from the observation space (it is constantly 0 and therefore provides no useful information to the agent; this should result in slightly faster training).
  • added the main_body argument.
  • added the forward_reward_weight argument.
  • added the include_cfrc_ext_in_observation argument.
  • removed the use_contact_forces argument (note: its functionality has been replaced by include_cfrc_ext_in_observation and contact_cost_weight; see the sketch after this list).
  • fixed the info key "reward_ctrl" sometimes containing contact_cost instead of ctrl_cost.
  • fixed the info keys "x_position" & "y_position" giving xpos instead of qpos observations (xpos observations lag one mj_step() behind).
  • removed "forward_reward" from info (note: "reward_forward" still exists and contains the same information).

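A hedged sketch of how the removed use_contact_forces argument maps onto the new arguments; this is my reading of the changelog, and the contact_cost_weight value is an assumption based on the v4-era default, so check the environment docstring:

```python
import gymnasium as gym

# v4 toggled both effects with a single flag:
#   env = gym.make("Ant-v4", use_contact_forces=True)
#
# v5 controls the two effects independently:
env = gym.make(
    "Ant-v5",
    include_cfrc_ext_in_observation=True,  # keep cfrc_ext in the observation
    contact_cost_weight=5e-4,              # assumed v4-era default weight
)

# and v4's use_contact_forces=False roughly corresponds to:
# gym.make("Ant-v5", include_cfrc_ext_in_observation=False, contact_cost_weight=0)
```
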
Half Cheetah

  • re-added xml_file argument.
  • renamed info "reward_run" → "reward_forward" (to be consistent with the other environments).

Hopper

  • changed model (the new model does not require coordinate='global' and has near-identical behavior).
  • fixed bug: healthy_reward was given on every step (even when the Hopper was unhealthy); now it is only given when the Hopper is healthy. The info key "reward_survive" is updated with this change.
  • re-added xml_file argument.
  • added info "reward_forward", "reward_ctrl", "reward_survive", "z_distance_from_origin".

Humanoid

  • fixed bug: healthy_reward was given on every step (even when the Humanoid was unhealthy); now it is only given when the Humanoid is healthy. The info key "reward_survive" is updated with this change.
  • re-added contact_cost (and the corresponding contact_cost_weight and contact_cost_range arguments).
  • excluded the worldbody's cinert, cvel & cfrc_ext and the root/freejoint's qfrc_actuator from the observation space (they are constantly 0 and therefore provide no useful information to the agent; this should result in slightly faster training).
  • re-added the xml_file argument.
  • added the include_cinert_in_observation, include_cvel_in_observation, include_qfrc_actuator_in_observation and include_cfrc_ext_in_observation arguments (see the sketch after this list).
  • fixed the info keys "x_position" & "y_position" giving xpos instead of qpos observations (xpos observations lag one mj_step() behind).
  • added info "tendon_length" & "tendon_velocity".
  • renamed info "reward_alive" → "reward_survive" (to be consistent with the other environments).
  • renamed info "reward_linvel" → "reward_forward" (to be consistent with the other environments).
  • renamed info "reward_quadctrl" → "reward_ctrl" (to be consistent with the other environments).
  • removed "forward_reward" from info (note: "reward_forward" still exists and contains the same information).

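A sketch of trimming the Humanoid observation with the new include_* flags, then checking the resulting layout via observation_structure; flag names are taken from the list above, everything else is illustrative:

```python
import gymnasium as gym

env = gym.make(
    "Humanoid-v5",
    include_cinert_in_observation=False,
    include_cvel_in_observation=False,
    include_qfrc_actuator_in_observation=False,
    include_cfrc_ext_in_observation=False,
)

# With the extra terms excluded, the observation reduces to the
# qpos/qvel-derived elements; observation_structure shows what remains.
print(env.unwrapped.observation_structure)
print(env.observation_space.shape)
```
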
Humanoid Standup

  • excluded the worldbody's cinert, cvel & cfrc_ext and the root/freejoint's qfrc_actuator from the observation space (they are constantly 0 and therefore provide no useful information to the agent; this should result in slightly faster training).
  • added the xml_file, uph_cost_weight, ctrl_cost_weight, impact_cost_weight, impact_cost_range, reset_noise_scale, exclude_current_positions_from_observation, include_cinert_in_observation, include_cvel_in_observation, include_qfrc_actuator_in_observation and include_cfrc_ext_in_observation arguments.
  • added info "tendon_length" & "tendon_velocity".
  • added info "x_position", "y_position" & "z_distance_from_origin".

InvertedDoublePendulum

  • fixed bug: healthy_reward was given on every step (even when the Pendulum was unhealthy); now it is only given when the Pendulum is healthy. The info key "reward_survive" is updated with this change.
  • removed the hinges' qfrc_constraint ("constraint force") from the observation space (it is constantly 0 and therefore provides no useful information to the agent; this should result in slightly faster training).
  • added xml_file, healthy_reward, reset_noise_scale arguments.
  • added info "reward_survive", "distance_penalty", "velocity_penalty".

InvertedPendulum

  • fixed bug: healthy_reward was given on every step (even when the Pendulum was unhealthy); now it is only given when the Pendulum is healthy. The info key "reward_survive" is updated with this change.
  • added xml_file, reset_noise_scale arguments.
  • added info "reward_survive".

Pusher

  • added xml_file argument.
  • added the reward_near_weight, reward_dist_weight and reward_control_weight arguments (see the sketch after this list).
  • fixed the info key "reward_ctrl" not being multiplied by the reward weight.
  • added info "reward_near".

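A minimal sketch of the new reward-weight arguments; the values below are illustrative only, and the actual defaults live in the environment docstring:

```python
import gymnasium as gym

env = gym.make(
    "Pusher-v5",
    reward_near_weight=0.5,     # illustrative
    reward_dist_weight=1.0,     # illustrative
    reward_control_weight=0.1,  # illustrative
)

obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(info["reward_near"])  # new info key for the near-reward term
```
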
Reacher

  • removed "z - position_fingertip" from the observation space (it is constantly 0 and therefore provides no useful information to the agent; this should result in slightly faster training).
  • added xml_file argument.
  • added reward_dist_weight, reward_control_weight arguments.
  • fixed the info key "reward_ctrl" not being multiplied by the reward weight.

Swimmer

  • re-added xml_file argument.
  • added forward_reward_weight, ctrl_cost_weight, reset_noise_scale, exclude_current_positions_from_observation arguments.
  • replaced info "reward_fwd" / "forward_reward" → "reward_forward" (to be consistent with the other environments).

Walker2D

  • changed model (the new model does not require coordinate='global'); both feet now have friction==1.9, whereas previously the right foot had friction==0.9 and the left foot had friction==1.9.
  • fixed bug: healthy_reward was given on every step (even when the Walker2D was unhealthy); now it is only given when the Walker2D is healthy. The info key "reward_survive" is updated with this change.
  • re-added the xml_file argument.
  • added info "reward_forward", "reward_ctrl", "reward_survive", "z_distance_from_origin" (see the sketch after this list).

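A small sketch of reading the new per-term reward breakdown from info; the key names come from the lists above, and the same keys apply to Hopper:

```python
import gymnasium as gym

env = gym.make("Walker2d-v5")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# per-term reward components reported alongside the scalar reward
print(info["reward_forward"], info["reward_ctrl"], info["reward_survive"])
print(info["z_distance_from_origin"])
```
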
Type of change

Adds a new revision of the MuJoCo environments.

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Benchmarks

issues fixed:

TODO

  • Verify that docs are being built
  • Update "Version History"
  • Blog announcement ???
  • Minari dataset generation ???

Finished environments

  • Ant
  • Half Cheetah
  • Hopper
  • Humanoid
  • Humanoid Standup
  • Inverted Double Pendulum
  • Inverted Pendulum
  • Reacher
  • Swimmer
  • Pusher
  • Walker2D

Cutting room floor (not included in the v5 release)

  • add option for observing tendons in Humanoids
  • update kinematics of Ant & Humanoid after step
  • add ManySegmentSwimmer & CoupledHalfCheetah environments
  • adding reset_noise_scale to Pusher & Reacher

Credits

Lead Developer: Kallinteris Andreas
Debugging assistance & setting specification/requirements: Rodrigo, Mark Towers
Technical Advisor: saran-t (helped with the creation of the new Hopper and Walker2D models)

@Kallinteris-Andreas Kallinteris-Andreas marked this pull request as draft January 24, 2023 23:41
@pseudo-rnd-thoughts (Member) commented Jun 15, 2023

@Kallinteris-Andreas Thanks for all of this hard work. We are planning on releasing gymnasium v0.28.2 and v0.29 in the next few weeks; if there are any changes you want to make in gymnasium, could you do them soon? Thanks.

Also, for what reason is the CI failing?

@Kallinteris-Andreas (Collaborator, Author)

@pseudo-rnd-thoughts

  1. I do not need anything in gymnasium.
  2. The CI will be fixed once mujoco envs get moved to this repo
  3. I am still evaluating the impact of [Bug Report] MuJoCo Envs, healthy reward issues Gymnasium#526, after that it should be ready for review.

@pseudo-rnd-thoughts (Member)

@Kallinteris-Andreas I can't remember the previous conversations we had about this, but I don't think we are planning on moving the mujoco environments (v2, v3 or v4) to gymnasium robotics. While it would make sense, some of our plans have changed, preventing this.

@Kallinteris-Andreas (Collaborator, Author)

Yeah, the Brax simulator is not at the required feature parity to replace the mujoco envs in gymnasium.

After validation, I can move the PR to the gymnasium repo, it is no problem.

@Kallinteris-Andreas (Collaborator, Author) commented Jun 26, 2023

@pseudo-rnd-thoughts & @rodrigodelazcano
Through development over the past 4 months, a lot of changes have accumulated.
Can you review the change list (just the changelog, not the code) to make sure that all the changes are desired and properly explained?
(a quick look should be enough)

Note: the changelog here will be used in the Version History part of the documentation
Note: I will create a PR in gymnasium, in the next few days.

Thanks!

@pseudo-rnd-thoughts (Member)

Could you add much more detail to each point, in particular why the change was made? It would be great if one could look at the notes with minimal previous knowledge of the environment and understand the changes.

For example, rather than:

  • fixed "reward_survive" being healthy_reward on every step (even on terminal steps).
  • reward function now always includes contact_cost.
  • excluded worldbody's contact forces (cfrc_ext) from the observation.
  • added main_body argument.
  • added forward_reward_weight argument.
  • added include_cfrc_ext_in_observation argument.
  • removed use_contact_forces argument (note: its functionality has been replaced with include_cfrc_ext_in_observation and contact_cost_weight).
  • fixed info "x_position" & "y_position" giving xpos instead of qpos observations.
  • removed "forward_reward" from info (note: there still exists "reward_forward").

the entries could read more like:

  • The reward function includes the healthy_reward value on every step (even for terminated state). This was updated to only include the healthy_reward for non-terminal steps. The reward_survive in info is updated with this change.
  • The reward function now always includes contact_cost, previously this was only included when XXX.
  • The worldbody's contact force (observation cfrc_ext at index X) was always zero, therefore, removed as unnecessary information for the agent
  • Added main_body argument, which is used for X.
  • Added forward_reward_weight argument, which is used for X.
  • Add include_cfrc_ext_in_observation argument, which is used for X.
  • Removed use_contact_forces argument as its functionality has been replaced by the include_cfrc_ext_in_observation and contact_cost_weight arguments. To reproduce v4 parameters (use_contact_forces=XXX) use include_cfrc_ext_in_observation=XXX and contact_cost_weight=XXX
  • Fixed info x_position and y_position giving xpos rather than qpos observations, this differs by XXX.
  • Removed forward_reward from info as XXX (note: there still exists reward_forward which differs by XXX).

@Kallinteris-Andreas (Collaborator, Author)

@pseudo-rnd-thoughts thanks, I have made a bunch of improvements.

Can you do a second pass of the change list, to make sure that all the changes are desired?

@pseudo-rnd-thoughts (Member)

> Can you do a second pass of the change list, to make sure that all the changes are desired?

Without more detail, I can't understand all of the changes. Could you do a documentation update?

@Kallinteris-Andreas (Collaborator, Author)

All the changes are in the docstrings of the environments.
