
mujoco-v5 initial commit #104

Closed

Conversation

@Kallinteris-Andreas (Collaborator) commented Jan 24, 2023

Continued from #91.

Description

Adds the v5 versions of the MuJoCo environments.

Changelog

  • the minimum mujoco version is now 2.3.3.
  • added support for fully custom/third-party mujoco models via the xml_file argument (previously only a few model changes could be made).
  • added the default_camera_config argument, a dictionary for setting the mj_camera properties, primarily useful for custom environments.
  • added env.observation_structure, a dictionary indicating the composition of the observation space (e.g. qpos, qvel), useful for building tooling and wrappers for the MuJoCo environments.
  • reset() now returns a populated info dictionary (previously an empty dictionary was returned).
  • added the frame_skip argument (a usage sketch of the new arguments follows this list).

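A minimal sketch of the new top-level API, assuming the v5 environments are registered (e.g. Ant-v5); the argument values are illustrative and the model path is hypothetical:

```python
import gymnasium as gym

env = gym.make(
    "Ant-v5",
    xml_file="./my_custom_ant.xml",           # hypothetical third-party model
    default_camera_config={"distance": 4.0},  # illustrative camera setting
    frame_skip=5,                             # illustrative value
)

# reset() now returns a populated info dictionary
observation, info = env.reset(seed=0)

# observation_structure describes how the observation vector is composed,
# e.g. how many elements come from qpos vs. qvel
print(env.unwrapped.observation_structure)
```
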
Ant

  • fixed bug: healthy_reward was given on every step (even when the Ant was unhealthy); now it is only given when the Ant is healthy. The info key "reward_survive" is updated with this change.
  • the reward function now always includes contact_cost; previously it was only included when use_contact_forces=True (it can be zeroed out with contact_cost_weight=0).
  • excluded the worldbody's cfrc_ext from the observation space (it is constantly 0 and therefore provides no useful information to the agent; this should result in slightly faster training).
  • added the main_body argument.
  • added the forward_reward_weight argument.
  • added the include_cfrc_ext_in_observation argument.
  • removed the use_contact_forces argument (note: its functionality has been replaced by include_cfrc_ext_in_observation and contact_cost_weight; see the sketch after this list).
  • fixed the info key "reward_ctrl" sometimes containing contact_cost instead of ctrl_cost.
  • fixed the info keys "x_position" & "y_position" giving xpos instead of qpos observations (xpos observations lag one mj_step() behind).
  • removed "forward_reward" from info (note: "reward_forward" still exists and contains the same information).

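A hedged sketch of how the removed use_contact_forces argument maps onto the new arguments; this is my reading of the changelog, and the contact_cost_weight value is an assumption based on the v4-era default, so check the environment docstring:

```python
import gymnasium as gym

# v4 toggled both effects with a single flag:
#   env = gym.make("Ant-v4", use_contact_forces=True)
#
# v5 controls the two effects independently:
env = gym.make(
    "Ant-v5",
    include_cfrc_ext_in_observation=True,  # keep cfrc_ext in the observation
    contact_cost_weight=5e-4,              # assumed v4-era default weight
)

# and v4's use_contact_forces=False roughly corresponds to:
# gym.make("Ant-v5", include_cfrc_ext_in_observation=False, contact_cost_weight=0)
```
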
Half Cheetah

  • re-added xml_file argument.
  • renamed info "reward_run" → "reward_forward" (to be consistent with the other environments).

Hopper

  • changed model (the new model does not require coordinate='global' and has near-identical behavior).
  • fixed bug: healthy_reward was given on every step (even when the Hopper was unhealthy); now it is only given when the Hopper is healthy. The info key "reward_survive" is updated with this change.
  • re-added xml_file argument.
  • added info "reward_forward", "reward_ctrl", "reward_survive", "z_distance_from_origin".

Humanoid

  • fixed bug: healthy_reward was given on every step (even when the Humanoid was unhealthy); now it is only given when the Humanoid is healthy. The info key "reward_survive" is updated with this change.
  • re-added contact_cost (and the corresponding contact_cost_weight and contact_cost_range arguments).
  • excluded the worldbody's cinert, cvel & cfrc_ext and the root/freejoint's qfrc_actuator from the observation space (they are constantly 0 and therefore provide no useful information to the agent; this should result in slightly faster training).
  • re-added the xml_file argument.
  • added the include_cinert_in_observation, include_cvel_in_observation, include_qfrc_actuator_in_observation and include_cfrc_ext_in_observation arguments (see the sketch after this list).
  • fixed the info keys "x_position" & "y_position" giving xpos instead of qpos observations (xpos observations lag one mj_step() behind).
  • added info "tendon_length" & "tendon_velocity".
  • renamed info "reward_alive" → "reward_survive" (to be consistent with the other environments).
  • renamed info "reward_linvel" → "reward_forward" (to be consistent with the other environments).
  • renamed info "reward_quadctrl" → "reward_ctrl" (to be consistent with the other environments).
  • removed "forward_reward" from info (note: "reward_forward" still exists and contains the same information).

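A sketch of trimming the Humanoid observation with the new include_* flags, then checking the resulting layout via observation_structure; flag names are taken from the list above, everything else is illustrative:

```python
import gymnasium as gym

env = gym.make(
    "Humanoid-v5",
    include_cinert_in_observation=False,
    include_cvel_in_observation=False,
    include_qfrc_actuator_in_observation=False,
    include_cfrc_ext_in_observation=False,
)

# With the extra terms excluded, the observation reduces to the
# qpos/qvel-derived elements; observation_structure shows what remains.
print(env.unwrapped.observation_structure)
print(env.observation_space.shape)
```
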
Humanoid Standup

  • excluded the worldbody's cinert, cvel & cfrc_ext and the root/freejoint's qfrc_actuator from the observation space (they are constantly 0 and therefore provide no useful information to the agent; this should result in slightly faster training).
  • added the xml_file, uph_cost_weight, ctrl_cost_weight, impact_cost_weight, impact_cost_range, reset_noise_scale, exclude_current_positions_from_observation, include_cinert_in_observation, include_cvel_in_observation, include_qfrc_actuator_in_observation and include_cfrc_ext_in_observation arguments.
  • added info "tendon_length" & "tendon_velocity".
  • added info "x_position", "y_position" & "z_distance_from_origin".

InvertedDoublePendulum

  • fixed bug: healthy_reward was given on every step (even when the Pendulum was unhealthy); now it is only given when the Pendulum is healthy. The info key "reward_survive" is updated with this change.
  • removed the hinges' qfrc_constraint ("constraint force") from the observation space (it is constantly 0 and therefore provides no useful information to the agent; this should result in slightly faster training).
  • added xml_file, healthy_reward, reset_noise_scale arguments.
  • added info "reward_survive", "distance_penalty", "velocity_penalty".

InvertedPendulum

  • fixed bug: healthy_reward was given on every step (even when the Pendulum was unhealthy); now it is only given when the Pendulum is healthy. The info key "reward_survive" is updated with this change.
  • added xml_file, reset_noise_scale arguments.
  • added info "reward_survive".

Pusher

  • added xml_file argument.
  • added the reward_near_weight, reward_dist_weight and reward_control_weight arguments (see the sketch after this list).
  • fixed the info key "reward_ctrl" not being multiplied by the reward weight.
  • added info "reward_near".

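A minimal sketch of the new reward-weight arguments; the values below are illustrative only, and the actual defaults live in the environment docstring:

```python
import gymnasium as gym

env = gym.make(
    "Pusher-v5",
    reward_near_weight=0.5,     # illustrative
    reward_dist_weight=1.0,     # illustrative
    reward_control_weight=0.1,  # illustrative
)

obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(info["reward_near"])  # new info key for the near-reward term
```
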
Reacher

  • removed "z - position_fingertip" from the observation space (it is constantly 0 and therefore provides no useful information to the agent; this should result in slightly faster training).
  • added xml_file argument.
  • added reward_dist_weight, reward_control_weight arguments.
  • fixed the info key "reward_ctrl" not being multiplied by the reward weight.

Swimmer

  • re-added xml_file argument.
  • added forward_reward_weight, ctrl_cost_weight, reset_noise_scale, exclude_current_positions_from_observation arguments.
  • replaced info "reward_fwd" / "forward_reward" → "reward_forward" (to be consistent with the other environments).

Walker2D

  • changed model (the new model does not require coordinate='global'); both feet now have friction==1.9, whereas previously the right foot had friction==0.9 and the left foot had friction==1.9.
  • fixed bug: healthy_reward was given on every step (even when the Walker2D was unhealthy); now it is only given when the Walker2D is healthy. The info key "reward_survive" is updated with this change.
  • re-added the xml_file argument.
  • added info "reward_forward", "reward_ctrl", "reward_survive", "z_distance_from_origin" (see the sketch after this list).

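A small sketch of reading the new per-term reward breakdown from info; the key names come from the lists above, and the same keys apply to Hopper:

```python
import gymnasium as gym

env = gym.make("Walker2d-v5")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# per-term reward components reported alongside the scalar reward
print(info["reward_forward"], info["reward_ctrl"], info["reward_survive"])
print(info["z_distance_from_origin"])
```
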
Type of change

Adds a new revision of the MuJoCo environments.

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Benchmarks

issues fixed:

TODO

  • Verify that docs are being built
  • Update "Version History"
  • Blog announcement ???
  • Minari dataset generation ???

Finished environments

  • Ant
  • Half Cheetah
  • Hopper
  • Humanoid
  • Humanoid Standup
  • Inverted Double Pendulum
  • Inverted Pendulum
  • Reacher
  • Swimmer
  • Pusher
  • Walker2D

Cutting room floor (not included in the v5 release)

  • add option for observing tendons in Humanoids
  • update kinematics of Ant & Humanoid after step
  • add ManySegmentSwimmer & CoupledHalfCheetah environments
  • adding reset_noise_scale to Pusher & Reacher

Credits

Lead Developer: Kallinteris Andreas
Debugging assistance & setting specification/requirements: Rodrigo, Mark Towers
Technical Advisor: saran-t (helped with the creation of the new Hopper and Walker2D models)

@Kallinteris-Andreas Kallinteris-Andreas marked this pull request as draft January 24, 2023 23:41
@pseudo-rnd-thoughts (Member) commented Jun 15, 2023

@Kallinteris-Andreas Thanks for all of this hard work. We are planning on releasing gymnasium v0.28.2 and v0.29 in the next few weeks; if there are any changes you want to make in gymnasium, could you do them soon? Thanks.

Also, for what reason is the CI failing?

@Kallinteris-Andreas (Collaborator, Author)

@pseudo-rnd-thoughts

  1. I do not need anything in gymnasium.
  2. The CI will be fixed once mujoco envs get moved to this repo
  3. I am still evaluating the impact of [Bug Report] MuJoCo Envs, healthy reward issues Gymnasium#526, after that it should be ready for review.

@pseudo-rnd-thoughts (Member)

@Kallinteris-Andreas I can't remember the previous conversations we had about this, but I don't think we are planning on moving the mujoco environments (v2, v3 or v4) to gymnasium robotics. While it would make sense, some of our plans have changed, preventing this.

@Kallinteris-Andreas (Collaborator, Author)

Yeah, the Brax simulator is not at the required feature parity to replace the mujoco envs in gymnasium.

After validation, I can move the PR to the gymnasium repo, it is no problem.

@Kallinteris-Andreas (Collaborator, Author) commented Jun 26, 2023

@pseudo-rnd-thoughts & @rodrigodelazcano
Through development over the past 4 months, a lot of changes have accumulated.
Can you review the change list (just the changelog, not the code) to make sure that all the changes are desired and properly explained?
(a quick look should be enough)

Note: the changelog here will be used in the Version History part of the documentation
Note: I will create a PR in gymnasium, in the next few days.

Thanks!

@pseudo-rnd-thoughts (Member)

Could you add much more detail to each point, in particular why the change was made? It would be great if one could look at the notes with minimal previous knowledge of the environment and understand the changes.

For example, rather than:

  • fixed "reward_survive" being healthy_reward on every step (even on terminal steps).
  • reward function now always includes contact_cost.
  • excluded worldbody's contact forces (cfrc_ext) from the observation.
  • added main_body argument.
  • added forward_reward_weight argument.
  • added include_cfrc_ext_in_observation argument.
  • removed use_contact_forces argument (note: its functionality has been replaced with include_cfrc_ext_in_observation and contact_cost_weight).
  • fixed info "x_position" & "y_position" giving xpos instead of qpos observations.
  • removed "forward_reward" from info (note: there still exists "reward_forward").

the entries could read more like:

  • The reward function includes the healthy_reward value on every step (even for terminated state). This was updated to only include the healthy_reward for non-terminal steps. The reward_survive in info is updated with this change.
  • The reward function now always includes contact_cost, previously this was only included when XXX.
  • The worldbody's contact force (observation cfrc_ext at index X) was always zero, therefore, removed as unnecessary information for the agent
  • Added main_body argument, which is used for X.
  • Added forward_reward_weight argument, which is used for X.
  • Add include_cfrc_ext_in_observation argument, which is used for X.
  • Removed use_contact_forces argument as its functionality has been replaced by the include_cfrc_ext_in_observation and contact_cost_weight arguments. To reproduce v4 parameters (use_contact_forces=XXX) use include_cfrc_ext_in_observation=XXX and contact_cost_weight=XXX
  • Fixed info x_position and y_position giving xpos rather than qpos observations, this differs by XXX.
  • Removed forward_reward from info as XXX (note: there still exists reward_forward which differs by XXX).

@Kallinteris-Andreas (Collaborator, Author)

@pseudo-rnd-thoughts thanks, I have made a bunch of improvements.

Can you do a second pass of the change list, to make sure that all the changes are desired?

@pseudo-rnd-thoughts (Member)

> Can you do a second pass of the change list, to make sure that all the changes are desired?

Without more detail, I can't understand all of the changes. Could you do a documentation update?

@Kallinteris-Andreas (Collaborator, Author)

All the changes are in the docstrings of the environments.
