# mujoco-v5 #104
## Conversation
@Kallinteris-Andreas Thanks for all of this hard work. We are planning on having a gymnasium v0.28.2 and v0.29 in the next few weeks; if there are any changes you want to make in gymnasium, could you do them soon? Thanks. Also, for what reason is the CI failing?

@Kallinteris-Andreas I can't remember the previous conversations we had about this, but I don't think we are planning on moving the mujoco environments (v2, v3 or v4) to gymnasium robotics.

Yeah. After validation, I can move the PR to the gymnasium repo, it is no problem.

@pseudo-rnd-thoughts & @rodrigodelazcano Note: the changelog here will be used in the … Thanks!

Could you add much more detail to each point, in particular why the change was made? It would be great if you could look at the notes with minimal previous knowledge of the environment and understand the changes. For example: …

@pseudo-rnd-thoughts Thanks, I have made a bunch of improvements. Can you do a second pass of the change list, to make sure that all the changes are desired?

Without more detail, I can't understand all of the changes; could you do a documentation update?

All the changes are in the docstrings of the environments.
cont from: #91

## Description

Adds the `v5` version of the `mujoco` environments.

## Changelog

- The minimum `mujoco` version is now 2.3.3.
- Added support for fully custom `mujoco` models with the usage of the `xml_file` argument (previously only a few changes could be made to the model).
- Added the `default_camera_config` argument, a dictionary for setting the `mj_camera` properties, primarily useful for custom environments.
- Added `env.observation_structure`, a dictionary indicating the composition of the observation space (e.g. `qpos`, `qvel`), useful for building tooling and wrappers for the MuJoCo environments.
- Return a non-empty `info` with `reset()`; previously an empty dictionary was returned.
- Added the `frame_skip` argument.
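A minimal sketch of how these new common arguments could be used (the argument names come from the changelog above; the specific values, the file path, and the `Hopper-v5` target are illustrative assumptions, not documented defaults):

```python
import gymnasium as gym

# Illustrative values only; defaults differ per environment,
# and the xml_file path is a placeholder.
env = gym.make(
    "Hopper-v5",
    xml_file="./my_model.xml",                # fully custom MuJoCo model
    default_camera_config={"distance": 4.0},  # mj_camera properties
    frame_skip=4,                             # number of mj_step()s per step()
)

# reset() now returns a non-empty info dictionary.
observation, info = env.reset(seed=0)
print(info)

# observation_structure describes the composition of the observation space.
print(env.unwrapped.observation_structure)
```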
### Ant
- Fixed bug: `healthy_reward` being given on every step (even when the Ant is unhealthy); now it is only given when the Ant is healthy. The `info` "reward_survive" is updated with this change.
- The reward function now always includes `contact_cost`; previously it was only included when `use_contact_forces=True` (it can be set to `0` with `contact_cost_weight=0`).
- Excluded the `worldbody`'s `cfrc_ext` from the observation space (since it is constantly 0 and therefore provides no useful information to the agent; should result in slightly faster training).
- Added the `main_body` argument.
- Added the `forward_reward_weight` argument.
- Added the `include_cfrc_ext_in_observation` argument.
- Removed the `use_contact_forces` argument (note: its functionality has been replaced with `include_cfrc_ext_in_observation` and `contact_cost_weight`).
- Fixed `info` "reward_ctrl" sometimes containing `contact_cost` instead of `ctrl_cost`.
- Fixed `info` "x_position" & "y_position" giving `xpos` instead of `qpos` observations (`xpos` observations are behind by 1 `mj_step()`).
- Removed "forward_reward" from `info` (note: there still exists "reward_forward", which contains the same information).
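A sketch of migrating from the removed `use_contact_forces` argument to the new arguments; the `contact_cost_weight` value shown is an assumption, not a documented default:

```python
import gymnasium as gym

# Old (v4): gym.make("Ant-v4", use_contact_forces=True)
# New (v5): the same behaviour is split into two independent knobs.
env = gym.make(
    "Ant-v5",
    include_cfrc_ext_in_observation=True,  # contact forces in the observation
    contact_cost_weight=5e-4,              # contact cost in the reward (0 disables it)
)

observation, info = env.reset(seed=0)
observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(info["reward_ctrl"], info["reward_survive"])  # "reward_ctrl" now always holds ctrl_cost
```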
### Half Cheetah
- Added the `xml_file` argument.
- Renamed `info` "reward_run" → "reward_forward" (to be consistent with the other environments).
### Hopper
- New model (the old model used `coordinate='global'`, which is deprecated; the new model has near identical behavior).
- Fixed bug: `healthy_reward` being given on every step (even when the Hopper is unhealthy); now it is only given when the Hopper is healthy. The `info` "reward_survive" is updated with this change.
- Added the `xml_file` argument.
- Added `info` "reward_forward", "reward_ctrl", "reward_survive", "z_distance_from_origin".
### Humanoid
- Fixed bug: `healthy_reward` being given on every step (even when the Humanoid is unhealthy); now it is only given when the Humanoid is healthy. The `info` "reward_survive" is updated with this change.
- Restored `contact_cost` (and the corresponding `contact_cost_weight` and `contact_cost_range` arguments).
- Excluded the `worldbody`'s `cinert` & `cvel` & `cfrc_ext` and the `root`/`freejoint`'s `qfrc_actuator` from the observation space (since they are constantly 0 and therefore provide no useful information to the agent; should result in slightly faster training).
- Added the `xml_file` argument.
- Added the `include_cinert_in_observation`, `include_cvel_in_observation`, `include_qfrc_actuator_in_observation`, `include_cfrc_ext_in_observation` arguments.
- Fixed `info` "x_position" & "y_position" giving `xpos` instead of `qpos` observations (`xpos` observations are behind by 1 `mj_step()`).
- Added `info` "tendon_length" & "tendon_velocity".
- Renamed `info` "reward_alive" → "reward_survive" (to be consistent with the other environments).
- Renamed `info` "reward_linvel" → "reward_forward" (to be consistent with the other environments).
- Renamed `info` "reward_quadctrl" → "reward_ctrl" (to be consistent with the other environments).
- Removed "forward_reward" from `info` (note: there still exists "reward_forward").
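A sketch of the new observation-content flags (flag names from the changelog above; the defaults are assumed to keep everything enabled):

```python
import gymnasium as gym

# Keep the full observation vs. strip it down to qpos/qvel only.
full = gym.make("Humanoid-v5")
lean = gym.make(
    "Humanoid-v5",
    include_cinert_in_observation=False,
    include_cvel_in_observation=False,
    include_qfrc_actuator_in_observation=False,
    include_cfrc_ext_in_observation=False,
)

print(full.observation_space.shape)  # includes cinert, cvel, qfrc_actuator, cfrc_ext
print(lean.observation_space.shape)  # only qpos & qvel remain
```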
### Humanoid Standup
- Excluded the `worldbody`'s `cinert` & `cvel` & `cfrc_ext` and the `root`/`freejoint`'s `qfrc_actuator` from the observation space (since they are constantly 0 and therefore provide no useful information to the agent; should result in slightly faster training).
- Added the `xml_file`, `uph_cost_weight`, `ctrl_cost_weight`, `impact_cost_weight`, `impact_cost_range`, `reset_noise_scale`, `exclude_current_positions_from_observation`, `include_cinert_in_observation`, `include_cvel_in_observation`, `include_qfrc_actuator_in_observation`, `include_cfrc_ext_in_observation` arguments.
- Added `info` "tendon_length" & "tendon_velocity".
- Added `info` "x_position" & "y_position" & "z_distance_from_origin".
### InvertedDoublePendulum
- Fixed bug: `healthy_reward` being given on every step (even when the Pendulum is unhealthy); now it is only given when the Pendulum is healthy. The `info` "reward_survive" is updated with this change.
- Excluded the `qfrc_constraint` ("constraint force") of the hinges from the observation space (since they are constantly 0 and therefore provide no useful information to the agent; should result in slightly faster training).
- Added the `xml_file`, `healthy_reward`, `reset_noise_scale` arguments.
- Added `info` "reward_survive", "distance_penalty", "velocity_penalty".
### InvertedPendulum
- Fixed bug: `healthy_reward` being given on every step (even when the Pendulum is unhealthy); now it is only given when the Pendulum is healthy. The `info` "reward_survive" is updated with this change.
- Added the `xml_file`, `reset_noise_scale` arguments.
- Added `info` "reward_survive".
### Pusher
- Added the `xml_file` argument.
- Added the `reward_near_weight`, `reward_dist_weight`, `reward_control_weight` arguments.
- Fixed `info` "reward_ctrl" not being multiplied by the reward weight.
- Added `info` "reward_near".
### Reacher
- Added the `xml_file` argument.
- Added the `reward_dist_weight`, `reward_control_weight` arguments.
- Fixed `info` "reward_ctrl" not being multiplied by the reward weight.
### Swimmer
- Added the `xml_file` argument.
- Added the `forward_reward_weight`, `ctrl_cost_weight`, `reset_noise_scale`, `exclude_current_positions_from_observation` arguments.
- Renamed `info` "reward_fwd"/"forward_reward" → "reward_forward" (to be consistent with the other environments).
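Several environments rename `info` keys in `v5` (see the rename bullets above). A hypothetical helper, not part of Gymnasium, for code that still expects the old key names; the mapping is compiled from the renames listed in this changelog:

```python
# Hypothetical compatibility helper, not part of Gymnasium.
V5_INFO_RENAMES = {
    "reward_run": "reward_forward",      # Half Cheetah
    "reward_fwd": "reward_forward",      # Swimmer
    "forward_reward": "reward_forward",  # Swimmer
    "reward_alive": "reward_survive",    # Humanoid
    "reward_linvel": "reward_forward",   # Humanoid
    "reward_quadctrl": "reward_ctrl",    # Humanoid
}

def with_legacy_keys(info: dict) -> dict:
    """Return a copy of a v5 `info` dict that also exposes the pre-v5 key names."""
    out = dict(info)
    for old, new in V5_INFO_RENAMES.items():
        if new in info:
            out.setdefault(old, info[new])
    return out
```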
### Walker2D
- New model (the old model used `coordinate='global'`, which is deprecated). Now both feet have `friction==1.9`; previously the right foot had `friction==0.9` and the left foot had `friction==1.9`.
- Fixed bug: `healthy_reward` being given on every step (even when the Walker2D is unhealthy); now it is only given when the Walker2D is healthy. The `info` "reward_survive" is updated with this change.
- Added the `xml_file` argument.
- Added `info` "reward_forward", "reward_ctrl", "reward_survive", "z_distance_from_origin".
## Type of change
- Add a new revision of the `MuJoCo` environments.

## Checklist
- I have run the `pre-commit` checks with `pre-commit run --all-files` (see `CONTRIBUTING.md` instructions to set it up).
## Benchmarks (`v3` → `v4`)
https://github.com/Kallinteris-Andreas/gymnasium-mujuco-v5-envs-validation
Issues fixed:
- Replacing `"global"` with `"local"` coordinate system (google-deepmind/mujoco#833)
- `Humanoid` & `Ant` have wrong `info["distance_from_origin"]` (Gymnasium#539)
- `Ant` & `Humanoid` have wrong "x_position" & "y_position" `info` (Gymnasium#521)
- `Humanoid-v4` does not have `contact_cost` (Gymnasium#504)
- `InvertedDoublePendulumEnv` and `InvertedPendulumEnv` always gives "alive_bonus" (Gymnasium#500)
- `MuJoCo/Walker2d` left foot has different friction than right foot (Gymnasium#477)
- `mujoco.InvertedDoublePendulum` last 2 observations (constraints) are const 0 (Gymnasium#228)
- `MuJoCo.Ant` contact forces being off by default is based on a wrong experiment (Gymnasium#214)
## TODO
Finished environments: …

Cutting room floor (not included in the `v5` release):
- `Humanoid`s …
- … `Ant` & `Humanoid` after step
- `ManySegmentSwimmer` & `CoupledHalfCheetah` environments
- `reset_noise_scale` to `Pusher` & `Reacher`
## Credits
- Lead Developer: Kallinteris Andreas
- Debugging assistance & setting specification/requirements: Rodrigo, Mark Towers
- Technical Advisor: saran-t (helped with the creation of the new `Hopper` and `Walker2D` models)