[Bug Report] `MuJoCo.Ant` contact forces being off by default is based on a wrong experiment #214

Kallinteris-Andreas · 2022-12-12T17:24:54Z

Describe the bug

The problem

Due to openai/gym#2762 (comment), it was decided that use_contact_forces would default to False, but
the 2 problem factorizations compared in that experiment used DIFFERENT REWARD FUNCTIONS.

As you can see here, the reward functions are indeed different:
https://github.com/rodrigodelazcano/gym/blob/9c9741498dd0b613fb2d418f17d77ab5f6e60476/gym/envs/mujoco/ant_v4.py#L264

This behavior (the differing reward functions) is also not documented at all (I can make a PR for that).

@rodrigodelazcano

Code at that commit (it is the same as the current code as far as this problem is concerned):
https://github.com/rodrigodelazcano/gym/blob/9c9741498dd0b613fb2d418f17d77ab5f6e60476/gym/envs/mujoco/ant_v4.py

Code example

No response

System info

No response

Additional context

No response

Checklist

I have checked that there is no similar issue in the repo

pseudo-rnd-thoughts · 2022-12-13T13:26:05Z

To confirm, the issue is that the reward function is dependent on use_contact_forces, and that this is not documented?

Kallinteris-Andreas · 2022-12-14T12:34:25Z

@pseudo-rnd-thoughts
There are 2 issues:

1. The undocumented difference in the reward function depending on use_contact_forces.
2. The fact that use_contact_forces defaults to False based on a wrong experiment (this should be investigated for a potential MuJoCo-v5 or Brax-based environments).

pseudo-rnd-thoughts · 2022-12-14T12:57:22Z

Ok, could you make a PR to address 1?
Could you explain why it was based on the wrong experiment? I wasn't involved and don't know anything about the experiments. What did they do? What is the issue with that? What do you propose the solution should be? What does your experiment show?

Kallinteris-Andreas · 2022-12-15T21:00:14Z

Ok, so here is what happened (as far as I understand it; I would like @rodrigodelazcano to comment here):

Back when the mujoco-py bindings were used, the external forces (mujoco.cfrc_ext) were not working, which resulted in all of those observations being 0 and contact cost == 0.
Because of that, when developing the new environment with the mujoco bindings, they added an option (use_contact_forces) to disable the external forces (mujoco.cfrc_ext). This option alters both the observations and the reward function (by disabling contact_cost).
Then they ran an ablation study to see whether disabling mujoco.cfrc_ext (use_contact_forces=False) would give better results. Here they made the crucial mistake of comparing 2 implementations with 2 different reward functions, because they forgot that use_contact_forces also alters the reward function. See "Change mujoco_py bindings for mujoco Deepmind bindings", openai/gym#2762 (comment).
The experiment showed that disabling mujoco.cfrc_ext gave significantly bigger returns (which makes sense, since they disabled the biggest cost in the reward).
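To make the coupling described above concrete, here is a minimal sketch of how a single flag can change both the observation and the reward. It is a simplified illustration and not the actual ant_v4.py code: the function names are hypothetical, and the individual reward terms are passed in as plain numbers rather than computed from a simulation.

```python
from typing import List, Sequence


def ant_reward(
    forward_reward: float,
    healthy_reward: float,
    ctrl_cost: float,
    contact_cost: float,
    use_contact_forces: bool,
) -> float:
    """Toy reward: contact_cost is only subtracted when the flag is on."""
    costs = ctrl_cost
    if use_contact_forces:
        costs += contact_cost
    return forward_reward + healthy_reward - costs


def ant_observation(
    position: Sequence[float],
    velocity: Sequence[float],
    contact_forces: Sequence[float],
    use_contact_forces: bool,
) -> List[float]:
    """Toy observation: contact forces are only appended when the flag is on."""
    observation = list(position) + list(velocity)
    if use_contact_forces:
        observation += list(contact_forces)
    return observation
```

With identical inputs, the two settings of the flag return different rewards whenever contact_cost is nonzero, so returns collected under the two settings are not directly comparable.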

The problem with this is that, unlike the other MuJoCo environments, the Ant is significantly different between v2/v3 and v4. (I have not looked into whether this has led to wrong conclusions in research papers.)

The immediate step would be to run an ablation study with the same reward function with/without mujoco.cfrc_ext.
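One way such an ablation could be set up is to keep use_contact_forces=True in both arms, so that the reward is identical, and to hide the contact forces from the agent in one arm only. The sketch below shows just the observation-stripping step. It assumes that the first 27 entries of the Ant-v4 observation are positions and velocities and that the contact forces follow them; the helper name and the constant are hypothetical.

```python
from typing import List, Sequence

# Assumption: with use_contact_forces=True, the contact forces come after
# the first 27 position/velocity entries of the observation.
N_STATE_DIMS = 27


def hide_contact_forces(
    observation: Sequence[float], n_state_dims: int = N_STATE_DIMS
) -> List[float]:
    """Return only the position/velocity part of an observation.

    The environment still computes contact_cost for the reward, so only
    what the agent observes changes between the two arms of the ablation.
    """
    return list(observation[:n_state_dims])
```

In practice this function could be applied from an observation wrapper around an environment created with use_contact_forces=True; the wrapper's observation space would need to be adjusted to match the shorter observation.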
I do not have computing power to spare to run this experiment anytime soon.

I do not know what the best solution to this problem is for MuJoCo environments.
A possible solution would be to make ant-v5 (which would have the same reward function as ant-v2/v3).

At the very least, we should know whether external forces matter for Ant environments for Gymnasium.Brax.

rodrigodelazcano · 2022-12-15T22:14:06Z

@Kallinteris-Andreas yes, you are right. We missed taking into account the fact that adding/removing contact_cost to the reward function would affect the return differently in the long run. The initial justification, that contact forces in the observations degrade performance, is wrong; sorry about that. I'll run the suggested experiments this weekend to make sure that's the case.

The past versions of the environment (v2/v3) are being kept for reproducibility of past research. We haven't modified anything and they can still be used with mujoco_py and older versions of mujoco.

As you've mentioned, the v4 environments were upgraded to use the latest mujoco bindings instead of mujoco-py, since the latter is no longer maintained. In the process we also decided to fix the external contact forces issue you've mentioned, which appeared with later mujoco versions (mujoco>=2.0). Thus the reward function of v4 is no different from v2/v3 if external contact forces are included in the observation and reward with use_contact_forces.

However, because we observed successful learning without contact forces in openai/gym#2762 (comment), we decided to make the use of external forces (observation/reward) optional. Having said this, I don't think there is a need for a v5 version, only for your documentation updates mentioning that using contact forces will affect the reward function, which is highly important, and I thank you for finding this out. What do you think, @pseudo-rnd-thoughts?

pseudo-rnd-thoughts · 2022-12-16T12:00:26Z

Thanks, both of you. My proposed answer would be to add a note to the Ant documentation describing this difference between v2/v3 and v4 (with default parameters), and explaining that use_contact_forces can be used for equivalence between the mujoco-py and mujoco bindings.
Additionally, once the experiments are completed, we need to add a comment to the original PR, which we can link in the documentation. This should avoid the need for a v5 environment, while users who read the documentation can still understand the difference between versions, etc.

Kallinteris-Andreas · 2023-05-14T15:32:16Z

Surprisingly, including the contact forces in the observation space does not appear to have any significant impact on the average performance,
but including the contact forces significantly reduces the training variance.

I think the default should be to observe the contact forces, as stable training performance is important for evaluating training algorithms.

Note: the worldbody contact forces were also asserted to be 0 throughout the entire process.

rodrigodelazcano · 2023-05-14T15:44:08Z

@Kallinteris-Andreas Thanks for taking the time to run the experiment. Setting the default to use contact forces sounds good, I approve. Can you make this change to the v4 environment in Gymnasium?

The removal of the worldbody elements can be done in v5.

Kallinteris-Andreas · 2023-05-14T16:06:16Z

I do not think we should change the defaults on v4, since users would get a different environment when they run:

import gymnasium
import gymnasium_robotics
env = gymnasium.make('Ant-v4')

depending on their gymnasium and gymnasium_robotics versions.

Also keep in mind that in v4, changing use_contact_forces would also change the rewards.

It would create unnecessary confusion.
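One way for users to sidestep any dependence on version-specific defaults is to pass the flag explicitly when creating the environment. This is a minimal sketch, assuming Ant-v4 is registered in the installed gymnasium and accepts use_contact_forces as a keyword argument; the helper names are hypothetical.

```python
from typing import Any, Dict


def pinned_ant_kwargs(use_contact_forces: bool) -> Dict[str, Any]:
    """Build explicit keyword arguments so nothing relies on a default."""
    return {"use_contact_forces": use_contact_forces}


def make_pinned_ant(use_contact_forces: bool):
    """Create Ant-v4 with the flag set explicitly.

    Requires gymnasium with MuJoCo support installed; the import is done
    lazily so that pinned_ant_kwargs can be used without it.
    """
    import gymnasium

    return gymnasium.make("Ant-v4", **pinned_ant_kwargs(use_contact_forces))
```

Recording the flag alongside experiment results also makes it clear which observation and reward definition a given run used.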

Kallinteris-Andreas added the bug ("Something isn't working") label Dec 12, 2022

Kallinteris-Andreas mentioned this issue Dec 12, 2022

Add MaMuJoCo (Multi-agent mujoco) Environments Farama-Foundation/Gymnasium-Robotics#53

Merged

rodrigodelazcano mentioned this issue Dec 16, 2022

Which Bodies does MuJoCo.Humanoid have #204

Closed

Kallinteris-Andreas mentioned this issue Jan 9, 2023

[Proposal] Mujoco-v5 Farama-Foundation/Gymnasium-Robotics#91

Closed

1 task

Kallinteris-Andreas mentioned this issue May 14, 2023

[Bug Report] Humanoid-v4 does not have contact_cost #504

Closed

1 task

Kallinteris-Andreas closed this as completed May 18, 2023

Kallinteris-Andreas mentioned this issue Jun 1, 2023

mujoco-v5 initial commit Farama-Foundation/Gymnasium-Robotics#104

Closed

34 tasks

Kallinteris-Andreas mentioned this issue Jun 30, 2023

Add MuJoCo v5 environments #572

Merged

35 tasks
