
[Proposal] Mujoco-v5 #91

Closed
1 task done
Kallinteris-Andreas opened this issue Jan 9, 2023 · 22 comments

@Kallinteris-Andreas
Collaborator

Kallinteris-Andreas commented Jan 9, 2023

Proposal 0

After Gymnasium/envs/mujoco gets moved here, create a new revision (v5) that fixes the constant-zero observations in the following domains:

Humanoid (+standup): Farama-Foundation/Gymnasium#204, Farama-Foundation/Gymnasium#504
Ant: Farama-Foundation/Gymnasium#204, Farama-Foundation/Gymnasium#214
Reacher: Farama-Foundation/Gymnasium#220
InvertedDoublePendulum: Farama-Foundation/Gymnasium#228, Farama-Foundation/Gymnasium#500
InvertedPendulum: Farama-Foundation/Gymnasium#500

Note: I have written a quick test that fails only in the above environments:

import gymnasium as gym
import pytest

_MUJOCO_GYM_ENVIRONMENTS = [  # Note: this could be made dynamic, I was just too lazy to do it
    "Ant-v4",
    "HalfCheetah-v4",
    "Hopper-v4",
    "HumanoidStandup-v4",
    "Humanoid-v4",
    "Reacher-v4",
    "Swimmer-v4",
    "Pusher-v4",
    "Walker2d-v4",
    "InvertedPendulum-v4",
    "InvertedDoublePendulum-v4",
]

@pytest.mark.parametrize("env_name", _MUJOCO_GYM_ENVIRONMENTS)
def test_zeros_in_observation(env_name: str):
    """Asserts that all environments contain valid (non-zero) observations.

    This test was created because some MuJoCo environments pre version 5 had
    parts of the observation space that were always zero.
    """
    if env_name == "Ant-v4":
        # contact forces are only part of the observation when enabled
        env = gym.make(env_name, use_contact_forces=True)
    else:
        env = gym.make(env_name)
    env.action_space.seed(0)
    env.reset(seed=0)
    observation, _, _, _, _ = env.step(env.action_space.sample())
    assert (observation != 0).all()

Proposal 1

Register some new environments in the suite from MaMuJoCo, like the coupled half cheetah: https://github.com/Kallinteris-Andreas/Gymnasium-Robotics-Kalli/blob/main/gymnasium_robotics/envs/multiagent_mujoco/coupled_half_cheetah.py#L27
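
A minimal sketch of what registering such an environment could look like, assuming a hypothetical CoupledHalfCheetahEnv class and module path (the id, entry_point, and episode settings here are illustrative, not a final API):

import gymnasium as gym

# Hypothetical registration; the id and entry_point are placeholders.
gym.register(
    id="CoupledHalfCheetah-v5",
    entry_point="gymnasium.envs.mujoco.coupled_half_cheetah:CoupledHalfCheetahEnv",
    max_episode_steps=1000,
)

env = gym.make("CoupledHalfCheetah-v5")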

Proposal 2

Update the Hopper and Walker2d models so they no longer require coordinate="global": google-deepmind/mujoco#833

And fix the Walker2d feet: Farama-Foundation/Gymnasium#477
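
For reference, MuJoCo can re-serialize a loaded model itself, and the saved XML is always written in local coordinates. A minimal sketch with the official Python bindings (assuming the hopper.xml path; note that, as the rest of this thread shows, such auto-generated models are not guaranteed to behave identically):

import mujoco

# Load the old model (declared with coordinate="global"); the compiler
# resolves all frames, and the re-saved XML uses local coordinates.
model = mujoco.MjModel.from_xml_path("hopper.xml")
mujoco.mj_saveLastXML("hopper_local.xml", model)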

Checklist

  • I have checked that there is no similar issue in the repo (required)
@rodrigodelazcano
Member

Can you elaborate more on Proposal 1 and how it would work? I don't know if you mean to register MaMuJoCo environments with Gymnasium IDs.

@Kallinteris-Andreas
Collaborator Author

Kallinteris-Andreas commented Jan 13, 2023

I mean adding them to the suite of available environments.

@pseudo-rnd-thoughts
Member

I'm unconvinced by the test showing this to be an issue.
It is quite possible for a couple of the observation elements to be zero for only the first N time steps.
Could you upgrade the test to show that, from the reset obs and for the first (100) time steps, the same elements keep a fixed value (it doesn't have to be zero)?

If we make this change, we will need to train agents to ensure there is no performance regression as a result of the change.

It is a shame we can't do v4.1 as the environment number
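
A possible upgraded check along those lines (a sketch only; the helper name and the 100-step horizon are illustrative):

import gymnasium as gym
import numpy as np

def constant_observation_mask(env_name: str, num_steps: int = 100) -> np.ndarray:
    """Boolean mask of observation elements that never leave their reset value."""
    env = gym.make(env_name)
    env.action_space.seed(0)
    obs, _ = env.reset(seed=0)
    reference = obs.copy()
    mask = np.ones_like(reference, dtype=bool)
    for _ in range(num_steps):
        obs, _, terminated, truncated, _ = env.step(env.action_space.sample())
        mask &= obs == reference
        if terminated or truncated:
            env.reset()
    return mask

# A healthy environment should have no permanently stuck elements:
# assert not constant_observation_mask("Ant-v4").any()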

@Kallinteris-Andreas
Collaborator Author

Kallinteris-Andreas commented Feb 20, 2023

No, the test is indeed not good enough to prove the point (and it does fail after the fix to the Humanoid).

The analysis done in the linked issues proves it: I have run tests for millions of steps, with those elements staying zero (it is in the other issues).

@pseudo-rnd-thoughts
Member

Could this include newly compiled versions of all of the models, so they have the same format?

@Kallinteris-Andreas
Collaborator Author

Kallinteris-Andreas commented May 2, 2023

Could this include newly compiled versions of all of the models, so they have the same format?

I would rather not; I am not sure I can assert identical behavior, because they add autolimits="true"
(and I see no benefit in it other than aesthetics).

@Kallinteris-Andreas
Collaborator Author

@pseudo-rnd-thoughts Here is why I do not trust mujoco/compile:

Figure_1 (Hopper training curves)
v4 is the existing Hopper-v4 model (blue)
v5 is my hand-ported XML model (orange)
v5_gen is the auto-generated XML model from mujoco/compile (green)

Notice that for the first 20k steps v4 and v5 overlap (closely enough that the difference cannot be seen in the graph) and then floating-point errors slowly accumulate,
while v5_gen behaves differently from the beginning.

@pseudo-rnd-thoughts
Member

@pseudo-rnd-thoughts Here is why I do not trust mujoco/compile:

Figure_1 (Hopper training curves)

Very interesting, thanks for doing that.
DeepMind never claimed that compile produces identical models, but this is very different, particularly in training; we should check whether this is expected.
One response could be that we have only run this once, and over several runs the differences might average out.

@saran-t

saran-t commented May 5, 2023

@pseudo-rnd-thoughts Here is why I do not trust mujoco/compile:

Figure_1 (Hopper training curves) v4 is the existing Hopper-v4 model (blue) v5 is my hand-ported XML model (orange) v5_gen is the auto-generated XML model from mujoco/compile (green)

Notice that for the first 20k steps v4 and v5 overlap (closely enough that the difference cannot be seen in the graph) and then floating-point errors slowly accumulate, while v5_gen behaves differently from the beginning.

Can you please provide us with the complete model for each of the three cases?

@Kallinteris-Andreas
Collaborator Author

Kallinteris-Andreas commented May 6, 2023

Here is a test with 10 runs of 200k steps each:
Figure_1 (Hopper training curves, 10 runs)
v4 is the existing Hopper-v4 model (blue)
v5 is my hand-ported XML model (orange)
v5_gen is the auto-generated XML model from mujoco/compile (green)

Shaded areas show mins and maxes.

Notice that for the first 20k steps v4 and v5 overlap (closely enough that the difference cannot be seen in the graph) and then floating-point errors slowly accumulate while staying in the same ballpark,
whereas v5_gen behaves differently from the beginning.

@saran-t

@saran-t

saran-t commented May 6, 2023

Please try the auto-converted XML in https://colab.research.google.com/drive/1slY_8RlzzRffDQhLt3uqazyD_OU9pd2r#scrollTo=lH94cPjK4SKZ and let me know how that works.

@Kallinteris-Andreas
Collaborator Author

Kallinteris-Andreas commented May 6, 2023

@saran-t
This is identical to:
https://github.com/Kallinteris-Andreas/gym-mjc-v5-model-validation/blob/main/hopper_new_gen.xml
(other than the numbers being printed in full precision)

You can use a service like https://www.diffchecker.com/text-compare/ to compare them.
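
Locally, the same comparison can be done with the Python standard library (the filenames here are placeholders for the two XMLs being compared):

import difflib

with open("hopper_new_gen.xml") as a, open("hopper_colab_gen.xml") as b:
    diff = difflib.unified_diff(
        a.readlines(), b.readlines(),
        fromfile="hopper_new_gen.xml", tofile="hopper_colab_gen.xml",
    )
print("".join(diff))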

@saran-t

saran-t commented May 6, 2023

(other than the numbers being printed in full precision)

That is literally the whole point? The only difference that you're seeing is just the tail end of the precision. You can see at the bottom of the Colab that the auto-converted XMLs have bit-identical results.

To be clear, what I'm saying is you should just replace your existing XML with https://colab.research.google.com/drive/1slY_8RlzzRffDQhLt3uqazyD_OU9pd2r#scrollTo=s4DDElgo4_d6 (which is the absolute minimal diff that results in absolutely no change whatsoever under MuJoCo 2.3.3).

@Kallinteris-Andreas
Collaborator Author

@saran-t OK, I will test it.

I am under the impression that the only change was printing the numbers at full precision:

>>> import numpy
>>> numpy.array(-0.19999999999999996, dtype=numpy.float32) == numpy.array(-0.2, dtype=numpy.float32)
True

I will test your manually made XML.

@saran-t

saran-t commented May 6, 2023

Your problem there is float32. At float64 those two numbers are different (https://evanw.github.io/float-toy/).
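
For example, repeating the comparison above at float64:

>>> import numpy
>>> numpy.array(-0.19999999999999996, dtype=numpy.float64) == numpy.array(-0.2, dtype=numpy.float64)
False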

@Kallinteris-Andreas
Collaborator Author

Kallinteris-Andreas commented May 7, 2023

Figure_1 (Hopper training curves, four models)
v4 is the existing Hopper-v4 model (blue): https://github.com/Kallinteris-Andreas/gym-mjc-v5-model-validation/blob/main/hopper.xml
v5 is my hand-ported XML model (orange): https://github.com/Kallinteris-Andreas/gym-mjc-v5-model-validation/blob/main/hopper_new.xml
v5_gen is the auto-generated XML model from mujoco/compile (green): https://github.com/Kallinteris-Andreas/gym-mjc-v5-model-validation/blob/main/hopper_new_gen.xml
@saran-t's manually transcribed model (red): https://github.com/Kallinteris-Andreas/gym-mjc-v5-model-validation/blob/main/hopper_saran_t_trans.xml

Notice that for the first 20k steps v4 and v5 overlap (closely enough that the difference cannot be seen in the graph) and then floating-point errors slowly accumulate while staying in the same ballpark,
whereas v5_gen and @saran-t's manually transcribed model behave differently from the beginning.

@saran-t

saran-t commented May 7, 2023

Not really sure how that can be the case, since I've verified that the model is literally identical (bit-perfect) to the v4 XML...

@Kallinteris-Andreas
Collaborator Author

I am double-checking, this weirds me out too.

@Kallinteris-Andreas
Collaborator Author

Kallinteris-Andreas commented May 8, 2023

I re-ran the benchmark (I had PyTorch-related issues):
Figure_1 (Hopper training curves, re-run)

(The other models behave differently; I am not showing them.)
As you can see, @saran-t's model is identical to hopper-v4.xml.
On top of that, I ran a test of 1000 episodes with 4000 simulation steps per episode (with random actions), and the behavior is 100% identical between the two models (https://github.com/Kallinteris-Andreas/gym-mjc-v5-model-validation/blob/main/test_hop.py); a sketch in that spirit is after the questions below.

@saran-t

  1. Can you please do the same with walker2d.xml https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/envs/mujoco/assets/walker2d.xml? Never mind, I did it myself: https://github.com/Kallinteris-Andreas/gym-mjc-v5-model-validation/blob/main/walker2d_ported.xml
  2. In your expert opinion, does the model you made ALWAYS have identical behavior to the previous model on mujoco==2.3.3?
  3. In your expert opinion, does the model you made ALWAYS have identical behavior to the previous model on 2.1.3<=mujoco<=2.3.3?
  4. In your expert opinion, does the model you made ALWAYS have identical behavior to the previous model on mujoco-py>=1.5?

Questions 2-4 are there to determine whether we can change the model for environment versions v2, v3, and v4.
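
For reference, a minimal sketch of such an equivalence check with the official mujoco Python bindings (in the spirit of the linked test_hop.py; the control range and step count are illustrative, and both XMLs are assumed to have the same actuator count):

import mujoco
import numpy as np

def identical_trajectories(xml_a: str, xml_b: str, num_steps: int = 4000, seed: int = 0) -> bool:
    """Step two models with the same random controls and compare states exactly."""
    model_a = mujoco.MjModel.from_xml_path(xml_a)
    model_b = mujoco.MjModel.from_xml_path(xml_b)
    data_a, data_b = mujoco.MjData(model_a), mujoco.MjData(model_b)
    rng = np.random.default_rng(seed)
    for _ in range(num_steps):
        ctrl = rng.uniform(-1.0, 1.0, size=model_a.nu)
        data_a.ctrl[:] = ctrl
        data_b.ctrl[:] = ctrl
        mujoco.mj_step(model_a, data_a)
        mujoco.mj_step(model_b, data_b)
        if not (np.array_equal(data_a.qpos, data_b.qpos)
                and np.array_equal(data_a.qvel, data_b.qvel)):
            return False
    return True

# e.g. identical_trajectories("hopper.xml", "hopper_saran_t_trans.xml")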

@pseudo-rnd-thoughts
Member

When this is done, I think we should write a blog post, as this will be a better way of publishing the changes, and for users to cite them, than a PR.
Additionally, it would be easier to include all of the results there.
@Kallinteris-Andreas Would you be interested in this?

@Kallinteris-Andreas
Collaborator Author

Kallinteris-Andreas commented May 15, 2023

Sure, where would the blog post be published? (farama.org?)

In the meantime, the active changelog is here: #104
