[Bug Report] MuJoCo.Ant contact forces being off by default is based on a wrong experiment #214
Comments
To confirm, the issue is that the reward function is dependent on the |
@pseudo-rnd-thoughts
|
Ok, could you make a PR to address 1 |
Ok, so here is what happened (as far as I understand it; I would like @rodrigodelazcano to comment here)
The problem with this is that, unlike other MuJoCo environments, the Ant is significantly different between v2/v3 and v4. (I have not looked into whether this has led to wrong conclusions in research papers.) The immediate step would be to run an ablation study with the same reward function with/without I do not know what the best solution to this problem is for MuJoCo environments. At the very least, we should know if external forces matter for |
@Kallinteris-Andreas yes, you are right. We missed taking into account the fact that adding/removing The past versions of the environment (v2/v3) are being kept for reproducibility of past research. We haven't modified anything and they can still be used with As you've mentioned v4 environments upgrade to use the latest mujoco bindings instead of However, because we observed successful learning without contact forces in openai/gym#2762 (comment), we decided to make the use of external forces (observation/reward) optional. Having said this I don't think there is a need for a v5 version other than your documentation updates mentioning that using contact forces will affect the reward function, which is highly important and I thank you for finding this out. What do you think @pseudo-rnd-thoughts? |
Thanks both of you, my proposed answer would be to add a note on the ant documentation to note this difference between v2/3 and v4 (with default parameters) and using |
@Kallinteris-Andreas Thanks for taking the time to run the experiment. Setting the default to use contact forces sounds good; I approve. Can you make this change to the v4 environment in Gymnasium? The removal of the |
I do not think we should change the defaults on
depending on the also keep in mind that in It would create unnecessary confusion |
Describe the bug
The problem
Due to openai/gym#2762 (comment), it was decided that use_contact_forces would default to False, but the 2 different problem factorizations used DIFFERENT REWARD FUNCTIONS.
As you can see here, the reward functions are indeed different:
https://github.com/rodrigodelazcano/gym/blob/9c9741498dd0b613fb2d418f17d77ab5f6e60476/gym/envs/mujoco/ant_v4.py#L264
This behavior (differing reward functions) is also not documented at all (I can make a PR for that).
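To make the difference concrete, here is a minimal sketch of how the reward could depend on the flag. It is paraphrased from my reading of the linked ant_v4.py, not copied from it: the function name ant_reward and its argument layout are hypothetical, and the default weights (healthy_reward=1.0, ctrl_cost_weight=0.5, contact_cost_weight=5e-4, contact_force_range=(-1, 1)) are assumed from the documented environment defaults.

```python
def ant_reward(
    forward_reward,
    action,
    contact_forces,
    use_contact_forces,
    healthy_reward=1.0,          # assumed default
    ctrl_cost_weight=0.5,        # assumed default
    contact_cost_weight=5e-4,    # assumed default
    contact_force_range=(-1.0, 1.0),  # assumed default
):
    """Hypothetical stand-alone version of the Ant reward computation."""
    # Control cost is charged in both configurations.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    reward = forward_reward + healthy_reward - ctrl_cost

    # The contact cost is only subtracted when the flag is on, so the two
    # configurations optimize different reward functions.
    if use_contact_forces:
        low, high = contact_force_range
        clipped = [min(max(f, low), high) for f in contact_forces]
        contact_cost = contact_cost_weight * sum(f * f for f in clipped)
        reward -= contact_cost

    return reward


# Same transition, scored under both settings (illustrative numbers only).
step = dict(forward_reward=1.0, action=[0.1] * 8, contact_forces=[2.0, -0.5, 0.0])
with_contact = ant_reward(**step, use_contact_forces=True)
without_contact = ant_reward(**step, use_contact_forces=False)
print(without_contact - with_contact)  # the contact-cost term that one run pays and the other does not
```

Because the flag changes both the observation and the reward, returns from runs with and without it are not directly comparable unless the contact term is accounted for.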
@rodrigodelazcano
Code at that commit (it is the same as the current code as far as our current problem is concerned):
https://github.com/rodrigodelazcano/gym/blob/9c9741498dd0b613fb2d418f17d77ab5f6e60476/gym/envs/mujoco/ant_v4.py
Code example
No response
System info
No response
Additional context
No response