
The MuJoCo results reported in the paper might be heavily influenced by the env version #42

Open
linprophet opened this issue May 8, 2022 · 0 comments

@linprophet

Hello there,

Recently, we reproduced some experiments in offline reinforcement learning and found that Decision Transformer (DT) cites the CQL results from the original CQL paper. The problem is that DT uses the MuJoCo v2 environments (hopper-v2, walker2d-v2), while the original CQL uses the v0 environments (hopper-v0, walker2d-v0), and the reward scale differs between these versions. We therefore ran DT and CQL in the same environments (hopper-v2, walker2d-v2), and CQL outperformed DT on almost all tasks (except hopper-replay). So I wonder:

  1. Did you take the environment version into account in the reported results?
  2. Referring to how to get the score of an expert policy and some other details #16: the score is normalized by an expert policy from https://github.com/rail-berkeley/d4rl/blob/master/d4rl/infos.py (see the sketch after this list). However, the results I obtain with the official code are far from the results reported in the paper. Did I miss some key component in the DT code?
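For concreteness, here is a minimal sketch of the normalization in question, assuming a standard d4rl install (the dataset keys and the raw return of 3000.0 are illustrative, not numbers from either paper): `get_normalized_score` maps a raw return onto a scale where 0 corresponds to the random policy and 1 to the expert policy from infos.py, and multiplying by 100 gives the percentage scores reported in the papers. Printing the reference bounds for both dataset versions also makes any scale mismatch directly visible.

```python
# Minimal sketch, assuming a standard d4rl install; the dataset keys and
# the raw return below are illustrative.
import gym
import d4rl  # noqa: F401 -- importing d4rl registers the offline envs
from d4rl.infos import REF_MIN_SCORE, REF_MAX_SCORE

# Compare the reference bounds used for normalization across versions.
for name in ["hopper-medium-v0", "hopper-medium-v2",
             "walker2d-medium-v0", "walker2d-medium-v2"]:
    print(f"{name}: random={REF_MIN_SCORE[name]:.1f}, "
          f"expert={REF_MAX_SCORE[name]:.1f}")

# Normalized score = 100 * (return - random) / (expert - random).
env = gym.make("hopper-medium-v2")
raw_return = 3000.0  # hypothetical evaluation return
print(f"normalized: {100 * env.get_normalized_score(raw_return):.1f}")
```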

Looking forward to your reply!

Best Wishes
