You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Recently, we reproduce some experiments in offline reinforcement learning and find that the decision transformer cites the result of CQL from the original paper. However, the problem is that DT uses mujoco version 2 like(hopper-v2, walker2d-v2), while original CQL uses mujoco version 0 like(hopper-v0, walker2d-v0), and the reward scale is different in these environments. So we run DT and CQL in the same environment(hopper-v2, walker2d-v2), but CQL is better than DT in almost all the tasks (except for hopper-replay). So I wonder:
Have you considered the environment version into consideration in the results?
Hello there,
Recently, we reproduce some experiments in offline reinforcement learning and find that the decision transformer cites the result of CQL from the original paper. However, the problem is that DT uses mujoco version 2 like(hopper-v2, walker2d-v2), while original CQL uses mujoco version 0 like(hopper-v0, walker2d-v0), and the reward scale is different in these environments. So we run DT and CQL in the same environment(hopper-v2, walker2d-v2), but CQL is better than DT in almost all the tasks (except for hopper-replay). So I wonder:
Looking forward to your reply!
Best Wishes
The text was updated successfully, but these errors were encountered: