TD3 policy noise bugs #279

Closed
2 tasks done
tomjur opened this issue Sep 30, 2022 · 2 comments · Fixed by #281

tomjur commented Sep 30, 2022

Problem Description

There are two bugs in the following line:

clipped_noise = (torch.randn_like(torch.Tensor(actions[0])) * args.policy_noise).clamp(

(1) The same noise is used for all the batch actions.
(2) Action scale is not taken into account for the noise.

Checklist

Current Behavior

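(1) The noise shape is taken from actions[0], so a single noise vector is shared by every action in the batch.
(2) No scaling is applied to the noise, but the policy output can have a different scale; see the policy's output line:

return x * self.action_scale + self.action_bias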
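The first point can be illustrated with a small standalone sketch. It is not the script's code: the batch size, action dimension, and zero-valued stand-in for the target actor's output are assumptions chosen for illustration, and `policy_noise` / `noise_clip` stand in for `args.policy_noise` / `args.noise_clip`.

```python
import torch

torch.manual_seed(0)

batch_size, action_dim = 4, 3          # hypothetical sizes for illustration
policy_noise, noise_clip = 0.2, 0.5    # stand-ins for args.policy_noise / args.noise_clip

# Stand-in for the target actor's output on a batch of next observations.
target_actions = torch.zeros(batch_size, action_dim)

# Noise shaped like a single action, as when the shape comes from actions[0].
single_action = torch.zeros(action_dim)
shared_noise = (torch.randn_like(single_action) * policy_noise).clamp(-noise_clip, noise_clip)

# Broadcasting adds the same (action_dim,) vector to every row of the batch.
noisy_actions = target_actions + shared_noise

print(shared_noise.shape)   # torch.Size([3])
print(noisy_actions.shape)  # torch.Size([4, 3])

# Every row received identical noise, which is the behavior reported in (1).
rows_identical = bool((noisy_actions == noisy_actions[0]).all())
print(rows_identical)       # True
```

Because the stand-in actions are all zero, each row of `noisy_actions` equals the single noise vector, which makes the sharing easy to see.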

Expected Behavior

(1) Should take shape of data.actions
(2) Scale the noise according to the policy scale

Possible Solution

(1) replace torch.Tensor(actions[0]) with torch.Tensor(data.actions)
(2) Multiply the noise with target_actor.action_scale
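A minimal sketch of how the two suggestions could fit together is shown below. It is not necessarily what was merged: the helper name `smoothed_target_actions` and its parameters are hypothetical, and multiplying by the scale after clipping (so the clip bound scales too) is an assumption rather than something the issue specifies.

```python
import torch


def smoothed_target_actions(target_actions, action_scale, action_low, action_high,
                            policy_noise=0.2, noise_clip=0.5):
    """Add clipped, per-sample, scale-aware noise to a batch of target actions.

    Hypothetical helper for illustration; all tensors broadcast against
    target_actions of shape (batch_size, action_dim).
    """
    # (1) Sample noise with the full batch shape so every action gets its own draw.
    noise = torch.randn_like(target_actions) * policy_noise
    # (2) Clip in normalized units, then scale to the policy's action range.
    clipped_noise = noise.clamp(-noise_clip, noise_clip) * action_scale
    noisy = target_actions + clipped_noise
    # Keep the result inside the valid action bounds.
    return torch.maximum(torch.minimum(noisy, action_high), action_low)


# Usage with made-up bounds: two action dimensions with different scales.
torch.manual_seed(0)
scale = torch.tensor([2.0, 0.5])
actions = torch.zeros(5, 2)
out = smoothed_target_actions(actions, scale, -scale, scale)
print(out.shape)  # torch.Size([5, 2])
```

With this shape of noise, each row of the batch is perturbed independently, and a dimension with a larger `action_scale` receives proportionally larger noise.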

Steps to Reproduce

  1. Run the script https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action.py and stop at line 209 to inspect the tensor shapes.

dosssman (Collaborator) commented Oct 1, 2022

Thanks a lot for the heads up.
I have added the fix in PR #281 mostly as suggested.
I'm not sure it will have much impact on the results, though, as the noise application process is ... well, noisy.

tomjur (Author) commented Oct 1, 2022

Right, probably so (:
Thanks for the quick bug-fix!
