TD3 policy noise bugs #279

tomjur · 2022-09-30T08:37:19Z

Problem Description

There are two bugs in cleanrl/cleanrl/td3_continuous_action.py, line 209 at commit e466f6e:

    clipped_noise = (torch.randn_like(torch.Tensor(actions[0])) * args.policy_noise).clamp(

(1) The same noise sample is applied to every action in the batch.
(2) The action scale is not taken into account when generating the noise.

Checklist

I have installed dependencies via poetry install (see CleanRL's installation guideline).
I have checked that there is no similar issue in the repo (required)

Current Behavior

(1) The noise takes its shape from actions[0], so one noise sample is shared by all actions in the batch.
(2) No scaling is applied to the noise, but the policy output can have a different scale; see cleanrl/cleanrl/td3_continuous_action.py, line 114 at commit e466f6e:

    return x * self.action_scale + self.action_bias
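To illustrate point (1), here is a minimal sketch of the broadcasting effect. It assumes actions[0] is a single action vector; `env_actions` and `batch_actions` are hypothetical stand-ins, not variables from the script.

```python
import torch

torch.manual_seed(0)

env_actions = torch.zeros(1, 2)    # hypothetical: actions for 1 env, 2 action dims
batch_actions = torch.zeros(4, 2)  # hypothetical: replay batch of 4 actions

# Noise shaped like a single action vector, as with actions[0].
shared_noise = torch.randn_like(env_actions[0])
print(shared_noise.shape)  # torch.Size([2])

# Broadcasting adds the same noise vector to every row of the batch.
noisy = batch_actions + shared_noise
print(noisy)
```

Every row of `noisy` is identical here, which is the behavior described in (1).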

Expected Behavior

(1) The noise should take the shape of data.actions.
(2) The noise should be scaled according to the policy's action scale.

Possible Solution

(1) Replace torch.Tensor(actions[0]) with torch.Tensor(data.actions).
(2) Multiply the noise by target_actor.action_scale.
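The two suggestions can be sketched together as follows. This is only an illustration under assumptions, not necessarily the code merged upstream: the helper `smoothed_noise`, the clip bound `noise_clip`, and the example values are hypothetical, and the noise is clipped before scaling.

```python
import torch

def smoothed_noise(actions, policy_noise, noise_clip, action_scale):
    """Clipped Gaussian noise with one independent sample per batch entry."""
    # randn_like on the full batch gives each row its own noise sample.
    noise = torch.randn_like(actions) * policy_noise
    # Clip in normalized units, then scale to the policy's action range.
    return noise.clamp(-noise_clip, noise_clip) * action_scale

torch.manual_seed(0)
batch_actions = torch.zeros(4, 2)        # hypothetical batch: 4 actions, 2 dims
action_scale = torch.tensor([2.0, 0.5])  # hypothetical per-dimension scale

noise = smoothed_noise(batch_actions, policy_noise=0.2,
                       noise_clip=0.5, action_scale=action_scale)
print(noise.shape)  # torch.Size([4, 2])
```

Because the noise has the batch's shape, each row is perturbed independently, and each dimension's noise is bounded by `noise_clip` times that dimension's scale.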

Steps to Reproduce

Run the script https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action.py and break on line 209 to inspect the shapes.


dosssman · 2022-10-01T07:01:40Z

Thanks a lot for the heads-up.
I have added the fix in PR #281, mostly as suggested.
Not sure it will have much impact on the results, as the noise application process is ... well, noisy.

tomjur · 2022-10-01T07:26:11Z

Right, probably so (:
Thanks for the quick bug-fix!

vwxyzjn assigned dosssman Sep 30, 2022

tomjur closed this as completed Oct 1, 2022

vwxyzjn reopened this Oct 4, 2022

vwxyzjn mentioned this issue Oct 4, 2022

TD3: fixed dimension of clipped_noise for target actions, added noise … #281

Merged

19 tasks

vwxyzjn closed this as completed in #281 Oct 19, 2022

araffin mentioned this issue Nov 21, 2022

SAC jax #300

Open

20 tasks

araffin mentioned this issue Feb 8, 2023

[Question] Wrong scaled_action for continuous actions in _sample_action()? DLR-RM/stable-baselines3#1269

Open

4 tasks
