TD3: fixed dimension of clipped_noise for target actions, added noise … #281

dosssman · 2022-10-01T06:59:23Z

Description

Closes #279.

td3_continuous_action.py: the noise sampled to compute the Q-network update target now matches the dimensions of the actions sampled from the replay buffer.
td3_continuous_action.py: this noise is also scaled by action_scale so that it matches the range of the actions.
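The two changes above concern TD3's target policy smoothing step. Below is a minimal NumPy sketch of that step. It is not the script's actual PyTorch code. The helper name `smoothed_target_actions`, the parameter names `policy_noise` and `noise_clip`, their default values, and the example action bounds are all assumptions for illustration. Only `action_scale` comes from this PR.

```python
import numpy as np


def smoothed_target_actions(next_actions, buffer_actions, action_scale, low, high,
                            policy_noise=0.2, noise_clip=0.5, rng=None):
    """Hypothetical helper: TD3 target policy smoothing (sketch, not cleanrl's code).

    next_actions:   target-actor actions for the next observations, shape (batch, action_dim)
    buffer_actions: actions sampled from the replay buffer, shape (batch, action_dim)
    action_scale:   half-width of the action range per dimension, shape (action_dim,)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample noise with the same shape as the buffer actions, so every
    # transition in the batch gets its own independent perturbation.
    noise = rng.standard_normal(buffer_actions.shape) * policy_noise
    # Clip relative to a [-1, 1] action range, then rescale by action_scale.
    clipped_noise = np.clip(noise, -noise_clip, noise_clip) * action_scale
    # Keep the perturbed target actions inside the action-space bounds.
    return np.clip(next_actions + clipped_noise, low, high)


# Usage with an illustrative 6-D action space bounded by [-2, 2].
rng = np.random.default_rng(0)
batch_size, action_dim = 256, 6
high = np.full(action_dim, 2.0)
low = -high
action_scale = (high - low) / 2.0

buffer_actions = rng.uniform(low, high, size=(batch_size, action_dim))
next_actions = np.zeros((batch_size, action_dim))
targets = smoothed_target_actions(next_actions, buffer_actions, action_scale, low, high, rng=rng)
print(targets.shape)                  # (256, 6)
print(np.abs(targets).max() <= 1.0)   # True: noise_clip * action_scale = 0.5 * 2.0

# Contrast: noise shaped like a single action, (action_dim,), broadcasts
# across the batch, so every transition would get the identical perturbation.
shared_noise = rng.standard_normal(action_dim) * 0.2
broadcast = next_actions + shared_noise
print(np.all(broadcast == broadcast[0]))  # True
```

Sampling the noise with the buffer's action shape gives every transition an independent perturbation. Multiplying the clipped noise by `action_scale` keeps its magnitude proportional to the action range rather than tied to a fixed absolute size.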

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
~~[ ] I have updated the documentation and previewed the changes via mkdocs serve.~~
I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in a performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

…scaling based on action_scale

vercel · 2022-10-01T06:59:29Z

The latest updates on your projects.

Name	Status	Updated
cleanrl	✅ Ready	Oct 19, 2022 at 6:46PM (UTC)

dosssman · 2022-10-01T07:02:40Z

@vwxyzjn The TD3 continuous JAX variant will probably be affected too, though.

vwxyzjn · 2022-10-03T16:40:03Z

@joaogui1 could you check if TD3 (JAX) will be affected?

joaogui1 · 2022-10-03T16:50:57Z

@vwxyzjn Do you mean check whether the JAX implementation is affected by this PR, or whether it will also need a fix like this one?

vwxyzjn · 2022-10-03T16:52:09Z

@joaogui1, the latter :)

joaogui1 · 2022-10-03T16:59:50Z

Got it, it will need to be fixed. Creating the PR right now.

vwxyzjn · 2022-10-04T02:02:17Z

@dosssman Thank you for the PR! Would you mind running some benchmark experiments to see whether this change has a significant impact on performance? If it does not, we don't even have to update the docs, since the main purpose of re-running the benchmark is to ensure there is no performance regression.

cleanrl/benchmark/td3.sh

Lines 3 to 7 in f0bbf49

    OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
        --env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
        --command "poetry run python cleanrl/td3_continuous_action.py --track --capture-video" \
        --num-seeds 3 \
        --workers 3
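To make the size of this benchmark request concrete, here is a small Python sketch of how the script's arguments could expand into individual training runs. It hypothetically assumes that the runner launches the command once per (seed, environment) pair, appends `--env-id` and `--seed`, and starts seeds at 1. `expand_runs` is an illustrative helper and is not part of `cleanrl_utils.benchmark`.

```python
import itertools
import math

# Values taken from the cleanrl/benchmark/td3.sh excerpt above.
ENV_IDS = ["HalfCheetah-v2", "Walker2d-v2", "Hopper-v2",
           "InvertedPendulum-v2", "Humanoid-v2", "Pusher-v2"]
COMMAND = "poetry run python cleanrl/td3_continuous_action.py --track --capture-video"
NUM_SEEDS = 3
WORKERS = 3


def expand_runs(command, env_ids, num_seeds, start_seed=1):
    """Hypothetical expansion: one run per (seed, env_id) pair.

    Assumes --env-id and --seed are appended to the base command; the real
    cleanrl_utils.benchmark module may order runs or choose seeds differently.
    """
    seeds = range(start_seed, start_seed + num_seeds)
    return [f"{command} --env-id {env_id} --seed {seed}"
            for seed, env_id in itertools.product(seeds, env_ids)]


runs = expand_runs(COMMAND, ENV_IDS, NUM_SEEDS)
waves = math.ceil(len(runs) / WORKERS)  # at most WORKERS runs at a time
print(len(runs))  # 18 = 6 environments x 3 seeds
print(runs[0])    # ... --capture-video --env-id HalfCheetah-v2 --seed 1
print(waves)      # 6
```

Under these assumptions, `--workers 3` caps concurrency at three runs, so the 18 runs proceed in at least six batches.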

dosssman · 2022-10-06T02:54:10Z

There is a performance regression on the Walker2d env, but the others are only marginally affected:

Report here

vwxyzjn · 2022-10-07T13:50:02Z

Hmm, interesting. Thanks for running the experiments. One experiment failed, though. Would you mind re-running it?

vwxyzjn · 2022-10-18T15:51:11Z

Given that this is a performance-impacting change, I am re-running the benchmark now.

vwxyzjn · 2022-10-19T14:25:28Z

No material change to the performance (there is a minor regression in Walker2d-v2). I am going to update the docs and merge the PR. The regression report is at https://wandb.ai/openrlbenchmark/cleanrl-cache/reports/-281-CleanRL-s-TD3-TD3-Trgt-Action-Noise-Sample-Check--VmlldzoyODE5MTg0

TD3: fixed dimension of clipped_noise for target actions, added noise …

5da6120

…scaling based on action_scale

vercel bot deployed to Preview October 1, 2022 06:59

dosssman mentioned this pull request Oct 1, 2022

TD3 policy noise bugs #279

Closed

vwxyzjn requested a review from joaogui1 October 3, 2022 16:39

joaogui1 mentioned this pull request Oct 3, 2022

TD3 jax fix #285

Merged

vwxyzjn added 2 commits October 18, 2022 11:22

Merge branch 'master' into td3_trgt_noise_fixes

c6332a1

minor refactor

4bb6766

vercel bot deployed to Preview October 18, 2022 15:33

Merge branch 'master' into td3_trgt_noise_fixes

47a78b9

vercel bot deployed to Preview October 18, 2022 18:33

vwxyzjn mentioned this pull request Oct 19, 2022

RLops Guide #296

Closed

vwxyzjn added 2 commits October 19, 2022 14:41

update benchmark script

b697d27

update results

2fe5ee8

vercel bot deployed to Preview October 19, 2022 18:46

vwxyzjn approved these changes Oct 19, 2022

vwxyzjn merged commit 331cb39 into vwxyzjn:master Oct 19, 2022
