TD3: fixed dimension of clipped_noise for target actions, added noise … #281

Merged
merged 6 commits into from
Oct 19, 2022

Conversation

dosssman
Collaborator

@dosssman dosssman commented Oct 1, 2022

Description

Closes #279.

  • td3_continuous_action.py: the noise sampled when computing the Q-network update target now matches the dimensions of the actions stored in the buffer.
  • td3_continuous_action.py: that noise is also scaled to match the scaling range of the actions.
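
The two changes above can be illustrated with a minimal sketch. It is written in NumPy rather than the PyTorch used by td3_continuous_action.py, and it is not the code from this PR: the function name `smoothed_target_actions`, its parameters, and the default values for `policy_noise` and `noise_clip` are assumptions made for illustration only.

```python
import numpy as np


def smoothed_target_actions(target_actions, action_low, action_high,
                            policy_noise=0.2, noise_clip=0.5, rng=None):
    """Hypothetical helper: add clipped, scaled noise to a batch of target actions.

    target_actions: array of shape (batch_size, action_dim)
    action_low, action_high: arrays of shape (action_dim,)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Half-width of the action range, used to scale the noise (assumed convention).
    action_scale = (action_high - action_low) / 2.0
    # Draw one independent noise value per batch element and per action dimension,
    # so the noise has the same shape as the batch of actions.
    noise = rng.standard_normal(target_actions.shape) * policy_noise
    # Clip in normalized units, then scale to the action range.
    clipped_noise = np.clip(noise, -noise_clip, noise_clip) * action_scale
    # Keep the noisy target actions inside the valid action bounds.
    return np.clip(target_actions + clipped_noise, action_low, action_high)


# Usage: a batch of 4 three-dimensional actions in [-2, 2].
low = np.full(3, -2.0)
high = np.full(3, 2.0)
batch = np.zeros((4, 3))
noisy = smoothed_target_actions(batch, low, high, rng=np.random.default_rng(0))
print(noisy.shape)
```

Sampling the noise with the shape of the whole batch, rather than the shape of a single action, gives every transition its own perturbation, and multiplying by the action scale keeps the noise magnitude proportional to the action range.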

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • [ ] I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format with width=500 and height=300).
    • I have added links to the tracked experiments.
    • I have updated the overview sections at the docs and the repo
  • I have updated the tests accordingly (if applicable).

@dosssman
Collaborator Author

dosssman commented Oct 1, 2022

@vwxyzjn the TD3 continuous-action JAX variant will probably be affected too, though.

@vwxyzjn vwxyzjn requested a review from joaogui1 October 3, 2022 16:39
@vwxyzjn
Owner

vwxyzjn commented Oct 3, 2022

@joaogui1 could you check if TD3 (JAX) will be affected?

@joaogui1
Collaborator

joaogui1 commented Oct 3, 2022

@vwxyzjn do you mean checking whether the JAX implementation is affected by this PR, or whether it will also need a fix like the one in this PR?

@vwxyzjn
Owner

vwxyzjn commented Oct 3, 2022

@joaogui1, the latter :)

@joaogui1
Collaborator

joaogui1 commented Oct 3, 2022

Got it, it will need to be fixed. Creating the PR right now.

@joaogui1 joaogui1 mentioned this pull request Oct 3, 2022
17 tasks
@vwxyzjn
Owner

vwxyzjn commented Oct 4, 2022

@dosssman thank you for the PR! Would you mind running some benchmark experiments to see if this change has a significant impact on performance? If it does not, we don't even have to update the docs, since the main purpose of re-running the benchmark is to ensure there is no regression in performance.

OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
--env-ids HalfCheetah-v2 Walker2d-v2 Hopper-v2 InvertedPendulum-v2 Humanoid-v2 Pusher-v2 \
--command "poetry run python cleanrl/td3_continuous_action.py --track --capture-video" \
--num-seeds 3 \
--workers 3

@dosssman
Collaborator Author

dosssman commented Oct 6, 2022

There is a performance regression on the Walker2d env, but the others are only marginally affected:

Report here

@vwxyzjn
Owner

vwxyzjn commented Oct 7, 2022

Hmm, interesting. Thanks for running the experiments. One experiment failed, though... Would you mind re-running it?

@vwxyzjn
Owner

vwxyzjn commented Oct 18, 2022

Given that this is a performance-impacting change, I am re-running the benchmark now.

@vwxyzjn
Owner

vwxyzjn commented Oct 19, 2022

No material change to the performance (there is a minor regression in Walker2d-v2). I am going to update the docs and merge the PR. The regression report is at https://wandb.ai/openrlbenchmark/cleanrl-cache/reports/-281-CleanRL-s-TD3-TD3-Trgt-Action-Noise-Sample-Check--VmlldzoyODE5MTg0

@vwxyzjn vwxyzjn mentioned this pull request Oct 19, 2022
@vwxyzjn vwxyzjn merged commit 331cb39 into vwxyzjn:master Oct 19, 2022
Successfully merging this pull request may close these issues.

TD3 policy noise bugs