Implement Gymnasium-compliant PPO script #320

Merged
vwxyzjn merged 43 commits into master on Dec 13, 2022

dtch1997
Collaborator

@dtch1997 dtch1997 commented Nov 15, 2022

Description

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

Checklist:

  • I've read the CONTRIBUTION guide (required).
  • I have ensured pre-commit run --all-files passes (required).
  • I have updated the documentation and previewed the changes via mkdocs serve.
  • I have updated the tests accordingly (if applicable).

If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.

  • I have contacted vwxyzjn to obtain access to the openrlbenchmark W&B team (required).
  • I have tracked applicable experiments in openrlbenchmark/cleanrl with --capture-video flag toggled on (required).
  • I have added additional documentation and previewed the changes via mkdocs serve.
    • I have explained note-worthy implementation details.
    • I have explained the logged metrics.
    • I have added links to the original paper and related papers (if applicable).
    • I have added links to the PR related to the algorithm variant.
    • I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
    • I have added the learning curves (in PNG format).
    • I have added links to the tracked experiments.
    • I have updated the overview sections at the docs and the repo
  • I have updated the tests accordingly (if applicable).

@vwxyzjn
Owner

vwxyzjn commented Nov 15, 2022

CI passed. @dtch1997, would you mind running the first round of benchmarks? Don't worry about capturing videos yet because of upstream issues.

export WANDB_ENTITY=openrlbenchmark
poetry install --with mujoco
OMP_NUM_THREADS=1 xvfb-run -a python -m cleanrl_utils.benchmark \
    --env-ids HalfCheetah-v4 Walker2d-v4 Hopper-v4 InvertedPendulum-v4 Humanoid-v4 Pusher-v4 \
    --command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track --capture-video" \
    --num-seeds 3 \
    --workers 1
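
As a rough sketch of what this invocation fans out to: the snippet below assumes the benchmark utility builds one run per (seed, environment) pair by appending `--env-id` and `--seed` to the `--command` string. That behavior, the `expand_benchmark` helper, and the `start_seed` default of 1 are illustrative assumptions, not something confirmed in this thread.

```python
from itertools import product


def expand_benchmark(command, env_ids, num_seeds, start_seed=1):
    """Build one shell command per (seed, env) pair.

    Assumption: `--env-id` and `--seed` are appended to the base command.
    `start_seed` is a hypothetical parameter used only for illustration.
    """
    seeds = range(start_seed, start_seed + num_seeds)
    return [
        f"{command} --env-id {env_id} --seed {seed}"
        for seed, env_id in product(seeds, env_ids)
    ]


env_ids = [
    "HalfCheetah-v4", "Walker2d-v4", "Hopper-v4",
    "InvertedPendulum-v4", "Humanoid-v4", "Pusher-v4",
]
base = (
    "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py "
    "--cuda False --track --capture-video"
)
commands = expand_benchmark(base, env_ids, num_seeds=3)

# 6 environments x 3 seeds
print(len(commands))
for cmd in commands[:2]:
    print(cmd)
```

Under these assumptions, `--workers 1` would simply mean the generated commands run one at a time rather than in parallel.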

@dtch1997
Collaborator Author

Benchmark in progress: https://wandb.ai/openrlbenchmark/cleanrl?workspace=user-dtch1997

@vwxyzjn
Owner

vwxyzjn commented Nov 20, 2022

Great, thank you!

@vwxyzjn
Owner

vwxyzjn commented Nov 21, 2022

Executing the following command in https://github.com/vwxyzjn/ppo-atari-metrics

python rlops.py --wandb-project-name cleanrl \
    --wandb-entity openrlbenchmark \
    --filters 'ppo_continuous_action?tag=rlops-pilot' 'ppo_continuous_action?tag=pr-320'   \
    --env-ids HalfCheetah-v4 Walker2d-v4 Hopper-v4 InvertedPendulum-v4 Humanoid-v4 Pusher-v4 \
    --output-filename compare.png --scan-history

generates

image

| Environment | ppo_continuous_action ({'tag': ['rlops-pilot']}) | ppo_continuous_action ({'tag': ['pr-320']}) |
|---|---|---|
| HalfCheetah-v4 | 1795.55 ± 819.96 | 2241.90 ± 1150.61 |
| Walker2d-v4 | 2983.19 ± 757.43 | 3577.82 ± 315.46 |
| Hopper-v4 | 2279.97 ± 450.53 | 2111.14 ± 335.94 |
| InvertedPendulum-v4 | 890.99 ± 48.93 | 950.98 ± 36.39 |
| Humanoid-v4 | 671.07 ± 83.75 | 728.82 ± 62.35 |
| Pusher-v4 | -51.27 ± 9.02 | -49.51 ± 3.96 |
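
For a quick read of the figures reported above, the following sketch computes the signed relative change in mean return from the `rlops-pilot` runs to the `pr-320` runs. The means are copied from the reported results; the `relative_change` helper is illustrative only. It compares means and ignores the ± spread, so it should not be read as a significance test.

```python
# (rlops-pilot mean, pr-320 mean), copied from the reported results
results = {
    "HalfCheetah-v4": (1795.55, 2241.90),
    "Walker2d-v4": (2983.19, 3577.82),
    "Hopper-v4": (2279.97, 2111.14),
    "InvertedPendulum-v4": (890.99, 950.98),
    "Humanoid-v4": (671.07, 728.82),
    "Pusher-v4": (-51.27, -49.51),
}


def relative_change(baseline, candidate):
    """Signed change relative to |baseline|; positive means a higher return."""
    return (candidate - baseline) / abs(baseline)


for env_id, (baseline, candidate) in results.items():
    print(f"{env_id:20s} {relative_change(baseline, candidate):+.1%}")
```

Dividing by the absolute value of the baseline keeps the sign meaningful for environments with negative returns such as Pusher-v4.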

@vwxyzjn
Owner

vwxyzjn commented Nov 21, 2022

Thank you, @dtch1997. Would you be interested in helping run some dm_control experiments? Please pull the latest code and run:

export WANDB_ENTITY=openrlbenchmark
poetry install --with dm_control,mujoco
OMP_NUM_THREADS=1 xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
    --env-ids dm_control/acrobot-swingup-v0 dm_control/acrobot-swingup_sparse-v0 dm_control/ball_in_cup-catch-v0 dm_control/cartpole-balance-v0 dm_control/cartpole-balance_sparse-v0 dm_control/cartpole-swingup-v0 dm_control/cartpole-swingup_sparse-v0 dm_control/cartpole-two_poles-v0 dm_control/cartpole-three_poles-v0 dm_control/cheetah-run-v0 dm_control/dog-stand-v0 dm_control/dog-walk-v0 dm_control/dog-trot-v0 dm_control/dog-run-v0 dm_control/dog-fetch-v0 dm_control/finger-spin-v0 dm_control/finger-turn_easy-v0 dm_control/finger-turn_hard-v0 dm_control/fish-upright-v0 dm_control/fish-swim-v0 dm_control/hopper-stand-v0 dm_control/hopper-hop-v0 dm_control/humanoid-stand-v0 dm_control/humanoid-walk-v0 dm_control/humanoid-run-v0 dm_control/humanoid-run_pure_state-v0 dm_control/humanoid_CMU-stand-v0 dm_control/humanoid_CMU-run-v0 dm_control/lqr-lqr_2_1-v0 dm_control/lqr-lqr_6_2-v0 dm_control/manipulator-bring_ball-v0 dm_control/manipulator-bring_peg-v0 dm_control/manipulator-insert_ball-v0 dm_control/manipulator-insert_peg-v0 dm_control/pendulum-swingup-v0 dm_control/point_mass-easy-v0 dm_control/point_mass-hard-v0 dm_control/quadruped-walk-v0 dm_control/quadruped-run-v0 dm_control/quadruped-escape-v0 dm_control/quadruped-fetch-v0 dm_control/reacher-easy-v0 dm_control/reacher-hard-v0 dm_control/stacker-stack_2-v0 dm_control/stacker-stack_4-v0 dm_control/swimmer-swimmer6-v0 dm_control/swimmer-swimmer15-v0 dm_control/walker-stand-v0 dm_control/walker-walk-v0 dm_control/walker-run-v0 \
    --command "poetry run python cleanrl/gymnasium_support/ppo_continuous_action.py --cuda False --track" \
    --num-seeds 3 \
    --workers 9

@nidhishs

Hey @dtch1997, I tried running ppo_continuous_action.py with --num_envs=4, but done = terminated or truncated no longer works because terminated and truncated are NumPy arrays. I believe numpy.logical_or should fix it.
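
A minimal sketch of the problem and the suggested fix, using made-up flags for four parallel environments (the array values and the `combine_done` helper are hypothetical, not taken from the script):

```python
import numpy as np


def combine_done(terminated, truncated):
    """Element-wise 'episode ended' flag for a batch of environments."""
    return np.logical_or(terminated, truncated)


# Hypothetical per-environment flags for 4 vectorized environments
terminated = np.array([False, True, False, False])
truncated = np.array([False, False, True, False])

try:
    # Python's `or` asks for the truth value of a whole array,
    # which is ambiguous when the array has more than one element.
    done = terminated or truncated
except ValueError as err:
    print(f"`or` fails on arrays: {err}")

# np.logical_or instead combines the flags per environment
done = combine_done(terminated, truncated)
print(done)
```

With a single environment the arrays hold one element, so `or` happens to work, which is likely why the issue only shows up once more than one environment is used.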

@dtch1997
Collaborator Author

@nidhishs The num_envs issue should be fixed now.
@vwxyzjn, to get the code snippet to run, I had to slightly modify pyproject.toml so that the torch version matching the installed CUDA driver is installed automatically. The change is taken from python-poetry/poetry#4231 (comment).

@vwxyzjn
Owner

vwxyzjn commented Dec 12, 2022

CI passed, but I had to mark the Ubuntu install with `continue-on-error: true # MUJOCO_GL=osmesa results in free(): invalid pointer` because of google-deepmind/mujoco#644.

Collaborator

@dosssman dosssman left a comment

Looking good overall.
Is there any alternative for video logging of the agent with Gymnasium?

@vwxyzjn
Owner

vwxyzjn commented Dec 13, 2022

@dosssman not right now with wandb. Pending wandb/wandb#4510.

Owner

@vwxyzjn vwxyzjn left a comment

LGTM. Thanks so much @dtch1997!

@vwxyzjn vwxyzjn merged commit b558b2b into master Dec 13, 2022
6 participants