
Add timeout handling for on-policy algorithms #658

Merged (12 commits, Nov 16, 2021)

Conversation

@araffin (Member) commented Nov 10, 2021

Description

Motivation and Context

  • I have raised an issue to propose this change (required for new features and bug fixes)

closes #633

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

@araffin araffin marked this pull request as ready for review November 12, 2021 16:03
@araffin (Member, Author) commented Nov 12, 2021

@zhihanyang2022 could you have a look at that one?

@zhihanyang2022 commented

@araffin Sorry, which part do you want me to take a look at? It seems like the changes are passing the tests correctly?

@araffin (Member, Author) commented Nov 14, 2021

> @araffin Sorry, which part do you want me to take a look at? It seems like the changes are passing the tests correctly?

could you review the code? (both logic and style/naming)

@Miffyli (Collaborator) left a comment

LGTM with a few comments :). I would like a second opinion if @zhihanyang2022 has time to look over this (~20 lines of relevant code changes).

tests/test_gae.py (review comment, outdated, resolved)
tests/test_gae.py (review comment, resolved)
@zhihanyang2022 left a comment

I'm still trying to understand how testing works here, but I've checked the main changes made to on_policy_algorithm.py and test_gae.py and I think they are correct.

Adding gamma * terminal_value to the final reward is logically identical to what's done in OpenAI SpinUp:

https://github.com/openai/spinningup/blob/038665d62d569055401d91856abb287263096178/spinup/algos/pytorch/ppo/ppo.py#L59
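
For readers skimming the thread, here is a minimal sketch of the bootstrapping idea being reviewed. It is an illustration, not the actual SB3 or SpinUp code: it assumes the gym `TimeLimit.truncated` info flag and an SB3-style `terminal_observation` info entry, and the function and value-function names are hypothetical.

```python
def bootstrap_timeout_reward(reward, done, info, value_fn, gamma=0.99):
    """Illustrative sketch: if the episode ended only because of a time limit
    (a truncation, not a true terminal state), add the discounted value of the
    observation the episode was cut at, so the return is not artificially
    truncated at the timeout."""
    if done and info.get("TimeLimit.truncated", False):
        terminal_obs = info["terminal_observation"]
        # gamma * V(s_T) stands in for the future rewards lost to the timeout
        reward += gamma * value_fn(terminal_obs)
    return reward
```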

@araffin (Member, Author) commented Nov 16, 2021

> I'm still trying to understand how testing works here,

To give a bit more context: I created an environment where I know in advance the true value of each state (because the reward does not depend on the policy), but that value differs depending on whether you look at it as an infinite- or a finite-horizon problem.
The env is composed of 4 states (0, 1, 2, 3) that are cycled through over time, depending on the max number of steps.
For a max episode length of 8, the agent will visit states 0, 1, 2, 3, 0, 1, 2, 3 and receive a reward of one at each step.
Because the remaining time is not included in the observation, we break the Markov assumption, and the value of the first state V(s=0) is not really well defined: it is something between 8 (the discounted sum over 8 steps) and 3... (the discounted sum over 4 steps).

On the other hand, if we treat the problem as infinite horizon, the true value is a geometric series: 1 + gamma + gamma^2 + ... = 1 / (1 - gamma) (we get a reward of one at each step, and the value is the discounted sum of those rewards over an infinite horizon).
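
To make the setup concrete, below is a minimal sketch of such an environment. It is an assumption on my part (not the actual env from test_gae.py), written against the old gym API that SB3 used at the time; class and attribute names are illustrative.

```python
import gym
from gym import spaces


class CyclicRewardEnv(gym.Env):
    """Sketch of the env described above: states 0, 1, 2, 3 visited in a
    cycle, a reward of 1 at every step, and an episode cut after
    ``max_steps`` steps (a timeout, not a true terminal state)."""

    def __init__(self, max_steps: int = 8):
        super().__init__()
        self.observation_space = spaces.Discrete(4)
        self.action_space = spaces.Discrete(2)  # actions have no effect here
        self.max_steps = max_steps
        self.n_steps = 0

    def reset(self):
        self.n_steps = 0
        return 0

    def step(self, action):
        self.n_steps += 1
        obs = self.n_steps % 4
        # The episode ends only because of the step limit (a timeout),
        # so timeout-aware GAE should bootstrap with the value of `obs`.
        done = self.n_steps >= self.max_steps
        info = {"TimeLimit.truncated": True} if done else {}
        return obs, 1.0, done, info
```

For example, with gamma = 0.99 the infinite-horizon value of every state is 1 / (1 - 0.99) = 100.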

Development

Successfully merging this pull request may close these issues.

[Feature Request] Proper TimeLimit/Infinite Horizon Handling for On-Policy algorithm