cartpole system: reward is always 1 #1682

Closed
yerzhik opened this issue Sep 18, 2019 · 2 comments

Comments

@yerzhik

yerzhik commented Sep 18, 2019

I have a question: shouldn't the reward depend on how good the action was?
Correct me if I'm wrong, but what I see in the code is that every action gets a reward of 1.0, whatever the action is.

@JNC96

JNC96 commented Sep 18, 2019

I believe it gives a reward of 1.0 for every step on which the episode has not ended, so the accumulated reward reflects how long the agent keeps the current episode going without failing.

As per the wiki:

Episode Termination

  • Pole Angle is more than ±12°
  • Cart Position is more than ±2.4 (center of the cart reaches the edge of the display)
  • Episode length is greater than 200

Solved Requirements

  • Considered solved when the average reward is greater than or equal to 195.0 over 100 consecutive trials.
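The solved requirement above can be sketched as a small check over a list of per-episode returns. This is only an illustration: `is_solved`, `window`, and `threshold` are hypothetical names chosen here, not part of gym, and the reading "any 100 consecutive episodes" is an assumption about how the wiki wording is meant.

```python
from collections import deque


def is_solved(episode_returns, window=100, threshold=195.0):
    """Return True if some `window` consecutive returns average >= threshold.

    Hypothetical helper for illustration; not a gym function.
    """
    recent = deque(maxlen=window)  # sliding window of the latest returns
    running_sum = 0.0
    for episode_return in episode_returns:
        if len(recent) == window:
            # The oldest value is about to be evicted by append().
            running_sum -= recent[0]
        recent.append(episode_return)
        running_sum += episode_return
        if len(recent) == window and running_sum / window >= threshold:
            return True
    return False
```

For example, a run of 100 episodes that each last the full 200 steps passes the check, while 99 such episodes do not, because the window is not yet full.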

So the total reward for an episode is basically the number of steps the episode runs for; the agent is rewarded for keeping the pole up longer.
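To make the "return equals number of steps" point concrete, here is a minimal pure-Python sketch of a cart-pole style loop. It is not the gym source: the physical constants, the Euler update, the uniform start state in [-0.05, 0.05], and the names `step`, `run_episode`, and `always_right` are assumptions made for illustration. Only the termination limits (±12°, ±2.4, 200 steps) and the constant reward of 1.0 come from the thread above.

```python
import math
import random

# Assumed constants for a classic cart-pole model (not read from gym).
GRAVITY, MASS_CART, MASS_POLE = 9.8, 1.0, 0.1
HALF_POLE_LENGTH, FORCE_MAG, TAU = 0.5, 10.0, 0.02

# Termination limits quoted in the thread.
THETA_LIMIT = math.radians(12)
X_LIMIT = 2.4
MAX_STEPS = 200


def step(state, action):
    """Advance one time step; the reward is 1.0 regardless of the action."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    total_mass = MASS_CART + MASS_POLE
    pole_mass_length = MASS_POLE * HALF_POLE_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

    # Explicit Euler integration.
    x, x_dot = x + TAU * x_dot, x_dot + TAU * x_acc
    theta, theta_dot = theta + TAU * theta_dot, theta_dot + TAU * theta_acc

    done = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    return (x, x_dot, theta, theta_dot), 1.0, done


def run_episode(policy, seed=0):
    """Run one episode and return (total_reward, steps)."""
    rng = random.Random(seed)
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    total_reward, steps, done = 0.0, 0, False
    while not done and steps < MAX_STEPS:
        state, reward, done = step(state, policy(state))
        total_reward += reward
        steps += 1
    return total_reward, steps


def always_right(state):
    """A deliberately poor policy: push right on every step."""
    return 1


if __name__ == "__main__":
    total, steps = run_episode(always_right)
    print(f"steps survived: {steps}, total reward: {total}")
```

Because every step pays exactly 1.0, the only way a policy can raise its return is to delay termination, which is how action quality shows up in the reward signal even though the per-step reward is constant.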

@yerzhik

yerzhik commented Sep 24, 2019

Thank you! Now I get it. I should have read the wiki in the first place.
