
PPO reward normalization works only for default gamma #203

Closed
Tracked by #206
Howuhh opened this issue Jun 19, 2022 · 3 comments

Comments

Howuhh (Contributor) commented Jun 19, 2022

Problem Description

The current continuous-action PPO implementation wraps the environment with gym.wrappers.NormalizeReward using the wrapper's default gamma; for any gamma other than the default 0.99, the reward normalization is therefore incorrect.

env = gym.wrappers.NormalizeReward(env)
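
For context, here is a simplified, illustrative sketch of why the wrapper's gamma matters (this is not the exact Gym source, and names like RewardScaler are hypothetical): the wrapper keeps a running discounted-return estimate computed with its own gamma and scales each reward by the standard deviation of that estimate, so a gamma that differs from the agent's normalizes against the wrong return statistics.

import numpy as np

class RewardScaler:
    """Illustrative stand-in for the core logic of gym.wrappers.NormalizeReward."""

    def __init__(self, gamma: float = 0.99, epsilon: float = 1e-8):
        self.gamma = gamma      # should match the agent's discount factor
        self.epsilon = epsilon
        self.ret = 0.0          # running discounted-return estimate
        self.sq_mean = 1.0      # running second moment of the returns (simplified
        self.count = 1e-4       # stand-in for Gym's RunningMeanStd variance)

    def scale(self, reward: float, done: bool) -> float:
        # Accumulate the discounted return with the wrapper's gamma.
        self.ret = self.ret * self.gamma + reward
        # Update the (simplified) running second moment of the returns.
        self.count += 1.0
        self.sq_mean += (self.ret ** 2 - self.sq_mean) / self.count
        # Scale the reward by the standard deviation of the return estimate.
        scaled = reward / np.sqrt(self.sq_mean + self.epsilon)
        if done:
            self.ret = 0.0
        return scaled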

Possible Solution

The fix is simple: pass gamma=args.gamma as an argument to the normalization wrapper, as sketched below.
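
A minimal sketch of the proposed change (assuming the script already defines args.gamma, as the CleanRL PPO scripts do):

env = gym.wrappers.NormalizeReward(env, gamma=args.gamma)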

Howuhh (Contributor, Author) commented Jun 19, 2022

If this is really a problem, I will make a fix PR.

vwxyzjn (Owner) commented Jun 19, 2022

Oh this makes sense! Thanks for raising the issue. I think ppo_procgen.py and ppg_procgen.py also use the reward normalization - feel free to submit a PR to fix them.

Howuhh (Contributor, Author) commented Jul 7, 2022

Fixed.

Howuhh closed this as completed Jul 7, 2022