The current implementation of continuous-action PPO uses gym.wrappers.NormalizeReward with the default gamma value; for any gamma other than the default 0.99, this normalization will be incorrect.
Oh this makes sense! Thanks for raising the issue. I think ppo_procgen.py and ppg_procgen.py also use the reward normalization - feel free to submit a PR to fix them.
Problem Description
The current implementation of continuous-action PPO uses gym.wrappers.NormalizeReward with the default gamma value. For any gamma other than the default 0.99, this normalization will be incorrect.

cleanrl/cleanrl/ppo_continuous_action.py, line 92 (commit 94a685d)
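For context, here is a minimal sketch of the kind of environment factory CleanRL uses around that line (the surrounding wrappers and the make_env/thunk names are illustrative, not the exact code at commit 94a685d). gym.wrappers.NormalizeReward keeps a running estimate of the standard deviation of the discounted return using its own gamma argument, which defaults to 0.99, so a training run with a different discount factor ends up with reward scaling that does not match the returns PPO actually optimizes.

```python
import gym


def make_env(env_id):
    # Illustrative CleanRL-style env factory; not the exact code at commit 94a685d.
    def thunk():
        env = gym.make(env_id)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        # NormalizeReward divides rewards by a running estimate of the standard
        # deviation of the discounted return, accumulated internally with the
        # wrapper's own gamma argument (default 0.99), not the training gamma.
        env = gym.wrappers.NormalizeReward(env)  # gamma is left at its default here
        return env

    return thunk
```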
Possible Solution
Very easy: just pass gamma=args.gamma as an argument to the normalization wrapper.
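A minimal sketch of that one-argument change, assuming the usual CleanRL setup where the parsed hyperparameters are available as args and the discount factor as args.gamma (the argument parser below is a hypothetical stand-in for the script's full one):

```python
import argparse

import gym

# Hypothetical minimal argument parser standing in for CleanRL's full one;
# only the discount factor matters for this fix.
parser = argparse.ArgumentParser()
parser.add_argument("--gamma", type=float, default=0.99, help="discount factor")
args = parser.parse_args()


def make_env(env_id):
    def thunk():
        env = gym.make(env_id)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        # Pass the training discount factor so the wrapper's running return
        # estimate uses the same gamma as PPO's advantage/return computation.
        env = gym.wrappers.NormalizeReward(env, gamma=args.gamma)
        return env

    return thunk
```

The same one-line change would apply anywhere else gym.wrappers.NormalizeReward is constructed without an explicit gamma, such as the ppo_procgen.py and ppg_procgen.py scripts mentioned above.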