The current implementation of continuous-action PPO uses gym.wrappers.NormalizeReward with the default gamma value; for any gamma other than the default 0.99, this normalization will be incorrect.
Oh this makes sense! Thanks for raising the issue. I think ppo_procgen.py and ppg_procgen.py also use the reward normalization - feel free to submit a PR to fix them.
Problem Description
The current implementation of continuous-action PPO uses gym.wrappers.NormalizeReward with the default gamma value. For any gamma other than the default 0.99, this normalization will be incorrect.

cleanrl/cleanrl/ppo_continuous_action.py, line 92 (commit 94a685d)
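For context, here is a minimal sketch of the kind of environment factory CleanRL uses around that line (the surrounding wrappers and the make_env/thunk names are illustrative, not the exact code at commit 94a685d). gym.wrappers.NormalizeReward keeps a running estimate of the standard deviation of the discounted return using its own gamma argument, which defaults to 0.99, so a training run with a different discount factor ends up with reward scaling that does not match the returns PPO actually optimizes.

```python
import gym


def make_env(env_id):
    # Illustrative CleanRL-style env factory; not the exact code at commit 94a685d.
    def thunk():
        env = gym.make(env_id)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        # NormalizeReward divides rewards by a running estimate of the standard
        # deviation of the discounted return, accumulated internally with the
        # wrapper's own gamma argument (default 0.99), not the training gamma.
        env = gym.wrappers.NormalizeReward(env)  # gamma is left at its default here
        return env

    return thunk
```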
Possible Solution
Very easy: just pass gamma=args.gamma as an argument to the normalization wrapper.
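A minimal sketch of that one-argument change, assuming the usual CleanRL setup where the parsed hyperparameters are available as args and the discount factor as args.gamma (the argument parser below is a hypothetical stand-in for the script's full one):

```python
import argparse

import gym

# Hypothetical minimal argument parser standing in for CleanRL's full one;
# only the discount factor matters for this fix.
parser = argparse.ArgumentParser()
parser.add_argument("--gamma", type=float, default=0.99, help="discount factor")
args = parser.parse_args()


def make_env(env_id):
    def thunk():
        env = gym.make(env_id)
        env = gym.wrappers.RecordEpisodeStatistics(env)
        # Pass the training discount factor so the wrapper's running return
        # estimate uses the same gamma as PPO's advantage/return computation.
        env = gym.wrappers.NormalizeReward(env, gamma=args.gamma)
        return env

    return thunk
```

The same one-line change would apply anywhere else gym.wrappers.NormalizeReward is constructed without an explicit gamma, such as the ppo_procgen.py and ppg_procgen.py scripts mentioned above.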