added gamma to reward normalization wrappers #209
So here is the tricky part: the original implementation actually uses … This would unfortunately cause a performance change. There are two ways to go forward …
@Howuhh, what do you think we should do?
@vwxyzjn To be honest, I think this is a bug in the original code, not a feature, so rerunning the experiments would give more accurate results. However, Procgen is an image-based environment, and for now I don't have the resources to train on images.
Ok, no worries. I will take it from here. @Dipamc77, I don't have the GPU memory to run the PPG experiments. Would you mind running them with this PR? I can take care of the PPO Procgen experiments.
Running the PPO experiments now. I also tried a fun thing: adding a wandb tag like …
which produces runs like … @dosssman, I think this tagging system could help us phase out past openrlbenchmark experiments without deleting them. I will have to think about the workflow a bit more.
The bigfish performance degradation could easily be due to the random seed.
@vwxyzjn Seems okay to me. Thanks for redoing the experiments, btw.
Description
Fixes the incorrect gamma in the reward normalization wrapper when a non-default gamma is used. See #203.
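To illustrate why the wrapper's gamma matters, here is a minimal sketch of return-based reward normalization. It assumes the wrapper behaves like Gym's NormalizeReward, which takes a gamma argument that defaults to 0.99: it keeps a running discounted return G_t = gamma * G_{t-1} + r_t, tracks the running variance of G_t, and divides each reward by the running standard deviation. The names RunningMeanStd, DiscountedRewardNormalizer, and run are hypothetical illustrations, not CleanRL's or Gym's code. If the wrapper's gamma silently stays at its default while the agent trains with a different discount (say 0.999), rewards are scaled against the wrong return horizon.

```python
import math


class RunningMeanStd:
    """Running mean/variance of a scalar stream (pairwise merge, one sample at a time)."""

    def __init__(self, epsilon: float = 1e-4):
        # Start with a tiny pseudo-count so the first update is well defined.
        self.mean = 0.0
        self.var = 1.0
        self.count = epsilon

    def update(self, x: float) -> None:
        delta = x - self.mean
        total = self.count + 1.0
        # Merge the existing statistics with a single new sample (variance 0).
        m2 = self.var * self.count + delta * delta * self.count / total
        self.mean += delta / total
        self.var = m2 / total
        self.count = total


class DiscountedRewardNormalizer:
    """Hypothetical sketch: divide rewards by the std of the running discounted return."""

    def __init__(self, gamma: float, epsilon: float = 1e-8):
        self.gamma = gamma  # should match the discount factor the agent trains with
        self.epsilon = epsilon
        self.ret = 0.0  # running discounted return: G_t = gamma * G_{t-1} + r_t
        self.rms = RunningMeanStd()

    def __call__(self, reward: float, done: bool) -> float:
        self.ret = self.ret * self.gamma + reward
        self.rms.update(self.ret)
        scaled = reward / math.sqrt(self.rms.var + self.epsilon)
        if done:
            self.ret = 0.0  # a new episode starts with a fresh return
        return scaled


def run(gamma: float, steps: int = 1000) -> float:
    """Feed a constant reward of 1.0 and return the last normalized reward."""
    norm = DiscountedRewardNormalizer(gamma)
    scaled = 0.0
    for _ in range(steps):
        scaled = norm(1.0, done=False)
    return scaled


print(run(0.99), run(0.999))
```

With a constant reward of 1.0, the 0.999 normalizer sees a longer, larger return and therefore a larger return variance, so its scaled reward ends up noticeably smaller than the 0.99 one. The two settings are not interchangeable, which is why the wrapper's gamma should follow the agent's discount factor.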
Types of changes
Checklist:
pre-commit run --all-files passes (required).
If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
--capture-video flag toggled on (required).
mkdocs serve
width=500 and height=300.