Not really an issue, more a question #10
Comments
Just to be more precise, I would like to train your agent on 1000 random environments and test it on 1000 other environments to get the generalisation percentage on these test environments ... I'm not sure how I can do that with the code provided ... thanks
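(For concreteness, here is a rough sketch of one way such a disjoint train/test split of procedurally generated MiniGrid levels might be set up. This is not the repo's code; the environment id, seed counts, and the use of the `gymnasium`/`minigrid` packages are assumptions.)

```python
import random

import gymnasium as gym
import minigrid  # noqa: F401  # importing registers the MiniGrid environment ids

ENV_ID = "MiniGrid-MultiRoom-N6-v0"  # placeholder task, not necessarily the paper's

# Draw 2000 distinct level seeds and split them into disjoint train/test sets.
rng = random.Random(0)
seeds = rng.sample(range(1_000_000), 2000)
train_seeds, test_seeds = seeds[:1000], seeds[1000:]


def make_env(seed: int) -> gym.Env:
    """Each seed deterministically fixes one random layout of the task."""
    env = gym.make(ENV_ID)
    env.reset(seed=seed)
    return env
```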
Hi, thanks for your interest!
Sure, so if I understood it well, you run iterations where you train on 3 randomly chosen environments and then test on another one, also randomly chosen, right? And the results are computed every 30 tests as an average of reward over these 30 test environments ...
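(As an aside, a rough sketch of how such an average over held-out test environments might be computed. This is my reading of the procedure described above, not the repo's code; `evaluate`, `policy`, and the environment id are hypothetical.)

```python
import gymnasium as gym
import minigrid  # noqa: F401  # registers MiniGrid environment ids


def evaluate(policy, env_id: str, test_seeds: list[int]) -> float:
    """Average undiscounted episode return of `policy`, one episode per held-out seed."""
    total = 0.0
    for seed in test_seeds:
        env = gym.make(env_id)
        obs, _ = env.reset(seed=seed)
        done, ep_reward = False, 0.0
        while not done:
            action = policy(obs)  # user-supplied policy function
            obs, reward, terminated, truncated, _ = env.step(action)
            ep_reward += reward
            done = terminated or truncated
        total += ep_reward
    return total / len(test_seeds)
```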
For MiniGrid we're using the usual PPO setup (see here for hyperparameters:
Not sure if that helps, please let me know if not - I feel like we might be talking past each other :).
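(For reference, a minimal sketch of a "usual PPO setup" on MiniGrid, assuming stable-baselines3 is used. The actual hyperparameters are behind the link above and are not reproduced here; the wrapper choice and training budget are illustrative assumptions.)

```python
import gymnasium as gym
from minigrid.wrappers import FlatObsWrapper
from stable_baselines3 import PPO


def make_env() -> gym.Env:
    # FlatObsWrapper turns MiniGrid's dict observation into a flat Box,
    # so the default MlpPolicy can consume it directly.
    return FlatObsWrapper(gym.make("MiniGrid-MultiRoom-N6-v0"))


model = PPO("MlpPolicy", make_env(), verbose=1)      # default hyperparameters, not the paper's
model.learn(total_timesteps=100_000)                 # placeholder training budget
```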
Hi,
I want to reuse your experiment on MiniGrid as a benchmark for my paper on RL generalisation ... it fits nicely, but I am not clear on how to replicate the experiment that generates the orange line in your paper. Can you provide some insight?
Are you running the training on 2,000,000 environments to generate the chart?
Thanks a lot in advance.