List of potential improvements #38
How about stochastic transitions and maybe stochastic rewards? And maybe a better test suite?
I think stochastic transitions and rewards are already possible using the random number generation methods. For a better test suite, I'm open to it, but would need more concrete suggestions.
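As an illustration of that point, a minimal sketch of stochastic dynamics layered on top of an existing env via a wrapper; the wrapper name and the noise scheme here are made up for the example, not part of MiniGrid:

```python
import gym
import numpy as np

class StochasticWrapper(gym.Wrapper):
    """Hypothetical sketch: with probability `slip_prob` the chosen action is
    replaced by a random one (stochastic transitions), and zero-mean noise is
    added to the reward (stochastic rewards)."""

    def __init__(self, env, slip_prob=0.1, reward_noise=0.01, seed=0):
        super().__init__(env)
        self.slip_prob = slip_prob
        self.reward_noise = reward_noise
        self.rng = np.random.RandomState(seed)

    def step(self, action):
        if self.rng.rand() < self.slip_prob:
            # Stochastic transition: act randomly instead of as requested
            action = self.rng.randint(self.env.action_space.n)
        obs, reward, done, info = self.env.step(action)
        # Stochastic reward: small zero-mean perturbation
        reward += self.rng.normal(0.0, self.reward_noise)
        return obs, reward, done, info

# usage: env = StochasticWrapper(gym.make('MiniGrid-Empty-8x8-v0'))
```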
I'll think more about it. I've never written one, but I'll be happy to take care of it. A colleague of mine was asking about the possibility of creating envs from a text file. It had been done for a long time before gym, and maybe it can be useful for specific research. I would also like to have a benchmark, or a leaderboard if you want, over a subset of all the envs, like the top 10 that are more difficult.
It would make sense to open issues for all 3 of these things to discuss what's involved. Like, for tests, what tests make sense to have. For creating envs from text files, the format needs to be discussed. For the leaderboard, I don't know exactly how that works. One issue is that the maximum reward you can get is 1.0. Maybe if it was averaged over 100 runs?
Maybe let's gather a list of potential improvements/changes and desiderata for the 2.0, and with this we can split the issues and start working? At least that's what I understood this thread to be about; if not, sorry, I'll open an issue for each of those. Besides, there are two more things I was wondering about: we could do some rendering of some stats like current step, reward, action, and maybe value of the state. Especially the latter would be super useful in debugging the methods.
Some of these improvements we can integrate in the current MiniGrid because they are not at risk of breaking existing code. We could already improve the tests, for instance. Creating envs from a text file too, possibly. The things that are the most risky are changing the APIs that people call when interacting with MiniGrid and changing the observation format. We could definitely render current state, reward and action. Value of the state I'm not sure about, because that's dependent on the RL implementation you're using.
Hello, just throwing in my 2c since I've taken interest in this project recently:
While OpenAI Gym is technically designed with single-agent environments in mind, there is nothing stopping people from using the same library / interface to handle multi-agent environments, e.g. something along the lines of the sketch below.
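A hedged sketch of what that could look like: the standard Gym interface, but with per-agent dicts. Everything below (the class, the movement rules) is hypothetical illustration, not MiniGrid code:

```python
import gym
import numpy as np
from gym import spaces

class TwoAgentGridEnv(gym.Env):
    """Hypothetical sketch: observations, actions and rewards are dicts keyed
    by agent id. Nothing in the Gym interface prevents this; it just isn't
    what single-agent code expects."""

    def __init__(self, size=5):
        self.size = size
        self.action_space = spaces.Discrete(4)  # per agent
        self.observation_space = spaces.Box(0, size - 1, (2,), dtype=np.int64)
        self.pos = {}

    def reset(self):
        self.pos = {
            "agent_0": np.array([0, 0]),
            "agent_1": np.array([self.size - 1, self.size - 1]),
        }
        return {k: v.copy() for k, v in self.pos.items()}

    def step(self, actions):
        # `actions` is a dict: {"agent_0": a0, "agent_1": a1}
        moves = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}
        for k, a in actions.items():
            self.pos[k] = np.clip(self.pos[k] + moves[a], 0, self.size - 1)
        obs = {k: v.copy() for k, v in self.pos.items()}
        rewards = {k: 0.0 for k in self.pos}
        return obs, rewards, False, {}
```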
+1
I don't think this is necessary. Yes, one should obviously use one-hot encodings (categorical indices are not features!), but these can happen outside of the environment; I do it at the input of whatever model I want to use them with.
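For illustration, a minimal sketch of doing the one-hot conversion at the model input rather than in the environment; PyTorch and the class count of 11 object types are assumptions here, not MiniGrid's actual API:

```python
import torch
import torch.nn.functional as F

# A 7x7 grid of object-type indices, standing in for a MiniGrid-style
# integer observation (11 classes is an assumed count for the example).
obs = torch.randint(0, 11, (7, 7))

# One-hot at the model input: shape becomes (7, 7, 11), ready for a conv/MLP.
onehot = F.one_hot(obs, num_classes=11).float()
```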
Absolutely agree that they should be the same object type. What I'm not sure about is why the state of the object isn't completely contained in the third channel, e.g. for doors: 0 = open, 1 = closed, 2 = locked. This could be used to contain the state of other types of objects too. That leaves some open questions, e.g. would each object have its own "state" encoding? For now only doors have this type of state, but I don't see a reason why this same channel shouldn't be used more broadly.
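A hypothetical sketch of what a shared state channel could look like; the index tables below are made up for illustration and are not MiniGrid's actual tables:

```python
# Illustrative index tables (not the library's actual values)
OBJECT_TO_IDX = {"empty": 0, "wall": 1, "door": 2, "key": 3, "ball": 4}
COLOR_TO_IDX = {"red": 0, "green": 1, "blue": 2, "yellow": 3}

def encode_door(color, is_open, is_locked):
    """Encode a door as (type, color, state).

    The third channel carries the object state: 0 = open, 1 = closed,
    2 = locked. Other object types could reuse this channel for their
    own states."""
    state = 0 if is_open else (2 if is_locked else 1)
    return (OBJECT_TO_IDX["door"], COLOR_TO_IDX[color], state)
```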
For the leaderboard I'm thinking more of a page with final scores for the top 10 hardest environments and the learning curves of the top 3 algorithms: PPO, A2C, DQN. It should serve as a reference if someone wants to benchmark their algorithm on this environment against the basic version. We could pick the OpenAI/baselines implementation and just run MiniGrid, or implement DQN in the linked repo in PyTorch. About tests: if we are going to break the env in the upcoming weeks, I would rather work on them in the 2.0. About text files, I'm going to open an issue :)
Sounds fair, how about a debug mode with an optional string or variable to render?
@abaisero That's a fair point.
We do have a `Grid` class already.
@d3sm0 If you have the time to do this kind of study it would be awesome.
Would be happy to have sample code using OpenAI baselines in this repo, to serve as a reference, though we would then ideally want to maintain this code afterwards.
After thinking about this some more, I think that we may actually want to make the changes in this repo. The reason being, no code is ever perfect. We will likely need to make yet more changes in the future. Creating a new repo would just be kicking the can down the road, and people might be confused if there are multiple repos named MiniGrid. We will try to be careful, not break too many things at once, and avoid unnecessary changes to outward-facing APIs.
I do agree that the one-hot encoding should be left to the agent, because it is an architectural choice, but as we have seen with the
Indeed I do, since I'm using this for my own research and for the repr challenge. But the algorithms are my own, which is not ideal. Baselines has changed a lot and it will keep changing. Eventually we will just need to update the results from time to time, without maintaining the codebase, with a run script that collects the results and renders an HTML file from them.
Fair, maybe we can just branch this, so whoever is using it stays stable on master, and we iterate on the branch, starting from the proposed core changes.
I like to make things as easy and as optimal out of the box for newcomers. I would like to benchmark the training performance of a one-hot type embedding vs the class indices we have now, and decide based on that. If the one-hot scheme trains noticeably faster, we might want to make it the default.
If you write such a script I will merge it. What is the repr challenge?
Fair. I think it can be related to #39 and #26. I'm not confident that the neural networks learn the same pattern whether it's one-hot or real-valued. Eventually they should.
This is the reproducibility challenge (repr): https://www.cs.mcgill.ca/~jpineau/ICLR2019-ReproducibilityChallenge.html Alright, I'll make the script. I'll open an issue for this, since it's not clear what the script should do.
@d3sm0 if you'd like to benchmark the one-hot encoding, I quickly hacked it together in a branch.
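A rough sketch of what such a one-hot observation wrapper could look like; this is hypothetical code, not the actual branch, and the plane sizes are assumptions:

```python
import gym
import numpy as np

class OneHotObsWrapper(gym.ObservationWrapper):
    """Hypothetical sketch: expand the (H, W, 3) integer image observation
    into concatenated one-hot planes for type, color and state."""

    def __init__(self, env, n_types=11, n_colors=6, n_states=3):
        super().__init__(env)
        # Assumed class counts per channel, not MiniGrid's actual tables
        self.sizes = (n_types, n_colors, n_states)

    def observation(self, obs):
        img = obs["image"]  # MiniGrid returns a dict with an 'image' key
        # np.eye(n)[idx] turns an integer array into one-hot planes
        planes = [np.eye(n, dtype=np.uint8)[img[..., c]]
                  for c, n in enumerate(self.sizes)]
        out = dict(obs)
        out["image"] = np.concatenate(planes, axis=-1)  # (H, W, sum(sizes))
        return out
```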
I've done some benchmarks of my own on the BabyAI GoTo and PutNextLocal environments and the results are inconclusive for the one-hot change.
@maximecb What do you mean by inconclusive, and how do you measure the success rate?
@d3sm0 I mean that on some levels the success rate was very slightly higher, on others it was worse. Success rate is measured as the percentage of episodes ending with a positive reward.
@maximecb I see. If you change the representation of the state, in principle the function approximator should behave differently, so maybe it is worth looking at the moments of the gradient of the neural network, the smoothness of the losses you are considering, and the cumulative reward over time during training. I might run some experiments too over the weekend.
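One possible way to inspect those gradient moments, as a hedged PyTorch sketch; `model` and the usage lines are placeholders, not anything from this repo:

```python
import torch

def grad_moments(model):
    """First and second moments of all parameter gradients, useful for
    comparing training dynamics under different observation encodings."""
    grads = [p.grad.view(-1) for p in model.parameters() if p.grad is not None]
    g = torch.cat(grads)
    return g.mean().item(), g.var().item()

# Usage (after loss.backward(), before optimizer.step()):
#   m1, m2 = grad_moments(policy_net)
```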
I've pushed a commit that unifies regular and locked doors into a single `Door` object type.
Hello! I was wondering if it is possible to assign a custom, arbitrary shape for the grid world. I'm not sure if that is available already (apologies if it is!) but it would make for a nice addition. I plan to use RL for assessing occupant satisfaction, and custom shapes allow for different layouts to be tested; it would make minigrid a nice tool for my use case, although I'm not sure how useful it'd be for the rest. Kind regards,
@TheodoreGalanos Hi Theodore. Right now you can have square shapes of any size. I assume you mean support for environments that are rectangular, but not necessarily squares? I will try to get that working today. |
@TheodoreGalanos it's done. See the RedBlueDoors environment for an example: https://github.com/maximecb/gym-minigrid/blob/master/gym_minigrid/envs/redbluedoors.py |
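For reference, a minimal sketch of a rectangular environment along those lines; the attribute and helper names here are best-effort guesses that can differ between MiniGrid versions, so treat the linked RedBlueDoors source as authoritative:

```python
from gym_minigrid.minigrid import MiniGridEnv, Grid, Goal

class RectRoomEnv(MiniGridEnv):
    """Hypothetical 16x8 (non-square) room; see redbluedoors.py for the
    real example of passing width and height separately."""

    def __init__(self):
        super().__init__(width=16, height=8, max_steps=200)

    def _gen_grid(self, width, height):
        self.grid = Grid(width, height)
        self.grid.wall_rect(0, 0, width, height)      # surrounding walls
        self.grid.set(width - 2, height - 2, Goal())  # goal in a corner
        self.agent_pos = (1, 1)  # older versions use start_pos / start_dir
        self.agent_dir = 0
        self.mission = "get to the goal square"
```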
I'm opening an issue to keep track of potential improvements for a MiniGrid v2.0. These are things I would do differently if I had to do it again. I'm not sure if these will get merged into the current version of MiniGrid because there is a big risk of breaking existing code, and this is particularly sensitive since the people using this code are using it for research purposes, where replicability is important. What may eventually happen is that I create a `MiniGrid2` repository.

- Separate out the gridworld from the OpenAI Gym interface. As pointed out in Multi-agent extension? #37, it would make sense to have a gridworld class separate from the OpenAI Gym environment class. This would help us support multi-agent setups. It might also be slightly cleaner. We are already part of the way there with the current `Grid` class.
- The agent should be treated like any other gridworld object. This again goes in line with multi-agent support. I think it would also be cleaner.
- Observations should be encoded using a one-hot scheme rather than integers corresponding to each object type and color. The observations may not actually be any bigger in terms of bytes taken if we use a numpy array of bools (bits). This would likely be easier for a neural network to decode. (Won't be done, results inconclusive)
- By default, observations should be tensors, so that the code works out of the box with most RL frameworks. Mission strings should be provided as part of the `info` dict (a sketch of what this could look like follows the list). The unfortunate truth is that OpenAI Gym has very poor support for any other kind of observation encoding.
- Doors and locked doors should probably have the same object type. We could take advantage of a one-hot scheme here and have a "door" bit as well as a "locked" flag bit. Possibly, each object should have a method to produce its own encoding in terms of bits. (Done)
- Some people have had difficulty installing PyQT. It might be nice to explore alternative rendering options. We could potentially generate graphics directly using 2D NumPy tensors, but we would still need some library to open display windows. I don't know what the simplest Python option for that is. The goal is for this package to work everywhere without issues, as much as possible.
- Render current step, max steps, and reward obtained in `env.render('human')`.
- Rename `max_steps` to `max_episode_steps`, which is more in keeping with OpenAI Gym conventions.

Other suggestions welcome.
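As an illustration of the tensor-observation item above, a hedged sketch of a wrapper that returns the image array directly and moves the mission string into `info`; this is hypothetical code, not part of the repo:

```python
import gym

class ImageObsWrapper(gym.Wrapper):
    """Hypothetical sketch: the observation becomes the raw image tensor,
    and the mission string travels through the `info` dict instead."""

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Note: reset has no info dict in this Gym API version, so the
        # mission is only available from step() in this sketch.
        return obs["image"]

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        info = dict(info, mission=obs["mission"])
        return obs["image"], reward, done, info
```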