Fully observability #26

d3sm0 · 2018-10-19T23:03:39Z

Hey,
I haven't thought about two issues with the FullyObsWrapper:

using the wrapper as it is, it renders an image of 800 x 800 x 3 which is quite heavy to manage
one must call the env.render('human')
The combination of the two makes the env very slow.

Any suggestion on how I can improve on it?

Thank you !

The text was updated successfully, but these errors were encountered:

maximecb · 2018-10-19T23:46:01Z

Hmm, first thing is, you should be calling env.render('rgb_array') (which will return a numpy array), not env.render('human'), and this should be done within the wrapper's observation method.

For speed, the simplest thing to do would be to render a smaller version of the grid. But the most efficient way would be to encode the grid directly as a numpy array with 3 values per cell as I did for partial observability. See the encode method of the Grid class: https://github.com/maximecb/gym-minigrid/blob/master/gym_minigrid/minigrid.py#L508

The only downside of this is that it doesn't currently encode the agent position.

d3sm0 · 2018-10-20T08:35:41Z

Thank you for the quick reply. Something like this should the trick then:

    def observation(self, obs):
        full_grid = self.env.grid.encode()
        full_grid[self.env.agent_pos[0]][self.env.agent_pos[1]] = self.env.agent_dir
        return full_grid

maximecb · 2018-10-20T19:55:56Z

Yes, like that, except you'd want to also encode that the agent is at that position, not just the agent direction. Something like:

full_grid[x, y, 0] = 255
full_grid[x, y, 1] = self.env.agent_dir
full_grid[x, y, 2] = 0

You would also want to change the observation_space to have the correct shape.

Driesssens · 2018-11-06T11:14:09Z

For anyone using this: I recommend encoding the agent as a much lower number that is closer to how the other objects are encoded.

While training on the Unlock-environment converged in ~12 minutes with the regular, egocentric view, training with the FullyObsWrapper never converged. At first I thought the egocentric view gives a translation invariance that the FullyObsWrapper doesn't have, so I tried to compensate and make the environment much simpler by removing colors, giving a small reward when picking up the key or reducing the action space such that pickup and toggle were reduced to 'interact' and drop was removed. Despite all of this, the model just wouldn't learn to go to the door after picking up the key.

Finally I changed the agent encoding from 255 to 9, and now training with the fully observable view converges as fast as with the egocentric view. Possibly the high value is too dominant in the convnet's processing.

PS. The current FullyObsWrapper also doesn't encode which item is being carried. In the egocentric view, this is encoded by showing the item as if it is at the agent's position, but the FullyObsWrapper overwrites this grid position to encode the agent. If color is not important, you can encode the carried object's type at the agent's position in the 3rd layer. If color is important, you will need to add a 4th layer to also include the carried object's color.

d3sm0 mentioned this issue Oct 21, 2018

Update FullyObsWrapper #27

Merged

d3sm0 closed this as completed Oct 25, 2018

This was referenced Dec 10, 2018

List of potential improvements #38

Closed

How to use Class FullyObsWrapper in 'wrappers.py' #39

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fully observability #26

Fully observability #26

d3sm0 commented Oct 19, 2018

maximecb commented Oct 19, 2018 •

edited

Loading

d3sm0 commented Oct 20, 2018

maximecb commented Oct 20, 2018

Driesssens commented Nov 6, 2018

Fully observability #26

Fully observability #26

Comments

d3sm0 commented Oct 19, 2018

maximecb commented Oct 19, 2018 • edited Loading

d3sm0 commented Oct 20, 2018

maximecb commented Oct 20, 2018

Driesssens commented Nov 6, 2018

maximecb commented Oct 19, 2018 •

edited

Loading