
[question] [feature request] support for Dict and Tuple spaces #133

Closed

AloshkaD opened this issue Dec 15, 2018 · 37 comments

Labels: enhancement (New feature or request), question (Further information is requested), v3 (Discussion about V3)

Comments

@AloshkaD commented Dec 15, 2018

I want to train using two images from different cameras and an array of 1-D data from a sensor. I'm passing these inputs as my env state. Obviously I need a CNN that can take those inputs, concatenate them, and train on them. My question is how to pass these inputs to such a custom CNN in policies.py. Also, I tried to pass two images and apparently dummy_vec_env.py had trouble with that:
obs = env.reset()
  File "d:\resources\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 57, in reset
    self._save_obs(env_idx, obs)
  File "d:\resources\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 75, in _save_obs
    self.buf_obs[key][env_idx] = obs
ValueError: cannot copy sequence with size 2 to array axis with dimension 80

I appreciate any thoughts or examples.

@araffin (Collaborator) commented Dec 16, 2018

Hello,

Could you please provide a minimal code example to reproduce the error?

@AloshkaD (Author) commented Dec 16, 2018

@araffin For simplicity, say I need to pass two RGB images (observations from two cameras onboard a robot) of size (80, 160, 4) as states, like this:

class MyCustomEnv(gym.Env):

    def __init__(self):
        self.observation_space = spaces.Box(low=0, high=255, shape=(80, 160, 4), dtype=np.float32)
        self.state = (np.zeros((80, 160, 4), dtype=np.uint8), np.zeros((80, 160, 4), dtype=np.uint8))
        ...

    def step(self, action):
        ...
        self.state = self.rgbimage_1, self.rgbimage_2
        return self.state, reward, done, info

    def reset(self):
        ...
        self.state = self.rgbimage_1, self.rgbimage_2
        return self.state
I hope this is good enough. I also suspect my definition of the observation_space might not be correct, but I tried different methods to define an observation space for two images and nothing worked. I saw that you are a contributor here, and I hope you will be able to help with defining the ob_space too.
For the record, I tried to build an observation space like this:

self.nested_observation_space = spaces.Dict({
    'sensors': spaces.Dict({
        # 'position': spaces.Box(low=-100, high=100, shape=(3,)),
        # 'velocity': spaces.Box(low=-1, high=1, shape=(3,)),
        'front_cam': spaces.Tuple((
            spaces.Box(low=0, high=255, shape=(80, 160, 4)),
            spaces.Box(low=0, high=255, shape=(80, 160, 4))
        )),
    })
})
but that didn't work either, and returned the error:

env = DummyVecEnv([lambda: env])  # The algorithms require a vectorized environment to run
  File "d:\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 31, in __init__
    shapes[key] = box.shape
For simplicity I passed this and still got the same error:

self.nested_observation_space = spaces.Tuple((
    spaces.Box(low=0, high=255, shape=(80, 160, 4)),
    spaces.Box(low=0, high=255, shape=(80, 160, 4))
))

I can send you the complete class if you like.
Thanks

@AloshkaD (Author) commented Dec 17, 2018

The problem appears to be with vectorizing the env. I get:

  File "d:\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 35, in __init__
    self.buf_obs = {k: np.zeros((self.num_envs,) + tuple(shapes[k]), dtype=dtypes[k]) for k in self.keys}
  File "d:\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 35, in <dictcomp>
    self.buf_obs = {k: np.zeros((self.num_envs,) + tuple(shapes[k]), dtype=dtypes[k]) for k in self.keys}
TypeError: 'NoneType' object is not iterable

when defining the observation space like this:

self.observation_space = spaces.Tuple((
    spaces.Box(low=0, high=255, shape=(80, 160, 4), dtype=np.uint8),
    spaces.Box(low=0, high=255, shape=(80, 160, 4), dtype=np.uint8)
))

@araffin (Collaborator) commented Dec 17, 2018

Hello,
Dict and Tuple spaces are not supported as observation spaces. Did you try concatenating the images along the channel axis?

@AloshkaD (Author) commented Dec 18, 2018

I could concatenate the images and then separate them when they are fed to the CNN. I could also pad the sensor signal with zeros and concatenate it as an extra channel. I'm worried about the scalability of this approach, though.
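Roughly, the packing I have in mind (a sketch with illustrative names; it assumes two (80, 160, 4) images and a 1-D sensor vector short enough to fit in one extra channel):

import numpy as np

def pack_observation(img_1, img_2, sensor_data):
    """Concatenate two (80, 160, 4) images along the channel axis and
    append the 1-D sensor signal as one extra zero-padded channel."""
    h, w, _ = img_1.shape
    sensor_channel = np.zeros((h, w, 1), dtype=img_1.dtype)
    # Write the sensor values into the (row-major) start of the extra channel
    sensor_channel.flat[:len(sensor_data)] = sensor_data
    # Resulting shape: (80, 160, 4 + 4 + 1) = (80, 160, 9)
    return np.concatenate([img_1, img_2, sensor_channel], axis=-1)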

@pulver22 commented:

> Dict and Tuple spaces are not supported as observation spaces. Did you try concatenating the images along the channel axis?

Why aren't they supported? I would also like to pass an image plus scalars as input to the policy, and at the current stage this is not possible. I don't know whether it is more convenient to write code for this or to just append a vector of scalars at the end of the image and separate it later.

@hill-a (Owner) commented Dec 19, 2018

Hey,

@pulver22 Well, Tuple spaces could be supported with some effort (IIRC you can feed tuples into the feed_dict with a tf.concat of placeholders).

However Dict would require quite a bit of reworking for it to be compatible with all the models, as each placeholder for each tensor would be called by name, and not by sequential order.

EDIT: if anyone can see a quick hack that could work in https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/input.py without changing anything else, it would be awesome to hear from you.

EDIT2: Just tried the Tuple with tf.concat and tf.stack, and it doesn't seem to want to play nice. Which makes sense when you think about it: to concatenate an image of 90x128 integers with 4 floating point values, the code would need to flatten the input and make it all floating point numbers, and that would only work with MLP policies.

@AloshkaD (Author) commented Dec 19, 2018

@hill-a There seems to be a hack proposed by @Atcold here, but it does not seem to generalize to all envs:
Atcold@dbc329f

@AloshkaD changed the title from "[question] how to go about an env with two input images and an array" to "[question] [feature request] support for Dict and Tuple spaces" on Dec 19, 2018
@araffin added the enhancement (New feature or request), question (Further information is requested), and help wanted (Help from contributors is needed) labels on Dec 20, 2018
@hill-a (Owner) commented Dec 21, 2018

@AloshkaD That seems to be more of an alteration of the models, which is exactly what I would like to avoid doing, as it might generate more unforeseen issues and bugs when changing all the models in such a way.

I was hoping to be able to simply change the input parsing code (stable_baselines/common/input.py) that almost all the models use.

However, if this is unlikely to be possible, then a redesign of the return type of the input parsing code might be a more viable solution to this problem.

@araffin removed the help wanted (Help from contributors is needed) label on Dec 21, 2018
@AloshkaD (Author) commented:

Agreed! Thank you @hill-a and @araffin.

@Atcold commented Jan 3, 2019

Sorry, I've been away these past two weeks...
Thanks @AloshkaD for the ping, btw.

What you found is a working hack.
Currently, in order to avoid headaches while pulling the latest master, I've resorted to reshaping all my observations into 1-D tensors (long vectors) and concatenating them all. Later on, in my neural net, I take the observation apart and send the different parts to different encoders. See traffic_models.py.
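In pseudo-form, the trick looks like this (a sketch with hypothetical sizes, not the actual traffic_models.py code):

import numpy as np

IMG_SIZE = 80 * 160 * 4  # hypothetical per-image size

def flatten_observation(img_1, img_2, sensor_data):
    """Flatten every component and concatenate into one long 1-D vector."""
    return np.concatenate(
        [img_1.ravel(), img_2.ravel(), sensor_data]).astype(np.float32)

def split_observation(obs):
    """Take the observation apart again inside the network code."""
    img_1 = obs[:IMG_SIZE].reshape(80, 160, 4)
    img_2 = obs[IMG_SIZE:2 * IMG_SIZE].reshape(80, 160, 4)
    sensor_data = obs[2 * IMG_SIZE:]
    return img_1, img_2, sensor_data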

@AloshkaD (Author) commented Jan 3, 2019

@Atcold thank you. Similarly, concatenating images on the channel axis worked for me, but it caused many issues with TensorBoard logging. The logging expects an image with at most 4 channels, and passing 6 channels makes it fail. Even if I initialize the empty tensor to the right shape, the incoming images have 6 channels. I'm going to dedicate more time to fixing this issue over the weekend.

@Atcold commented Jan 16, 2019

@AloshkaD, you can always reshape your data before logging it.
From the TensorFlow documentation we have that:

The summary has up to max_outputs summary values containing images. The images are built from tensor which must be 4-D with shape [batch_size, height, width, channels] and where channels can be:

  • 1: tensor is interpreted as Grayscale.
  • 3: tensor is interpreted as RGB.
  • 4: tensor is interpreted as RGBA.

You can use channels = 1 and width = 6 * original_width, so a simple reshape should be sufficient.
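For example (a sketch, assuming a (batch, 80, 160, 6) observation tensor; the transpose lays the six channels out side by side instead of interleaving them):

import tensorflow as tf

# obs: (batch, 80, 160, 6) -- too many channels for tf.summary.image
obs_t = tf.transpose(obs, (0, 1, 3, 2))             # (batch, 80, 6, 160)
obs_gray = tf.reshape(obs_t, (-1, 80, 6 * 160, 1))  # channels tiled along the width
tf.summary.image('observation', obs_gray, max_outputs=4)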
Please, let me know if you have any other issue.

@srivatsankrishnan commented Mar 29, 2019

Hi,
Is the Dict space working with stable-baselines? I am confused, since the documentation doesn't mention it. It seems that this PR (#207) doesn't work. I see the code changes in utils.py, but the error I am getting is in stable_baselines/common/input.py, and I don't see any code that corresponds to the Dict space in input.py.

My requirement is also similar to @AloshkaD's: I want to process multiple images and measurement vectors. I am open to trying to concatenate the images. Did you pad the 1-D vector with zeros to concatenate it with the images? Do you have reference code somewhere that I can use as a starting point?

@araffin (Collaborator) commented Mar 29, 2019

> Is the Dict space working with stable-baselines?

Hi, it is mentioned here in the doc:
"Non-array spaces such as Dict or Tuple are not currently supported by any algorithm."

The PR you are referring to only adds support for the VecEnvs, not the algorithms.

@srivatsankrishnan commented:

Thanks for your quick response and clarification. I was thinking that the feature was supported but the documentation was out of date. So the workaround is basically to do what @AloshkaD did: concatenate the images across the channel axis.

@bschreck commented:

@araffin Has anyone proposed a PR to implement Tuple/Dict/etc. for the action space? I came across this in a project I'm working on: I need to specify both discrete values (which internally in the env represent indexes into an array) and continuous ones (specifying new amounts to add to the array, to simplify a bit). I'm open to working on a PR if none is in the works.

@bschreck commented:

Experimented a bit with a MultiMixedProbabilityDistribution: https://github.com/hill-a/stable-baselines/compare/master...bschreck:add-multi-mixed-proba?expand=1

Not tested at all yet.

@araffin (Collaborator) commented Apr 15, 2019

Hello,
for now, nobody is working on that.
However, there are two important things that need to be taken into account when creating a PR for that feature:

  • it should not break previous versions
  • the changes should be as minimal as possible (so the code stays readable)

@araffin (Collaborator) commented Apr 30, 2019

Small update on that topic: Dict observation spaces will be supported for HER (see #273) when using gym.GoalEnv. For now, it requires all keys to have the same type.
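For context, a gym.GoalEnv observation space looks like this (shapes are illustrative; note that all keys share the same dtype):

import numpy as np
from gym import spaces

# The three keys below are the standard gym.GoalEnv keys used by HER
observation_space = spaces.Dict({
    'observation':   spaces.Box(-np.inf, np.inf, shape=(10,), dtype=np.float32),
    'achieved_goal': spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
    'desired_goal':  spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
})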

@AloshkaD (Author) commented:

Thanks @gautams3. As @araffin mentioned, this may not work for my case, where I have images and 1-D sensor data. I'm still using the workaround in which I convert my state observations into an image with multiple channels (3 for RGB, 1 for depth, and one for each sensor) and recover the signal data before feeding them to the network. I'm using PPO2.

@nkleber1 commented:

Hello @AloshkaD,
I think your workaround is interesting. Could you please explain how you recover the signal data before feeding it to the network?
Thanks in advance.

@Miffyli (Collaborator) commented Sep 21, 2019

@nkleber1

You can use a custom policy for this. In the case of a CNN policy, you can replace the cnn_extractor with a head of your liking, where you split the augmented image into the actual image and the direct features (e.g. 1-D sensor data). Like so:

import numpy as np
import tensorflow as tf
from stable_baselines import PPO2
from stable_baselines.a2c.utils import conv, linear, conv_to_fc

num_direct_features = NUMBER_OF_DIRECT_FEATURES

def augmented_nature_cnn(scaled_images, **kwargs):
    """
    Copied from stable_baselines policies.py.
    This is the nature CNN head where the last channel of the image
    contains the direct features.

    :param scaled_images: (TensorFlow Tensor) Image input placeholder
    :param kwargs: (dict) Extra keywords parameters for the convolutional layers of the CNN
    :return: (TensorFlow Tensor) The CNN output layer
    """
    activ = tf.nn.relu

    # Take last channel as direct features
    other_features = tf.contrib.slim.flatten(scaled_images[..., -1])
    # Take known amount of direct features, rest are padding zeros
    other_features = other_features[:, :num_direct_features]

    scaled_images = scaled_images[..., :-1]

    layer_1 = activ(conv(scaled_images, 'cnn1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
    layer_2 = activ(conv(layer_1, 'cnn2', n_filters=64, filter_size=4, stride=2, init_scale=np.sqrt(2), **kwargs))
    layer_3 = activ(conv(layer_2, 'cnn3', n_filters=64, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))
    layer_3 = conv_to_fc(layer_3)

    img_output = activ(linear(layer_3, 'cnn_fc1', n_hidden=512, init_scale=np.sqrt(2)))

    # Append direct features to the final output of the extractor
    concat = tf.concat((img_output, other_features), axis=1)

    return concat

# Pass the function itself (not a call) as the cnn_extractor
policy_kwargs = {
    "cnn_extractor": augmented_nature_cnn
}

agent = PPO2("CnnPolicy", env, policy_kwargs=policy_kwargs)
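On the environment side, the augmented observation would be built to match (a sketch with hypothetical names; note that the direct features go through the same scaling as the image pixels, see the remark on normalization below):

import numpy as np

def make_augmented_observation(image, direct_features):
    """Append one extra channel whose (row-major) start holds the
    direct features, zero-padded to fill the rest of the channel."""
    h, w, _ = image.shape
    feature_channel = np.zeros((h, w, 1), dtype=image.dtype)
    feature_channel.flat[:len(direct_features)] = direct_features
    return np.concatenate([image, feature_channel], axis=-1)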

@araffin (Collaborator) commented Sep 21, 2019

Additional remark: you should be careful regarding the automatic normalization; cf. discussion #456.

@radusl commented Dec 4, 2019

For most of the Atari games, the observation space is quite simple: you have either a Box or a Discrete. The problem is that when working with real-world environments or business cases, some have more complex observation spaces: single/multiple Boxes or a combination of Box and Discrete. Hence support for Tuple would be very nice.

The custom environment I am trying to implement with stable-baselines has a Tuple observation space of 4 different time series represented as Box, each with a different shape. After reading the comments in this thread, I understood that one can merge all of them for the input and then split them apart in the custom policy. Can somebody give an example of how this might be achieved?

@Miffyli (Collaborator) commented Dec 4, 2019

@radusl

You can append the "direct features" (non-image features) on e.g. the last channel of the image, and pad them with zeros to match the other dimensions. Then you can use a cnn_extractor like the one returned by this function to process the actual image with convolutions and then append the direct features:

import numpy as np
import tensorflow as tf
from stable_baselines.a2c.utils import conv, linear, conv_to_fc

def create_augmented_nature_cnn(num_direct_features):
    """
    Create and return a function for augmented_nature_cnn
    used in stable-baselines.

    num_direct_features tells how many direct features there
    will be in the image.
    """

    def augmented_nature_cnn(scaled_images, **kwargs):
        """
        Copied from stable_baselines policies.py.
        This is nature CNN head where last channel of the image contains
        direct features.

        :param scaled_images: (TensorFlow Tensor) Image input placeholder
        :param kwargs: (dict) Extra keywords parameters for the convolutional layers of the CNN
        :return: (TensorFlow Tensor) The CNN output layer
        """
        activ = tf.nn.relu

        # Take last channel as direct features
        other_features = tf.contrib.slim.flatten(scaled_images[..., -1])
        # Take known amount of direct features, rest are padding zeros
        other_features = other_features[:, :num_direct_features]

        scaled_images = scaled_images[..., :-1]

        layer_1 = activ(conv(scaled_images, 'cnn1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
        layer_2 = activ(conv(layer_1, 'cnn2', n_filters=64, filter_size=4, stride=2, init_scale=np.sqrt(2), **kwargs))
        layer_3 = activ(conv(layer_2, 'cnn3', n_filters=64, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))
        layer_3 = conv_to_fc(layer_3)

        # Append direct features to the final output of extractor
        img_output = activ(linear(layer_3, 'cnn_fc1', n_hidden=512, init_scale=np.sqrt(2)))

        concat = tf.concat((img_output, other_features), axis=1)

        return concat

    return augmented_nature_cnn
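For instance (a hypothetical setup, assuming the environment packs 10 direct features into the image's last channel):

from stable_baselines import PPO2

policy_kwargs = dict(cnn_extractor=create_augmented_nature_cnn(10))
model = PPO2('CnnPolicy', env, policy_kwargs=policy_kwargs, verbose=1)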

@pirobot commented Dec 10, 2019

I am very interested in getting mixed dictionary input spaces officially supported in stable-baselines, and would be willing to pay someone to do the work, since I doubt I have the skills to do it myself. If anyone here has the skills, or knows of a pay-for service where I might post the project, please let me know.

@nicofirst1 commented:

Is there any update on this?
I am trying to use a mixed dictionary space (Discrete + MultiDiscrete) as the action space, but rllib yields:

NotImplementedError: Dict action spaces are not supported, consider using gym.spaces.Tuple instead

@Miffyli (Collaborator) commented Feb 15, 2020

@nicofirst1
No updates yet. We are focusing on transitioning to the new backend first (v3.0), after which this will be one of the high-priority updates for v3.1.

@nicofirst1 commented:

Any clue on how long it will take?

@Miffyli (Collaborator) commented Feb 15, 2020

I cannot give any exact times, but at least a month, I would say.

Regarding your rllib problem: you could modify your space to be a Tuple, no? Just make sure you provide observations in the same order on each step. Please do not continue this discussion here; it's just food for thought.

@araffin (Collaborator) commented Feb 15, 2020

Regarding your problem, it seems to me that Discrete is a special case of MultiDiscrete, so you could use only a MultiDiscrete space in your case.
Btw, we plan to support Dict observation spaces first; Dict action spaces are an open research question.
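For example (a sketch): a Discrete(4) combined with a MultiDiscrete([3, 3]) can be folded into a single space:

from gym import spaces

# First component replaces the Discrete(4); the rest cover the MultiDiscrete([3, 3])
action_space = spaces.MultiDiscrete([4, 3, 3])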

@araffin (Collaborator) commented May 11, 2021

Closing this, as DLR-RM/stable-baselines3#243 is now merged into SB3 master =)
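For reference, with that PR merged, Dict observation spaces work out of the box in SB3 via the multi-input policies (a sketch, assuming a recent SB3 release and an env with a Dict observation space):

from stable_baselines3 import PPO

# 'MultiInputPolicy' enables the combined feature extractor for Dict observations
model = PPO('MultiInputPolicy', env, verbose=1)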

@araffin closed this as completed on May 11, 2021