[question] [feature request] support for Dict and Tuple spaces #133
Comments
Hello, could you please provide minimal code to reproduce the error?
@araffin So for simplicity, say I need to pass two RGB images (observations from two cameras onboard a robot) of size (80, 160, 4) as states, with the observation space defined roughly like this:

class MyCustomEnv(gym.Env):
    ...
    self.nested_observation_space = spaces.Dict({
        ...
    })

I can send you the complete class if you like.
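For reference, a minimal sketch of such a Dict observation space (the key names, bounds, and dtype below are illustrative, not taken from the original report):

import numpy as np
import gym
from gym import spaces

class MyCustomEnv(gym.Env):
    def __init__(self):
        super(MyCustomEnv, self).__init__()
        # Two RGBA camera images of shape (80, 160, 4)
        self.nested_observation_space = spaces.Dict({
            "camera_front": spaces.Box(low=0, high=255, shape=(80, 160, 4), dtype=np.uint8),
            "camera_rear": spaces.Box(low=0, high=255, shape=(80, 160, 4), dtype=np.uint8),
        })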
The problem appears to be with vectorizing the env. I get an error when defining the state like this.
Hello,
I could concatenate the images and then separate them when they are fed to the CNN. I could also pad the signal with zeros and concatenate it as an extra channel. I'm worried about the scalability of this approach.
Why aren't they supported? I would also like to pass an image + scalars as input to the policy; at the current stage this is not possible. I don't know if it's more convenient to write code for this or to just append a vector of scalars at the end of the image and separate it later.
Hey,
However, Dict would require quite a bit of reworking for it to be compatible with all the models, as each placeholder for each tensor would be called by name, and not by sequential order. EDIT: if anyone can see a quick hack that could work in https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/input.py without changing anything else, it would be awesome to hear from you. EDIT2: Just tried Tuple with tf.concat and tf.stack, and it doesn't seem to want to play nice. Which makes sense when you think about concatenating an image of 90x128 integers with 4 floating point values: the code would need to flatten the input and make it all floating point numbers, and that would only work with MLP policies.
@hill-a there seems to be a hack proposed by @Atcold here, but it does not seem to generalize to all envs.
@AloshkaD That seems to be more of an alteration of the models, which is exactly what I would like to avoid doing, as it might generate more unforeseen issues and bugs when changing all the models in such a way. I was hoping to be able to simply change the input parsing code (stable_baselines/common/input.py) that almost all the models use. However, if this is unlikely to be possible, then a redesign of the return value of the input parsing code might be a more viable solution to this problem.
Sorry, I've been away these past two weeks... What you found is a working hack.
@Atcold thank you. Similarly, concatenating images on the channel axis worked for me, but it caused many issues with tensorboard logging. The logging expects an image with at most 4 channels, and by passing 6 channels it fails. Even if I initialize the empty tensor to the right shape, the incoming images have 6 channels. I'm going to dedicate more time to fixing this issue over the weekend.
@AloshkaD, you can always reshape your data before logging it.
You can pass
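As a rough illustration of that reshaping idea (an assumption on my part, not code from the thread): slice the observation down to the channels an image logger can handle before logging it.

import numpy as np

def to_loggable_image(obs):
    """Keep only the first 3 channels of a multi-channel observation for image logging."""
    # e.g. obs has shape (H, W, 6); image summaries expect at most 4 channels
    return obs[..., :3].astype(np.uint8)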
Hi, my requirement is also similar to @AloshkaD's: I want to process multiple images and measurement vectors. I am open to trying concatenation of the images. Did you pad the 1-D vector with zeros to concatenate it with the images? Do you have reference code somewhere that I can use as a starting point?
Hi, it is mentioned here in the doc: the PR you are referring to only adds support for the VecEnvs, not the algorithms.
Thanks for your quick response and clarification. I was thinking that the feature was supported but the documentation was out of date. So the workaround is basically to do what @AloshkaD did and concatenate the images across the channel axis, as sketched below.
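A sketch of that workaround (the function and variable names are mine, not from the thread): stack the two camera images along the channel axis so they fit in a single Box observation.

import numpy as np

def merge_cameras(img_front, img_rear):
    """Concatenate two (80, 160, 4) camera images along the channel axis -> (80, 160, 8)."""
    return np.concatenate([img_front, img_rear], axis=-1)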
@araffin Has anyone proposed a PR to implement Tuple/Dict/etc. for the action space? I came across this in a project I'm working on: I need to specify both discrete values (which internally in the env represent indexes into an array) and continuous ones (specifying new amounts to add to the array, to simplify a bit). I'm open to working on a PR if none is in the works.
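For reference, such a mixed action space could be declared as a Tuple like this (an illustrative sketch of the space only; as discussed above, the stable-baselines algorithms do not support it, which is what a PR would need to address):

import numpy as np
from gym import spaces

action_space = spaces.Tuple((
    spaces.Discrete(10),                                          # index into the array
    spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32),  # amount to add at that index
))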
Experimented a bit with a MultiMixedProbabilityDistribution: https://github.com/hill-a/stable-baselines/compare/master...bschreck:add-multi-mixed-proba?expand=1 Not tested at all yet.
Hello,
Small update on that topic: dict obs space will be supported for HER (see #273), when using
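For context, the HER support mentioned there targets goal-based environments whose Dict observation follows the gym.GoalEnv convention; a sketch of such a space (the shapes are illustrative):

import numpy as np
from gym import spaces

observation_space = spaces.Dict({
    "observation":   spaces.Box(-np.inf, np.inf, shape=(10,), dtype=np.float32),
    "achieved_goal": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
    "desired_goal":  spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
})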
Thanks @gautams3. As @araffin mentioned, this may not work for my case, where I have images and 1-D sensor data. I'm still using the workaround in which I convert my state observations into an image with multiple channels (3 for RGB, 1 for depth, and one for each sensor) and recover the signal data before feeding them to the network. I'm using PPO2.
Hello @AloshkaD,
You can use a custom policy for this. In the case of a CNN policy you can replace the CNN feature extractor (the cnn_extractor in policy_kwargs) with something like this:

import numpy as np
import tensorflow as tf
from stable_baselines.a2c.utils import conv, linear, conv_to_fc

num_direct_features = NUMBER_OF_DIRECT_FEATURES

def augmented_nature_cnn(scaled_images, **kwargs):
    """
    Copied from stable_baselines policies.py.
    This is the Nature CNN head, where the last channel of the image contains
    the direct (non-image) features.

    :param scaled_images: (TensorFlow Tensor) Image input placeholder
    :param kwargs: (dict) Extra keyword parameters for the convolutional layers of the CNN
    :return: (TensorFlow Tensor) The CNN output layer
    """
    activ = tf.nn.relu
    # Take the last channel as direct features
    other_features = tf.contrib.slim.flatten(scaled_images[..., -1])
    # Take the known amount of direct features, the rest are padding zeros
    other_features = other_features[:, :num_direct_features]
    scaled_images = scaled_images[..., :-1]
    layer_1 = activ(conv(scaled_images, 'cnn1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
    layer_2 = activ(conv(layer_1, 'cnn2', n_filters=64, filter_size=4, stride=2, init_scale=np.sqrt(2), **kwargs))
    layer_3 = activ(conv(layer_2, 'cnn3', n_filters=64, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))
    layer_3 = conv_to_fc(layer_3)
    img_output = activ(linear(layer_3, 'cnn_fc1', n_hidden=512, init_scale=np.sqrt(2)))
    # Append the direct features to the image features
    concat = tf.concat((img_output, other_features), axis=1)
    return concat

policy_kwargs = {
    # Pass the function itself, not the result of calling it
    "cnn_extractor": augmented_nature_cnn
}
agent = PPO2(policy_kwargs=policy_kwargs, ...)
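For context, here is a minimal sketch of the env-side counterpart of this workaround (the helper name and shapes are my own, not from the comment above): the 1-D direct features are written into an extra zero-padded channel so the observation stays a single Box, and the extractor above slices them back out.

import numpy as np

def pack_obs(image, direct_features):
    """Append the direct features to an image as an extra zero-padded channel."""
    h, w, _ = image.shape
    extra = np.zeros((h, w), dtype=image.dtype)
    # Write the 1-D features into the start of the new channel (row-major),
    # matching how the extractor flattens and slices the last channel
    extra.flat[:len(direct_features)] = direct_features
    return np.concatenate([image, extra[..., None]], axis=-1)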
Additional remark: you should be careful regarding the automatic normalization, cf. discussion #456.
For most of the Atari games, the observation space is quite simple: you either have a Box or a Discrete space. The problem is that when working with real-world environments or business cases, some have more complex observation spaces: single/multiple Box, or a combination of Box and Discrete. Hence support for Tuple would be very nice. The custom environment I am trying to implement with stable-baselines has a Tuple observation space of 4 different time series, each represented as a Box with a different shape. After reading the comments in this section, I understood that one can merge all of them for the input and then split them apart in the custom policy. Can somebody give an example of how this might be achieved?
You can append the "direct features" (non-image features) on e.g. the last channel of the image, and pad them with zeros to match the other dimensions. Then you can use a custom CNN extractor to separate them again, e.g.:

def create_augmented_nature_cnn(num_direct_features):
    """
    Create and return a function for augmented_nature_cnn
    used in stable-baselines.

    num_direct_features tells how many direct features there
    will be in the image.
    """

    def augmented_nature_cnn(scaled_images, **kwargs):
        """
        Copied from stable_baselines policies.py.
        This is the Nature CNN head, where the last channel of the image contains
        the direct features.

        :param scaled_images: (TensorFlow Tensor) Image input placeholder
        :param kwargs: (dict) Extra keyword parameters for the convolutional layers of the CNN
        :return: (TensorFlow Tensor) The CNN output layer
        """
        activ = tf.nn.relu
        # Take the last channel as direct features
        other_features = tf.contrib.slim.flatten(scaled_images[..., -1])
        # Take the known amount of direct features, the rest are padding zeros
        other_features = other_features[:, :num_direct_features]
        scaled_images = scaled_images[..., :-1]
        layer_1 = activ(conv(scaled_images, 'cnn1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
        layer_2 = activ(conv(layer_1, 'cnn2', n_filters=64, filter_size=4, stride=2, init_scale=np.sqrt(2), **kwargs))
        layer_3 = activ(conv(layer_2, 'cnn3', n_filters=64, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))
        layer_3 = conv_to_fc(layer_3)
        # Append direct features to the final output of the extractor
        img_output = activ(linear(layer_3, 'cnn_fc1', n_hidden=512, init_scale=np.sqrt(2)))
        concat = tf.concat((img_output, other_features), axis=1)
        return concat

    return augmented_nature_cnn
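A usage sketch for the factory above (the number of direct features and the training call are illustrative; the env is assumed to already pack those features, zero-padded, into the last channel of its image observation):

from stable_baselines import PPO2

policy_kwargs = {
    # e.g. 10 non-image features packed into the padded last channel
    "cnn_extractor": create_augmented_nature_cnn(10),
}
model = PPO2("CnnPolicy", env, policy_kwargs=policy_kwargs, verbose=1)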
I am very interested in getting mixed dictionary input spaces officially supported in stable-baselines and would be willing to pay for someone to do the work, since I doubt I have the skills to do it myself. If anyone here has the skills or knows of a pay-for service where I might post the project, please let me know.
Is there any update on this?
@nicofirst1
Any clue on how long it will take?
I cannot give any exact timeline, but at least a month, I would say. Regarding your rllib problem: you could modify your space to be a Tuple, no? Just make sure you provide the observations in the same order on each step. Please do not continue that discussion here; this is just food for thought.
Regarding your problem, it seems to me that Discrete is a subset of MultiDiscrete, so you could use only a MultiDiscrete space in your case.
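As a small illustration (mine, not from the comment): a pair of discrete choices expressed as a Tuple of Discrete spaces can be written directly as a MultiDiscrete.

from gym import spaces

# Instead of Tuple((Discrete(3), Discrete(5))):
action_space = spaces.MultiDiscrete([3, 5])  # first entry in {0, 1, 2}, second in {0, ..., 4}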
Closing this, as DLR-RM/stable-baselines3#243 is now merged into SB3 master =)
I want to train using two images from different cameras and an array of 1-D data from a sensor. I'm passing these inputs as my env state. Obviously I need a CNN that can take those inputs, concatenate them, and train on them. My question is how to pass these inputs to such a custom CNN in policies.py. Also, I tried to pass two images and apparently dummy_vec_env.py had trouble with that:
obs = env.reset()
  File "d:\resources\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 57, in reset
    self._save_obs(env_idx, obs)
  File "d:\resources\stable-baselines\stable_baselines\common\vec_env\dummy_vec_env.py", line 75, in _save_obs
    self.buf_obs[key][env_idx] = obs
ValueError: cannot copy sequence with size 2 to array axis with dimension 80
I appreciate any thoughts or examples.