
[question] Why can't I build a CNN policy behaving like an MLP policy? #456

Closed
vbelus opened this issue Aug 27, 2019 · 8 comments
Labels: question (Further information is requested)


vbelus commented Aug 27, 2019

Hi, I hope this is the right place for this question.
I'm creating a custom policy for a project, and for multiple reasons I wanted to build a convolutional neural network that is, on paper, the same as an MLP with three hidden layers [128, 64, 64].

My MLP policy is working fine, but I can't reproduce its results with the CNN policy, even though I've dug into the functions I use and it should behave exactly like an MLP.

Here is my custom CNN policy (n_arrays is 1 for now; this parameter is here because the reason I wanted to build a CNN extractor was to mimic multiple MLP extractors when my observation space is multiple arrays):

import numpy as np
import tensorflow as tf
from stable_baselines.a2c.utils import conv, conv_to_fc

def custom_cnn(scaled_images, **kwargs):
    activ = tf.nn.relu
    init_scale = np.sqrt(2)  # same init scale as the default nature_cnn
    n_arrays = int(scaled_images.shape[1])
    filter_width = int(scaled_images.shape[2])

    # each kernel spans a full (1, width) row, so every filter acts like
    # one unit of a fully connected layer applied to that row
    layer_1 = activ(conv(scaled_images, 'c1', n_filters=128,
                         filter_size=(1, filter_width), stride=1,
                         init_scale=init_scale, **kwargs))
    layer_1 = tf.reshape(layer_1, [-1, n_arrays, 128, 1])

    layer_2 = activ(conv(layer_1, 'c2', n_filters=64, filter_size=(1, 128),
                         stride=1, init_scale=init_scale, **kwargs))
    layer_2 = tf.reshape(layer_2, [-1, n_arrays, 64, 1])

    layer_3 = activ(conv(layer_2, 'c3', n_filters=64, filter_size=(1, 64),
                         stride=1, init_scale=init_scale, **kwargs))
    layer_3 = tf.reshape(layer_3, [-1, n_arrays, 64, 1])
    return conv_to_fc(layer_3)

So basically, each time I do a convolution, it is on an image of shape (1, width), with a kernel of shape (1, width) and n filters, which should be equivalent to a fully connected layer of size n.
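
To convince myself of this equivalence, here is a small sanity check on raw TF ops (a minimal sketch with made-up shapes, independent of stable-baselines):

import numpy as np
import tensorflow as tf

width, n_filters = 5, 3
x = np.random.randn(1, 1, width, 1).astype(np.float32)  # (batch, height=1, width, channels=1)
kernel = np.random.randn(1, width, 1, n_filters).astype(np.float32)

x_ph = tf.placeholder(tf.float32, [None, 1, width, 1])
# convolution whose kernel covers the whole (1, width) input
conv_out = tf.nn.conv2d(x_ph, kernel, strides=[1, 1, 1, 1], padding='VALID')
# the same weights applied as a fully connected layer
dense_out = tf.matmul(tf.reshape(x_ph, [-1, width]), kernel.reshape(width, n_filters))

with tf.Session() as sess:
    conv_val, dense_val = sess.run([conv_out, dense_out], {x_ph: x})
    print(np.allclose(conv_val.reshape(-1, n_filters), dense_val))  # True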

However, I get terrible results with such a policy compared to the MLP one.
What have I got wrong? I'm positive I haven't made a stupid mistake with the shapes of my arrays, so why do these two implementations behave so differently during training?

araffin added the question label on Aug 27, 2019

araffin commented Aug 27, 2019

Hello,

I think this issue is related to your problem (1D Convolution): #436

> it is on an image of shape (1, width)

When using a CNN policy, you have to make sure the input is an image with values in [0, 255] (cf. doc).
Normalization is done automatically, and this can break your training if you provide input with the wrong value range.
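
Concretely, the preprocessing applied when scale=True looks roughly like this (a paraphrased sketch of observation_input in stable_baselines/common/input.py, not the exact code):

# obs_ph: the observation placeholder, ob_space: a gym Box space
processed_obs = tf.cast(obs_ph, tf.float32)
if scale and np.all(np.isfinite(ob_space.low)) and np.all(np.isfinite(ob_space.high)):
    # rescale to [0, 1]; for image bounds [0, 255] this is obs / 255.0
    processed_obs = (processed_obs - ob_space.low) / (ob_space.high - ob_space.low)

So if your observations are small floats inside large declared bounds (e.g. values in [-1, 1] inside a [0, 255] Box), they end up squashed into a tiny range.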


vbelus commented Aug 28, 2019

Hi @araffin, and thanks for your answer.
Where is it you can find this information in the docs? I can't find it.
Does that mean a simple scaling of my input will be enough? And do the values need to be int?


araffin commented Aug 28, 2019

> Where is it you can find this information in the docs? I can't find it.

The information is here

> Does that mean a simple scaling of my input will be enough?

If you normalize the input yourself, you have to deactivate the automatic normalization by passing scale=False when calling the parent constructor (cf. the doc on custom policies, or the code).

> And do the values need to be int?

Values will be cast to float anyway.

jerabaul29 commented

So do you mean it is enough to just provide the float input data, and put scale=False, and then it should work? :)


vbelus commented Aug 28, 2019

@araffin will setting self.scale = False after the super() call in my custom policy work, or do I need to rewrite the whole FeedForwardPolicy class? scale is not an argument you can pass when initializing the class.

jerabaul29 commented

As fully convolutional networks are useful for many more applications than image analysis, it would be great to have a flag that lets the user disable this normalization easily. Could this be done @araffin? :)


araffin commented Aug 28, 2019

> So do you mean it is enough to just provide the float input data, and put scale=False, and then it should work?

That depends on what you mean by "it works". Will you avoid doing a second normalization? Yes. Will it succeed in solving the task? Maybe not (you may need hyperparameter tuning). Also, a CNN assumes some locality property in your input data (as in images).
It seems the data @vbelus is working on are not really images (1D vectors), and an MLP usually works fine on that type of data.

> do I need to rewrite the whole FeedForwardPolicy class?

It seems you need to either write a custom FeedForwardPolicy class (this should not be too hard) or make sure the data you provide looks like images, so that the normalization that is applied does not break the learning.
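
A minimal sketch of such a class (adapted from FeedForwardPolicy, trimmed to the CNN path; the name NoScaleCnnPolicy and the reuse of custom_cnn from above are just for illustration):

import tensorflow as tf
from stable_baselines.common.policies import ActorCriticPolicy
from stable_baselines.a2c.utils import linear

class NoScaleCnnPolicy(ActorCriticPolicy):
    def __init__(self, sess, ob_space, ac_space, n_env, n_steps, n_batch,
                 reuse=False, cnn_extractor=custom_cnn, **kwargs):
        # scale=False is the one change compared to FeedForwardPolicy,
        # which passes scale=(feature_extraction == "cnn")
        super(NoScaleCnnPolicy, self).__init__(sess, ob_space, ac_space, n_env,
                                               n_steps, n_batch, reuse=reuse,
                                               scale=False)
        with tf.variable_scope("model", reuse=reuse):
            pi_latent = vf_latent = cnn_extractor(self.processed_obs, **kwargs)
            self._value_fn = linear(vf_latent, 'vf', 1)
            self._proba_distribution, self._policy, self.q_value = \
                self.pdtype.proba_distribution_from_latent(pi_latent, vf_latent,
                                                           init_scale=0.01)
        self._setup_init()

    def step(self, obs, state=None, mask=None, deterministic=False):
        if deterministic:
            action, value, neglogp = self.sess.run(
                [self.deterministic_action, self.value_flat, self.neglogp],
                {self.obs_ph: obs})
        else:
            action, value, neglogp = self.sess.run(
                [self.action, self.value_flat, self.neglogp],
                {self.obs_ph: obs})
        return action, value, self.initial_state, neglogp

    def proba_step(self, obs, state=None, mask=None):
        return self.sess.run(self.policy_proba, {self.obs_ph: obs})

    def value(self, obs, state=None, mask=None):
        return self.sess.run(self.value_flat, {self.obs_ph: obs})

You can then pass the class directly to the algorithm, e.g. model = PPO2(NoScaleCnnPolicy, env).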

> As fully convolutional networks are useful for many more applications than image analysis

Fully convolutional networks for RL? You mean convolutions?
I agree for convolutions that are not 2D.

The flag is set automatically because the CnnPolicy only uses 2D conv layers afterward, and in most RL use cases this corresponds to images.
However, it would be good to add a 1D convolution (or another convolution type) as a feature extractor (and maybe add an example of a CNN using 1D convolutions in the documentation): this would also have the effect of disabling the normalization.
See here for what I'm talking about:

if feature_extraction == "cnn":
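
For example, a 1D-convolution extractor could look like this (just a sketch of what I mean, not an existing stable-baselines API; it assumes observations of shape (batch, length, channels)):

import tensorflow as tf

def cnn_1d(processed_obs, **kwargs):
    activ = tf.nn.relu
    # two 1D convolutions along the length dimension, then flatten
    layer = activ(tf.layers.conv1d(processed_obs, filters=64, kernel_size=3, name='c1d_1'))
    layer = activ(tf.layers.conv1d(layer, filters=64, kernel_size=3, name='c1d_2'))
    return tf.layers.flatten(layer)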


vbelus commented Aug 29, 2019

Thank you, creating a custom FeedForwardPolicy class to set scale to False indeed gives me results similar to the MLP.
