Unpooling and deconvolution #378

Closed
loyeamen opened this issue Jul 10, 2015 · 18 comments

@loyeamen

I want to create a convolutional autoencoder and need deconv and unpooling layers.
Is there a way to do that with Keras?

In general I was thinking of using Theano's repeat function for unpooling
and a regular Keras conv layer for deconv.

What do you think?

@pranv
Contributor

pranv commented Jul 10, 2015

regular keras conv layer for deconv

I'm sure this is sufficient.

Not sure how you want to unpool. Can you elaborate on that?

@loyeamen
Author

I need to verify it, but I think Theano's repeat function can be used for unpooling (upsampling).
Deconvolution can be approximated by a regular convolution, so once you have unpooled, you can apply a convolutional layer as the deconvolution operation.
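
Roughly, a minimal sketch of that "upsample then convolve" idea. The layer names below follow the later Keras 2 API (Conv2D, UpSampling2D), which is an assumption on my part; the layer names in the Keras version current at the time of this thread differ:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, UpSampling2D

model = Sequential()
# Encoder: convolve, then pool.
model.add(Conv2D(16, (3, 3), activation='relu', padding='same',
                 input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
# Decoder: "unpool" by repeating values, then use a regular convolution
# in place of a true deconvolution.
model.add(UpSampling2D((2, 2)))
model.add(Conv2D(1, (3, 3), activation='sigmoid', padding='same'))
model.compile(optimizer='adadelta', loss='binary_crossentropy')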

@ghif

ghif commented Jul 10, 2015

I'm currently doing research involving convolutional autoencoders for domain adaptation/transfer learning, and I'm using Keras thanks to this awesome framework :) What I did was add the following code to convolutional.py:

import theano.tensor as T
from keras.layers.core import Layer

class Unpooling2D(Layer):
    def __init__(self, poolsize=(2, 2), ignore_border=True):
        super(Unpooling2D, self).__init__()
        self.input = T.tensor4()
        self.poolsize = poolsize
        self.ignore_border = ignore_border

    def get_output(self, train):
        X = self.get_input(train)
        s1 = self.poolsize[0]
        s2 = self.poolsize[1]
        # Upsample by repeating each value over its pooling window.
        output = X.repeat(s1, axis=2).repeat(s2, axis=3)
        return output

    def get_config(self):
        return {"name": self.__class__.__name__,
                "poolsize": self.poolsize,
                "ignore_border": self.ignore_border}

It works well, at least for my needs. This unpooling strategy follows this blog post: https://swarbrickjones.wordpress.com/2015/04/29/convolutional-autoencoders-in-pythontheanolasagne/

@Tgaaly

Tgaaly commented Sep 24, 2015

Thanks @ghif for the unpooling layer. What about deconvolutional layers? Has this been implemented in Keras?

@isaacgerg

@Tgaaly Did you ever figure this out? I'm willing to put the effort into making a CAE and post a test example on github if you can help me.

@brijmohan

I am trying to create a convolutional autoencoder for temporal sequences using Keras.
Here's the code:

ae = Sequential()
encoder = containers.Sequential([Convolution1D(5, 5, border_mode='valid', input_dim=39, activation='tanh', input_length=39), Flatten(), Dense(5)])
decoder = containers.Sequential([Convolution1D(5, 5, border_mode='valid', input_dim=5, activation='tanh', input_length=5), Flatten(), Dense(39)])
ae.add(AutoEncoder(encoder=encoder, decoder=decoder,
                   output_reconstruction=False))

ae.compile(loss='mean_squared_error', optimizer=RMSprop())
ae.fit(X_train, X_train, batch_size=32, verbose=1)

I am getting the following error at the third line, where the decoder is initialized:

ValueError: negative dimensions are not allowed

Please help me resolve the issue

@nanopony

I've prepared a proof of concept of a convolutional autoencoder with weight sharing between convolutional/deconvolutional layer pairs and maxpool/depool layers that share the activated neurons. The code is very sketchy at this point, but it illustrates the whole idea. Here are the autoencoder's representations of MNIST:
[Image: CAE representation of MNIST]

I hope it might be useful for someone.

@kevinthedestroyr

@nanopony awesome demo, thanks for making it. I've been playing with it a bit and noticed something unexpected. If you drop the number of elements in the dense layer to a single neuron, you still get reasonable outputs. I was expecting that to have thrown out far too much information to be able to reconstruct the input. Is this a misunderstanding of the architecture on my part?

Update:
I'm pretty new to Theano so I can't say exactly why, but the following line in DePool2D seems to be the culprit:

f = T.grad(T.sum(self._pool2d_layer.get_output(train)), wrt=self._pool2d_layer.get_input(train)) * output

My guess is that you intended the output of grad to be a scalar, but in reality it reintroduces the input to self._pool2d_layer, effectively bypassing the dense layer's bottleneck. Simply removing it and returning output instead of f from DePool2D seems to fix it.

@nanopony

@kevinthedestroyr That's very odd behavior, since I was getting more blurred and distorted representations when I reduced the size of the bottleneck. Perhaps, if you share your code as a gist, I'll check it out on my setup.

f = T.grad(T.sum(self._pool2d_layer.get_output(train)), wrt=self._pool2d_layer.get_input(train)) * output

The aim of this line is the following. For simplicity's sake, assume the maxpool layer gets a (2x2) input: [[a,b],[j,k]], and assume that j>k>b>a. MaxPool therefore passes only [j] as its output.

Let's sum all the outputs (T.sum(self._pool2d_layer.get_output(train))): Sum = j. Then take the gradient with respect to the input: [[dSum/da, dSum/db], [dSum/dj, dSum/dk]]. This gives us a binary matrix [[0,0],[1,0]], which indicates which input neurons of MaxPool were active. If we want to maintain the derivative flow and avoid the blur caused by plain upsampling, we have to use this information, effectively turning the upsampling layer into a DePool. So we perform an element-wise multiplication of this binary matrix with the upsampled input.
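
As a standalone illustration of this switch-mask trick (a sketch only, not the repo's actual code; it assumes a Theano version that provides theano.tensor.signal.pool.pool_2d):

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.signal.pool import pool_2d

X = T.tensor4('X')                                  # (batch, channels, rows, cols)
pooled = pool_2d(X, (2, 2), ignore_border=True)     # forward max-pool
# d(sum of pooled outputs)/dX is 1 exactly at each per-window maximum
# and 0 everywhere else -- the binary "switch" matrix described above.
switches = T.grad(T.sum(pooled), wrt=X)
upsampled = pooled.repeat(2, axis=2).repeat(2, axis=3)
depooled = switches * upsampled                     # keep only the max positions

f = theano.function([X], [switches, depooled])
s, d = f(np.random.rand(1, 1, 4, 4).astype(theano.config.floatX))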

I am pretty new to Theano as well, so I might have gotten something wrong; detailed testing and simple sanity checks are very much appreciated :)

@kevinthedestroyr

@nanopony I forked the repo to illustrate the behavior I was seeing. Here's the change I made:

diff --git a/conv_autoencoder.py b/conv_autoencoder.py
index 0d774cc..e887b7c 100644
--- a/conv_autoencoder.py
+++ b/conv_autoencoder.py
@@ -21,7 +21,10 @@ def load_data():

def build_model(nb_filters=32, nb_pool=2, nb_conv=3):
     model = models.Sequential()
-    d = Dense(30)
+    num_dense_inputs = nb_filters*14*14
+    dense_W = np.ones((num_dense_inputs, 1))
+    dense_b = np.zeros((1,))
+    d = Dense(1, weights=[dense_W, dense_b], trainable=False)
     c = Convolution2D(nb_filters, nb_conv, nb_conv, border_mode='same', input_shape=(1, 28,     28))
     mp =MaxPooling2D(pool_size=(nb_pool, nb_pool))
     # =========      ENCODER     ========================

As you can see, I replaced the Dense layer with an untrainable single-neuron Dense layer with all weights set to 1. This should act as a wall: no information should be able to pass through it, so you should end up with uniform outputs regardless of the input. But instead you get this unexpected result:

[Image: MNIST reconstructions with the T.grad line in place]

Whereas, if you comment out the T.grad line in autoencoder_layers.py, you end up with the expected uniform output:

[Image: MNIST reconstructions with the T.grad line removed]

@talrozen

@nanopony This is awesome. I'm trying to use the package for arrays of a size other than 28x28 pixels, but I'm getting the error message below. My code is:

model = build_model(nb_filters=32, nb_pool=2, nb_conv=3,
                    input_shape=(X_train.shape[1], X_train.shape[2], X_train.shape[3]))
if not False:
    model.compile(optimizer='rmsprop', loss='mean_squared_error')
    model.summary()
    model.fit(X_train, X_train, nb_epoch=2, batch_size=128, validation_split=0.2,
              callbacks=[EarlyStopping(patience=3)])

File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 280, in _fit
outs = f(ins_batch)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/theano_backend.py", line 384, in call
return self.function(*inputs)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 871, in call
storage_map=getattr(self.fn, 'storage_map', None))
File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
ValueError: Input dimension mis-match. (input[0].shape[1] = 25088, input[1].shape[1] = 6272)
Apply node that caused the error: Elemwise{Add}[(0, 0)](Dot22.0, InplaceDimShuffle{x,0}.0)
Toposort index: 96
Inputs types: [TensorType(float32, matrix), TensorType(float32, row)]
Inputs shapes: [(40, 25088), (1, 6272)]
Inputs strides: [(100352, 4), (25088, 4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Reshape{4}(Elemwise{Add}[(0, 0)].0, TensorConstant{[-1 32 14 14]}), Elemwise{Composite{(i0 * (i1 - sqr(tanh(i2))))}}[(0, 0)](Reshape{2}.0, TensorConstant{%281, 1%29 of 1.0}, Elemwise{Add}[%280, 0%29].0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

@nanopony

@kevinthedestroyr Thank you for the prompt response! I've successfully reproduced the behavior.

After a bit of tinkering with debug prints, I think I have an explanation for the observed effect. The single neuron does work as a wall: given that it sums up 6272 inputs, it is very likely to be active and to pass 1 to its output in most cases. But the structural information slips through the connection between MaxPool and DePool in the following way:

Unpooling: In the convnet, the max pooling operation is non-invertible, however we can obtain an approximate inverse by recording the locations of the maxima within each pooling region in a set of switch variables. In the deconvnet, the unpooling operation uses these switches to place the reconstructions from the layer above into appropriate locations, preserving the structure of the stimulus. See Fig. 1(bottom) for an illustration of the procedure.

Thus, even though the single neuron loses all the information, DePool still recovers the maxima of each 2x2 cell, giving the deconv layer at least something to recover even further.

And therefore, if one disables this connection, information indeed won't pass and we get the same pattern for every input :)

Thank you for this observation!

@nanopony

@talrozen I need to see the actual code of the model to troubleshoot. If you'd like, we can discuss it in detail in my repo, so this ticket won't be overburdened with technical details. Another issue I see is that I used Python 3, so you might need to change the super call to accommodate the lack of the zero-argument super() shorthand in Python 2.x.
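
For illustration (the class names here are made up, not taken from the repo), the two forms look like this:

class Base(object):
    def __init__(self, size):
        self.size = size

class DePoolPy3(Base):
    def __init__(self, size):
        super().__init__(size)                   # Python 3-only shorthand

class DePoolPy2(Base):
    def __init__(self, size):
        super(DePoolPy2, self).__init__(size)    # works on Python 2.x and 3.x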

@ulzie

ulzie commented Mar 25, 2016

Is it possible to create a multi-layer convolutional autoencoder? I'm trying to do it without success, and I can't find any example code for it.
Thanks

@gokceneraslan
Contributor

Unpooling (as in deconvnet and SWWAE) still does not exist in Keras as a layer, right? Then why did you close the bug report @fchollet, if I may ask?

@Ethiral

Ethiral commented Jun 6, 2017

Any update on creating a multi-layer convolutional autoencoder?
@ulzie, did you figure it out?

@micvalenti

micvalenti commented Aug 8, 2017

Hi everybody. I'm currently trying to figure out how to build a proper decoder that preserves the dimensions of the encoder outputs. The problem I found is specifically in the upsampling: in the encoder's max pooling, if the input is not perfectly divisible by the pooling factor, the output will be rounded depending on the 'padding' mode:

'valid' padding:
input=(25,25), pool=(3,3) => output=(8,8)

'same' padding:
input=(25,25), pool=(3,3) => output=(9,9)

On the other hand, after the upsampling, the output will be equal to the input multiplied by the upsampling factor (which I keep equal to the pooling size (3,3)), therefore (24,24) for 'valid' and (27,27) for 'same': in both cases they are not the (25,25) I need.

After a bit of work on it I found a solution that can be used: fix the pooling padding to 'same', so that the upsampling always gives a dimension at least as large as the one needed, and then crop it. In my code I collect the pre-pooling dimensions in a list called 'pooling_input_dimensions' (for example it could be [(25, 25)]), and then, after each upsampling layer 'x', I calculate the crop dimension this way:

import math

# 'x' here is the output tensor of the current upsampling layer.
desired_dimension = pooling_input_dimensions.pop(-1)
deltas = tuple(cur - des for cur, des in zip(x._keras_shape[1:-1], desired_dimension))
crop = tuple((math.ceil(delta / 2), math.floor(delta / 2)) for delta in deltas)
if len(crop) == 1:
    crop = crop[0]

Here I needed the 'if' check to make it work with Cropping1D, since I pass the variable 'crop' to a 'cropping_layer' that is already set to be a Keras Cropping1D or Cropping2D depending on the input dimension. Let me know if any of you find a better solution! A minimal end-to-end sketch of the same idea follows.
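
As a rough sketch only (it assumes the Keras 2 functional API and hard-codes the (25,25) example above; the layer choices are illustrative):

import math
from keras.layers import Input, MaxPooling2D, UpSampling2D, Cropping2D
from keras.models import Model

inp = Input(shape=(25, 25, 1))
x = MaxPooling2D(pool_size=(3, 3), padding='same')(inp)   # (25, 25) -> (9, 9)
x = UpSampling2D(size=(3, 3))(x)                           # (9, 9)  -> (27, 27)

# Crop back down to the recorded pre-pooling size.
desired_dimension = (25, 25)
deltas = tuple(cur - des for cur, des in zip((27, 27), desired_dimension))
crop = tuple((math.ceil(d / 2), math.floor(d / 2)) for d in deltas)
out = Cropping2D(cropping=crop)(x)                         # (27, 27) -> (25, 25)

model = Model(inp, out)
model.summary()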

@mrgloom

mrgloom commented Jun 30, 2018

It seems that unpooling with indices can now be done using K.tf.nn.max_pool_with_argmax.

https://github.com/PavlosMelissinos/enet-keras/blob/master/src/models/layers/pooling.py
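
A minimal standalone sketch of the idea (not the linked repo's code; it assumes TensorFlow 2.x eager mode and the include_batch_in_index argument of max_pool_with_argmax):

import tensorflow as tf

def unpool_with_argmax(pooled, argmax, input_shape):
    # Scatter the pooled maxima back to the positions recorded in argmax.
    flat_updates = tf.reshape(pooled, [-1])
    flat_indices = tf.reshape(argmax, [-1, 1])
    flat_size = tf.reshape(tf.reduce_prod(input_shape), [1])
    flat_out = tf.scatter_nd(flat_indices, flat_updates, flat_size)
    return tf.reshape(flat_out, input_shape)

x = tf.random.uniform((2, 4, 4, 3))
pooled, argmax = tf.nn.max_pool_with_argmax(
    x, ksize=2, strides=2, padding='SAME', include_batch_in_index=True)
unpooled = unpool_with_argmax(pooled, argmax, tf.shape(x, out_type=tf.int64))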
