Refactor convolution layer and add deconvolution layer #1615
Conversation
// Special case: im2col is the identity for 1x1 convolution with stride 1
// and no padding, so flag for skipping the buffer and transformation.
is_1x1_ = kernel_w_ == 1 && kernel_h_ == 1
    && stride_h_ == 1 && stride_w_ == 1 && pad_h_ == 0 && pad_w_ == 0;
Please add the case for full convolution too
There's still no additional testing for the general expression, and anyway that's an orthogonal concern; how about I follow up with another PR that includes those tests? (I realized recently those tests don't even have to be done with ConvolutionLayer, they just need to check im2col, which should be easier...)
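For context, here is a sketch of how such a flag is typically consumed inside a GEMM helper; this paraphrases the upstream BaseConvolutionLayer code rather than quoting this diff:

// When the convolution is 1x1 with stride 1 and no padding, the input is
// already laid out exactly like the column buffer, so the im2col lowering
// can be skipped and GEMM can run on the input directly.
const Dtype* col_buff = input;
if (!is_1x1_) {
  conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
  col_buff = col_buffer_.cpu_data();
}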
Thanks for bringing the convolution code to order and delivering full-fledged deconvolution at the same time, Jon!
All rise, secret order of the Caffe convolution...
While you and I can keep saying "backward convolution," vision parlance seems to be converging on "deconvolution." I think all its usages will come under the "deconvolution" umbrella sooner or later so we might as well name it what everyone will look for.
Right, #610 deserves attention once the fires are out. (Where the fires are data, Net owning phase and device, and double-checking the thread leak in dev.)
I could take a look at this since I did the original integration, and should warm up for hacking cuDNN R2 as well. More likely a follow-up PR instead of pushing it here.
// wrap im2col/col2im so we don't have to remember the (long) argument lists
inline void conv_im2col_cpu(const Dtype* data, Dtype* col_buff) {
  im2col_cpu(data, conv_in_channels_, conv_in_height_, conv_in_width_,
      kernel_h_, kernel_w_, pad_h_, pad_w_, stride_h_, stride_w_, col_buff);
}
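Presumably the matching col2im wrapper mirrors this one; the following is a sketch inferred from the col2im_cpu argument list in util/im2col.hpp, not a quote from the diff:

// wrap col2im likewise, so callers never touch the long argument list
inline void conv_col2im_cpu(const Dtype* col_buff, Dtype* data) {
  col2im_cpu(col_buff, conv_in_channels_, conv_in_height_, conv_in_width_,
      kernel_h_, kernel_w_, pad_h_, pad_w_, stride_h_, stride_w_, data);
}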
Would it be slightly more elegant to give BaseConvolutionLayer a private im2col_layer_ which {Dec,C}onvolutionLayer call Forward and Backward on, instead of these functions? (Possibly also saving some duplicated setup logic & private variables.)

Relatedly, I always kind of thought we should have a separate BiasLayer internally called by both InnerProductLayer and ConvolutionLayer, factoring out that little bit of logic, and allowing one to use biases without multiplicative weights, for whatever that's worth.

Just a minor thought -- definitely doesn't need to be done here as this is nice cleanup regardless, and of course the deconvolution layer will be a welcome feature.
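A hypothetical sketch of the suggested composition; every name and signature below is assumed for illustration (following current Caffe conventions), not taken from this PR:

// Hypothetical: BaseConvolutionLayer owns an Im2colLayer and delegates
// the lowering step to it instead of wrapping im2col_cpu itself.
// (Members and helper inside the class body.)
shared_ptr<Im2colLayer<Dtype> > im2col_layer_;
vector<Blob<Dtype>*> im2col_bottom_vec_, im2col_top_vec_;

void conv_im2col(Blob<Dtype>* data, Blob<Dtype>* col_buffer) {
  im2col_bottom_vec_[0] = data;        // wrap raw blobs in the layer interface
  im2col_top_vec_[0] = col_buffer;
  im2col_layer_->Forward(im2col_bottom_vec_, im2col_top_vec_);
}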
This looks good to merge to me if you think it's ready.
Yes, I think the im2col layer would probably be a bit better, and I agree with factoring out the bias, although I'd rather save those things for later PRs.

Other than that, I think I ought to add some tests that at least call deconv forward and backward, and then it'll be ready.
FWIW I've been running these changes and all seems to be working well.
This provides a common place for code used by ConvolutionLayer and DeconvolutionLayer, simplifying the implementations of both.
Thanks for the datapoint @jyegerlehner. I've added basic doc comments and tests, so this should be ready to go pending any further comments.
Add deconvolution layer with refactoring of convolution layer to share code
Thanks for adding deconvolution while making convolution look the most sane it ever has, Jon!
Can this layer be used for projecting one activation in a given feature map down to the image pixels?
@JiaxuZhu this is exactly the purpose of this layer.
@ducha-aiki Thanks!
Is there any tutorial for this layer? I don't know how to use it.
When doing convolution followed by a deconvolution to reconstruct the input, if the num_output parameter (i.e. the number of filters) is equal in both layers, this causes an error (below). If the num_output of only the deconvolution layer is set to 1, it works. Is the num_output parameter different for convolution and deconvolution? Shouldn't the number of kernels be equal in both these layers? Here my input data is 5x5 in batches of 100. Below is my prototxt script.
The two bottom blobs of the loss layer have to have the same shape; otherwise it can't compute the differences between their elements. Assuming "data" is the same blob as "flatdata", the bottom blob of the "conv1" layer, the message implies it has one channel. Therefore the top blob of "deconv1" also must have 1 channel, which means num_output needs to be 1.
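For illustration, a minimal prototxt sketch of that fix in the current Caffe format; the layer names and kernel sizes here are hypothetical:

layer {
  name: "deconv1"
  type: "Deconvolution"
  bottom: "conv1"
  top: "deconv1"
  convolution_param {
    num_output: 1   # must match the channel count of the target blob
    kernel_size: 5
    stride: 1
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "deconv1"
  bottom: "flatdata"   # both bottoms must have identical shapes
  top: "loss"
}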
@jyegerlehner right. This is not a limit on conv / deconv layers in any way but a requirement of the loss layer.
Thanks @jyegerlehner and @shelhamer for the response! Is there a design workaround for this, to be able to do deconvolution with the same number of kernels as the convolution?
Every time I train this autoencoder-like network (shown above) with the deconvolutional layer, the training loss decreases once and then stays at the same value (6412.21, shown below) for as long as I run epochs. Have you seen similar behavior?
@Tgaaly No, I haven't encountered a problem like that. One thing that looks suspicious is that the L2 error is so large when sigmoids can only give you a response in [0,1] or [-1,1] (I forget which), as if the data hasn't been scaled to a range that sigmoids can produce. By the way, in my particular case I'm using PReLUs, not sigmoids. You might try the way-overcomplete approach with lots of deconv output channels, followed by perhaps PReLUs and then 1x1 convs to bring the number of channels down to what you need (1 channel?), as sketched below. Those are just a few random thoughts; it's hard to say more without knowing more about the problem you are solving and what you are doing.
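A prototxt sketch of that overcomplete suggestion; all channel counts, kernel sizes, and blob names are made up for illustration:

# many deconv channels, a PReLU nonlinearity, then a 1x1 convolution
# to reduce back down to the single reconstruction channel
layer { name: "deconv1" type: "Deconvolution" bottom: "feat" top: "deconv1"
  convolution_param { num_output: 16 kernel_size: 5 stride: 1 } }
layer { name: "prelu1" type: "PReLU" bottom: "deconv1" top: "deconv1" }
layer { name: "reduce1x1" type: "Convolution" bottom: "deconv1" top: "recon"
  convolution_param { num_output: 1 kernel_size: 1 } }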
@Tgaaly You don't have to make a flatdata layer since you are dealing with color images (I suppose). You can set the group parameter of the convolution parameters to 3 in both the conv and deconv layers; this should take care of your problem. You can also use a dropout layer if you want to build a dropout autoencoder.
This PR adds a DeconvolutionLayer that flips the forward and backward passes of ConvolutionLayer. (The resulting operation is still convolution, but the sense of all the parameters is reversed, so that, in particular, strided deconvolution results in upsampling whereas strided convolution results in downsampling.)
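As a concrete check of the up/downsampling claim (standard convolution shape arithmetic, not text from the PR): a convolution produces output of size floor((H + 2*pad - kernel) / stride) + 1, so the flipped operation produces stride * (H - 1) + kernel - 2*pad. With kernel 4, stride 2, pad 1, convolution maps H = 8 down to 4, while deconvolution maps H = 4 up to 8.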
Rather than duplicate all the ConvolutionLayer code, common sections are factored out into a parent class, BaseConvolutionLayer. The tricky GEMM parameters and the column buffer are hidden from the forward and backward implementations, which I hope you agree are much more readable.

Positives

- ConvolutionLayer needs to do an im2col and a col2im in the backward pass; both operations in DeconvolutionLayer's backward pass require im2col, so a special flag is added to avoid doing this twice (see the sketch after this description).

Reservations

- This touches the core code of ConvolutionLayer. Also note that the diff size is really half what it appears to be, because of the point below.

Design choices

- The im2col/col2im wrapper functions could instead be free functions in util/. However, these end up requiring a large number of arguments (each needs most of the convolution parameters). So, one could wrap all the common arguments in a struct, like cuDNN... but then that struct may as well be the layer class itself, so we have the current design.
- Because of the shared im2col in deconv backward, we could leave the column buffer out of the wrapper functions. However, this leads to more duplicated code, and cuts down significantly on readability.
- The forward and backward passes are all im2cols followed by gemms, so in theory we could unify them. However, then we need additional arguments and logic to figure out when to transpose the column buffer. I think this could still be done, but it's a bit of a wash.

Forthcoming

- Doc comments for DeconvolutionLayer.
- Tests for DeconvolutionLayer. Note that most of the functionality is exercised by the ConvolutionLayer tests. I've only briefly checked the forward/backward passes for upsampling/downsampling, so there could be a bug or two left.
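Regarding the special flag in the Positives list, here is a sketch of how the reuse can work; the helper names follow upstream Caffe's BaseConvolutionLayer, but this is a paraphrase for illustration, not the PR's code:

// Deconvolution backward, per image: the weight-gradient helper lowers
// top_diff into the column buffer via im2col; the bottom-gradient helper
// is then told to reuse that buffer instead of lowering top_diff again.
weight_cpu_gemm(top_diff, bottom_data, weight_diff);
forward_cpu_gemm(top_diff, weight, bottom_diff, /* skip_im2col = */ true);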