Specifying a FCN based on VGG16 #3540
Comments
There's an example of Deconvolution2D usage in one of the examples - https://github.com/fchollet/keras/blob/master/examples/variational_autoencoder_deconv.py#L53-L55. If the stride is 1, the formula simplifies to: o = s(i - 1) + k - 2p. Please keep in mind that it can be problematic to calculate the shape by hand if you deal with variable-sized inputs (which is feasible in the fully convolutional implementation). I've tried to implement a simplified shape auto-inference here - https://github.com/lukovkin/keras/blob/master/keras/layers/convolutional.py#L414-L436; it seems to be working for the TF backend, but with Theano I've run into some specific issues and haven't resolved them.
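As a quick sanity check of that formula (plain Python arithmetic, nothing Keras-specific, values chosen just for illustration):

```python
def deconv_output_length(i, k, s, p):
    """Output length of a transposed convolution (no output padding):
    o = s * (i - 1) + k - 2 * p
    """
    return s * (i - 1) + k - 2 * p

# Example: a 14x14 feature map, 3x3 kernel, stride 2, padding 1 -> 27x27
print(deconv_output_length(14, 3, 2, 1))  # 27
```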
Thanks @lukovkin for the update on this long-debated topic. The discussions were scattered in many threads, which makes it a little hard to track. Additional questions:
I'd appreciate your opinions, because I didn't find clear answers by reading the related discussions, e.g., #3122, #2822, #2087, #378, etc.
Thanks for the great responses and the explanations and examples for Deconvolution2D! I was going off of the same paper that @dolaameng mentioned. I saw the difference in the upsampling operation and I would also be interested in the effects of the differences, although I am hoping that the system I am trying to build will not be too sensitive. Here's the solution I came up with:
DefineVGG16() is in the original post. Then I added additional convolutional layers to get down to around 8x8 for larger images.
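In outline, that extra block might look something like this (illustrative layer sizes, old Keras 1.x-style API as used elsewhere in this thread; this is a sketch, not the exact code from the post):

```python
from keras.layers import Convolution2D, MaxPooling2D

model = DefineVGG16()  # VGG16 body from the original post

# One extra conv/pool pair: for 512x512 inputs the VGG16 stack ends at
# (512, 16, 16), so this brings the feature map down to (512, 8, 8).
model.add(Convolution2D(512, 3, 3, activation='relu', border_mode='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
```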
Then I added the fully connected layers in the middle. This was my biggest original issue; I was using
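A purely illustrative guess at what the middle block could look like (the thread does not show the exact layers the author used, so every size here is an assumption):

```python
from keras.layers import Flatten, Dense, Reshape

# (continuing the same hypothetical Sequential model as above)
# Flatten the (512, 8, 8) map, apply two VGG-style fully-connected layers,
# then project back onto the small spatial grid so the deconvolutional half
# has a 3D tensor to start from.
model.add(Flatten())
model.add(Dense(4096, activation='relu', name='fc6'))
model.add(Dense(4096, activation='relu', name='fc7'))
model.add(Dense(512 * 8 * 8, activation='relu'))
model.add(Reshape((512, 8, 8)))
```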
Then I started building up the deconvolutional side of the network. I started by adding deconvolutional layers to mirror any additional convolutional layers that I had added to handle larger images.
Then, I created the mirrored version of the VGG16 network. In my case the formula for output size that @lukovkin provided simplified down to 2*input_size, so I just referenced the output size of the UpSampling2D layer.
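One such mirror block could be sketched as follows (Keras 1.x API, Theano-style (channels, rows, cols) ordering and an explicit batch size assumed; not the author's exact code):

```python
from keras.layers import UpSampling2D, Deconvolution2D

batch_size = 16  # placeholder; older Keras versions need it in output_shape

# (continuing the same hypothetical Sequential model)
# UpSampling2D doubles the spatial size; the stride-1, border_mode='same'
# Deconvolution2D keeps it, so output_shape can be read straight off the
# upsampled tensor.
model.add(UpSampling2D(size=(2, 2)))  # (512, 8, 8) -> (512, 16, 16)
model.add(Deconvolution2D(512, 3, 3,
                          output_shape=(batch_size, 512, 16, 16),
                          border_mode='same',
                          activation='relu'))
```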
Finally, I specified the output layer as
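A hedged sketch of what such an output layer might be (a 1x1 convolution is just one possibility; the post does not show the actual layer):

```python
from keras.layers import Convolution2D

num_classes = 2  # two classes in this particular problem

# Hypothetical output layer: a 1x1 convolution mapping the full-resolution
# feature map to one channel per class. The per-pixel class probabilities
# (softmax/sigmoid over the channel axis) are applied on top of this; how
# that is expressed depends on the Keras version and dim ordering.
model.add(Convolution2D(num_classes, 1, 1, border_mode='same'))
```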
Where num_classes is the number of classes in the specific problem (2 in my case). For 224x224 images, the total number of parameters matches the 252M value that is given in the paper. Now on to training!
* Fix exception message for Deconvolution2D * Docs update for Deconvolution2D layer (#3540) * Corrections to Deconvolution2D docs * References formatted as Markdown links * Blank lines added
Hi @psalvaggio, thanks for sharing the code. Any progress on your experiment? I am especially interested in knowing the differences between using Convolution versus Deconvolution (aka transposed convolution). It seems that the current Deconvolution implementation in Keras is the backprop operation w.r.t. the inputs. But in terms of learning capability, do you think a convolution layer would be enough to give similar results, e.g. for semantic segmentation? However, I think there will be certain differences because of the way they share the weights - see the animations. Do you have any visualizations of the Deconvolution layers? Thanks!
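For readers trying to picture the "backprop w.r.t. inputs" view: a tiny NumPy toy (illustrative only, not the Keras internals) writes a 1D valid convolution as a matrix multiply y = Cx, and the transposed convolution is then just C.T applied to y, mapping the short output back to the input length:

```python
import numpy as np

kernel = np.array([1.0, 2.0, 3.0])
n_in = 5
n_out = n_in - len(kernel) + 1           # 'valid' convolution: 5 -> 3

# Each row of C is a shifted copy of the kernel, so y = C @ x is the
# (cross-correlation form of) convolution.
C = np.zeros((n_out, n_in))
for i in range(n_out):
    C[i, i:i + len(kernel)] = kernel

x = np.arange(n_in, dtype=float)
y = C @ x                                 # forward convolution: length 5 -> 3
x_up = C.T @ y                            # transposed convolution: length 3 -> 5

print(y.shape, x_up.shape)                # (3,) (5,)
```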
Hi @psalvaggio, how did you initialize your network? Can you share your full code?
@dolaameng Not much yet. I'm still working on the training code, and according to that paper, I'm looking at around a week of training time. This is actually my first deep learning project, so I don't have a ton of intuition as to the effects of the implementation details. I was previously in the optical modeling field, where deconvolution was pretty much at the center of my research, but deconvolution here doesn't seem to have much in common with it. My intuition says that since the shift variables from that paper aren't present, it will lead to some distortions in the boundaries of objects, but I don't have anything to back that up yet. I will definitely be producing visualizations once I have trained the network, so I can start to understand what these "deconvolutional" layers are doing. @HarisIqbal88 I initialized the first part of the network with the VGG16 weights, as described in this repo: https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3. I haven't gotten the training running yet, so I don't have a good answer for the rest of the network.
@dolaameng Not at all. Some comments:
Thank you @psalvaggio for your update - I look forward to your results. I was trying to trace back how the terms "deconvolution" and "unpooling" have been used. If I am not mistaken, they were originally used in the sense of "convolutional sparse coding" in papers 1 and 2, as a way to learn filters and infer feature maps - quite similar to the idea of a convolutional version of an autoencoder. Then people started to use "deconvolution" more in the sense of an inverse mapping from a feature map back to the original image space. Since then it seems to be generally agreed that the implementation of deconvolution amounts to transposed convolution. As for "unpooling" and the rest, in the end I found that the Torch documentation gives clear and complete definitions of all these operators. I hope all these discussions here will help people who were confused like me to find these resources more easily.
@dolaameng Jeez, that was a pretty extensive review, thank you very much! I'll take time to browse through the links and maybe come back with something later.
Hi @psalvaggio, were you successful in training the FCN?
@HarisIqbal88 I got pulled away from this project almost immediately after this thread. I will be getting back to it shortly, however. I'm not sure which caffe model you are referencing, but is it possible that they preloaded the deconv layer to perform bilinear upsampling and then had some regular conv layers after it that were learned?
@psalvaggio I am now writing an FCN in Keras but ran into an interesting problem. I used the Deconvolutional layer the way you did. However, the input image size is not fixed in an FCN (the same is true for my dataset, and I cannot reshape the images to a common size). This carries the 'None' argument from Input() through to the Deconvolution2D layer, which does not accept it. Any idea how to implement an FCN without fixing the input image size? About the earlier discussion, I converged to the same conclusion.
@HarisIqbal88 Right, the network I proposed here does not work for variable size input. One of the papers I was looking at (https://arxiv.org/abs/1606.02585) does make a "no-downsampling FCN" which can work for variable size input. In Keras, you do have to specify the size of the input image, but the number of parameters for that network is independent of the input size, so there would be no retraining, just some manipulation of the input layer's size.
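A minimal sketch of that last point (build_fcn is a hypothetical helper that wires up the whole network for a given input shape; the real code would also have to adjust any output_shape arguments internally):

```python
# The layer parameters do not depend on the input size, so a model rebuilt at a
# new resolution can simply reuse the trained weights.
model_224 = build_fcn(input_shape=(3, 224, 224), num_classes=2)
model_512 = build_fcn(input_shape=(3, 512, 512), num_classes=2)

model_512.set_weights(model_224.get_weights())  # no retraining needed
```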
FYI there is an implementation here: https://github.com/guojiyao/deep_fcn
Thanks @lukovkin for your comment, it was very useful to me in debugging my implementation. I found that implementing the FCN upsampling bit was quite fun in the end (though a bit frustrating along the way), so I wrote a Medium post to explain it and show the impact of various settings: https://medium.com/@m.zaradzki/image-segmentation-with-neural-net-d5094d571b1e Here is the corresponding repo for FCN32s, FCN16s and FCN8s (handling variable image size): Hope this helps! PS: Related to this topic, I just found this link with a DeepMask implementation in Keras:
DenseNetFCN is now available in keras-contrib.
These items & repositories are also relevant to FCN:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
I have a question: why is the learning rate of the deconvolutional layer set to 0?
Hi, I'm new to the area of deep learning, so please forgive me if I get some terminology wrong.
I'm trying to train a Fully Convolutional Network (FCN) to perform semantic segmentation. I am attempting to use the VGG16 network as the first part of my network.
My issue is that I want to perform analysis on much bigger images than the 224x224 size the network was trained on. I can tile down to something reasonable like 512x512, but 224x224 is too small. I'm a bit confused on how to specify the middle and back half of the network. After the last VGG16 MaxPooling2D layer, my output size is (1, 512, 16, 16). I am assuming I need to insert another layer of convolutions to get it down to (1, 512, 8, 8), so I can keep the number of parameters under control. At that point, I insert the fully-connected layers as
Here's where I'm lost. I know there is a Deconvolution2D layer, but there's no documentation on the Keras site on how to use it to build up the deconvolution side of the network. I think I need to reshape the output of the 'fc7' layer back to 3D and then use a combination of Deconvolution2D and UpSampling2D to mirror the front half of the network, but I don't know how that would look in the code.
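To make the question concrete, here is a rough guess at what I imagine that step might look like (layer sizes and the batch_size placeholder are just guesses on my part, not working code):

```python
from keras.layers import Dense, Reshape, UpSampling2D, Deconvolution2D

batch_size = 16  # placeholder; Deconvolution2D's output_shape seems to want it

# Map the 'fc7' vector back onto a small spatial grid, then start mirroring the
# encoder with UpSampling2D + stride-1 Deconvolution2D blocks.
model.add(Dense(512 * 8 * 8, activation='relu'))
model.add(Reshape((512, 8, 8)))
model.add(UpSampling2D(size=(2, 2)))
model.add(Deconvolution2D(512, 3, 3,
                          output_shape=(batch_size, 512, 16, 16),
                          border_mode='same',
                          activation='relu'))
```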
Any help would be greatly appreciated.