mean-variance normalization layer #846
Conversation
This will support per-sample mean and variance normalization?
Hi @bhack, this implements per-sample and per-channel MVN :)
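For readers skimming the thread, here is a minimal sketch of what per-channel MVN computes for an N x C x H x W blob; the function name and layout handling below are illustrative, not the PR's actual kernel:

#include <cmath>
#include <vector>

// Illustrative only: normalize each channel of each sample to zero mean and
// unit variance, assuming the data is laid out N x C x H x W, row-major.
void mvn_per_channel(std::vector<float>& data, int num, int channels,
                     int height, int width, float eps = 1e-9f) {
  const int dim = height * width;
  for (int n = 0; n < num; ++n) {
    for (int c = 0; c < channels; ++c) {
      float* x = data.data() + (n * channels + c) * dim;
      float mean = 0.f, var = 0.f;
      for (int i = 0; i < dim; ++i) mean += x[i];
      mean /= dim;
      for (int i = 0; i < dim; ++i) var += (x[i] - mean) * (x[i] - mean);
      var /= dim;
      const float inv_std = 1.f / std::sqrt(var + eps);
      for (int i = 0; i < dim; ++i) x[i] = (x[i] - mean) * inv_std;
    }
  }
}

Per-sample normalization is the same computation with the statistics taken over all C*H*W values of a sample instead of each channel separately.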
It seems there was a download problem with the NVIDIA packages on the Travis build. Can you make a commit to let the build restart?
@jeffdonahue I've had a few builds time out due to CUDA downloads too. Any word back on this?

Evan Shelhamer
They sent an initial response saying they would look into it but haven't heard anything since. Travis also has an option to cache arbitrary directories but I played around with it a bit and couldn't figure out how to get it working... if anyone else wants to try, go for it. Our own mirror would be a good idea, but I'm not sure what files need to be mirrored since it's all done with apt -- I guess it's just some deb packages? Anyway, go for it if you know what to do.
@jeffdonahue I see that directory caching is "currently only available for private repositories on travis-ci.com"
@bhack ah! that explains it, thanks. |
EXPECT_LE(sum, 0.001);
// expect unit variance
EXPECT_GE(var, 0.999);
EXPECT_LE(var, 1.001);
please use:
const Dtype kErrorBound = 0.001;
EXPECT_NEAR(0, sum, kErrorBound);
EXPECT_NEAR(1, var, kErrorBound);
@jeffdonahue Done, thanks for the comment!
Hmm.. I don't see any switch parameter or anything to choose whether to normalize per-sample vs. per-channel -- it looks to me like it's always done per channel? Could you add an entry to
@@ -236,6 +236,39 @@ class LRNLayer : public Layer<Dtype> {
  vector<Blob<Dtype>*> product_bottom_vec_;
};

/* MVNLayer
should probably be in common_layers instead of vision_layers (unless it's really only going to compute a per-channel mean)
Hmm, this was originally meant for vision tasks, so the MVN was done for each channel. I'm not sure how useful it might be for other tasks, but if you feel it's useful, I can try to implement two behaviors based on a switch and move it to common_layers. Also, for clarification: for general layers, should I just treat everything after num as one dimension?
It's ok -- the per-channel normalization is useful alone; it's not necessary to extend to per-sample for this PR. I just thought you had also implemented the per-sample normalization due to your comment I quoted above? Anyway, it's up to you whether you want to also implement the per-sample normalization (and then move to common_layers.hpp if you do); otherwise I think we can merge this once you address my caffe_set nitpick.
@jeffdonahue if you still want to mirror CUDA on a Berkeley host, you could use apt-mirror with the repository listed in cuda.list (this text file is inside the cuda-repo package you install in the Travis script).
@qipeng can you add a parameter to perform only mean subtraction? @mtamburrano can you give some feedback on convergence when training with this layer introduced in the proto?
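As a rough illustration of what such a mean-only switch amounts to (the function name and flag below are hypothetical, not taken from the PR):

#include <cmath>
#include <cstddef>

// Hypothetical sketch: always subtract the mean; rescale to unit variance
// only when normalize_variance is true (mean-only mode returns early).
void normalize_group(float* x, std::size_t dim, bool normalize_variance,
                     float eps = 1e-9f) {
  float mean = 0.f;
  for (std::size_t i = 0; i < dim; ++i) mean += x[i];
  mean /= dim;
  for (std::size_t i = 0; i < dim; ++i) x[i] -= mean;
  if (!normalize_variance) return;  // mean subtraction only
  float var = 0.f;
  for (std::size_t i = 0; i < dim; ++i) var += x[i] * x[i];
  var /= dim;
  const float scale = 1.f / std::sqrt(var + eps);
  for (std::size_t i = 0; i < dim; ++i) x[i] *= scale;
}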
hi @qipeng, to use your layer I had to add

case LayerParameter_LayerType_MVN:
  return new MVNLayer(param);

to layer_factory.cpp. I see your PR passes the Travis CI build test, not sure how it did it :)
I also tried the layer in a net of mine, but in this way precision doesn't go above 0.001%.
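For context, the case mtamburrano mentions goes inside the layer-type switch in layer_factory.cpp; a rough sketch of that pattern, assuming the Caffe headers of that era (the neighbouring case is just an example, not a verbatim copy of the file):

// Sketch of the factory switch; only the MVN case is the point here.
template <typename Dtype>
Layer<Dtype>* GetLayer(const LayerParameter& param) {
  switch (param.type()) {
  case LayerParameter_LayerType_LRN:
    return new LRNLayer<Dtype>(param);
  case LayerParameter_LayerType_MVN:
    return new MVNLayer<Dtype>(param);
  // ... other layer types ...
  default:
    LOG(FATAL) << "Unknown layer type: " << param.type();
  }
  return NULL;  // never reached; keeps the compiler happy
}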
@jeffdonahue @bhack I've addressed your comments in the latest commit, thanks for the insightful comments! Specifically, I've added the option to do mean-only normalization, per-sample normalization, added respective unit tests, added my layer to the layer factory, and moved the layer to common_layers.hpp.

@mtamburrano I'm not sure exactly what the problem could be; your application of the layer seems correct, and without knowledge of the image data you're using, I'm not sure what could've caused the accuracy drop... But I'll be happy to help with experiments.
@mtamburrano Do you have the same problem training with 1 channel (grayscale) images?
@bhack, it doesn't make a difference.
I did some tests, which I report here; I hope they are useful. I tried nine configurations at learning rates of 0.0003 (four runs), 0.0001 (four runs), and 0.0002 (one run), varying the normalization settings. The only way I managed to get the net to converge was using the per-channel mean, not using variance normalization, AND lowering the learning rate. I tried lowering the learning rate (even below 0.0001) in the other cases but without success.
Thanks for writing this layer and addressing my comments @qipeng! Based on the tests this seems to be correctly implemented, so I'll merge despite the above results from @mtamburrano.
mean-variance normalization layer
Normalizes activations to zero-mean unit-variance for each channel of each datum.
Unit tests included.