mean-variance normalization layer #846

Merged: 7 commits into BVLC:dev on Aug 12, 2014

Conversation

@qipeng (Contributor) commented Aug 4, 2014

Normalizes activations to zero-mean unit-variance for each channel of each datum.

Unit tests included.
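
For reference, the per-channel computation amounts to the sketch below: plain standalone C++ that only illustrates the math on an NCHW blob, not the actual layer code; the small epsilon guarding the division is an assumption of the sketch.

#include <cmath>
#include <cstdio>
#include <vector>

// Per-channel mean-variance normalization of a blob with shape
// (num, channels, height, width), stored contiguously in NCHW order.
// Statistics are computed over the spatial positions of each
// (datum, channel) pair, as described above.
void mvn_forward(const std::vector<float>& bottom, std::vector<float>* top,
                 int num, int channels, int height, int width) {
  const int dim = height * width;  // spatial size of one channel
  const float eps = 1e-9f;         // assumption: small constant to avoid division by zero
  for (int n = 0; n < num; ++n) {
    for (int c = 0; c < channels; ++c) {
      const int offset = (n * channels + c) * dim;
      // mean over this channel of this datum
      float mean = 0.f;
      for (int i = 0; i < dim; ++i) mean += bottom[offset + i];
      mean /= dim;
      // (population) variance over the same positions
      float var = 0.f;
      for (int i = 0; i < dim; ++i) {
        const float d = bottom[offset + i] - mean;
        var += d * d;
      }
      var /= dim;
      // subtract the mean and divide by the standard deviation
      const float inv_std = 1.f / std::sqrt(var + eps);
      for (int i = 0; i < dim; ++i)
        (*top)[offset + i] = (bottom[offset + i] - mean) * inv_std;
    }
  }
}

int main() {
  const int num = 1, channels = 2, height = 2, width = 2;
  std::vector<float> bottom = {1, 2, 3, 4, 10, 20, 30, 40};
  std::vector<float> top(bottom.size());
  mvn_forward(bottom, &top, num, channels, height, width);
  // each channel now has (approximately) zero mean and unit variance
  for (float v : top) std::printf("%.3f ", v);
  std::printf("\n");
  return 0;
}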

@bhack (Contributor) commented Aug 4, 2014

Will this support per-sample mean and variance normalization?

@qipeng (Contributor, Author) commented Aug 4, 2014

Hi @bhack, this implements per-sample and per-channel MVN :)

@bhack (Contributor) commented Aug 4, 2014

It seems there was a download problem with the NVIDIA packages on the Travis build. Can you push a commit so the build restarts?

@shelhamer (Member)

@jeffdonahue I've had a few builds time out due to CUDA downloads too. Any word from the Travis team on caching these packages? If not, perhaps we should have our own mirror of the NVIDIA dependencies at dl.caffe.berkeleyvision.org.

@jeffdonahue (Contributor)

They sent an initial response saying they would look into it, but I haven't heard anything since. Travis also has an option to cache arbitrary directories, but I played around with it a bit and couldn't figure out how to get it working... if anyone else wants to try, go for it.

Our own mirror would be a good idea, but I'm not sure what files need to be mirrored since it's all done with apt -- I guess it's just some deb packages? Anyway, go for it if you know what to do.

@bhack (Contributor) commented Aug 4, 2014

@jeffdonahue I see that directory caching is "currently only available for private repositories on travis-ci.com"

@jeffdonahue (Contributor)

@bhack ah! that explains it, thanks.

EXPECT_LE(sum, 0.001);
// expect unit variance
EXPECT_GE(var, 0.999);
EXPECT_LE(var, 1.001);
Contributor (review comment)

please use:

const Dtype kErrorBound = 0.001;
EXPECT_NEAR(0, sum, kErrorBound);
EXPECT_NEAR(1, var, kErrorBound);

Contributor Author (reply)

@jeffdonahue Done, thanks for the comment!

@jeffdonahue (Contributor)

> Hi @bhack, this implements per-sample and per-channel MVN :)

Hmm.. I don't see any switch parameter or anything to choose whether to normalize per-sample vs. per-channel -- it looks to me like it's always done per channel?

Could you add an entry to src/caffe/layer_factory.cpp so the layer can be used in a net?

@@ -236,6 +236,39 @@ class LRNLayer : public Layer<Dtype> {
vector<Blob<Dtype>*> product_bottom_vec_;
};

/* MVNLayer
Contributor (review comment)

should probably be in common_layers instead of vision_layers (unless it's really only going to compute a per-channel mean)

Contributor Author (reply)

Hmm, this was originally meant for vision tasks, so the MVN was done for each channel. I'm not sure how useful it would be for other tasks, but if you feel it's useful, I can try to implement both behaviors behind a switch and move it to common_layers. Also, for clarification: for general layers, should I just treat everything after num as one dimension?

Contributor (reply)

It's OK -- the per-channel normalization is useful on its own; it's not necessary to extend it to per-sample for this PR. I just thought you had also implemented per-sample normalization because of your comment quoted above. Anyway, it's up to you whether you want to also implement per-sample normalization (and then move the layer to common_layers.hpp if you do); otherwise I think we can merge this once you address my caffe_set nitpick.

@bhack (Contributor) commented Aug 5, 2014

@jeffdonahue if you still want to mirror CUDA on the Berkeley host, you could use apt-mirror with the repository listed in cuda.list (this text file is inside the cuda-repo package you install in the Travis script).

@bhack (Contributor) commented Aug 10, 2014

@qipeng can you add a parameter to perform only mean subtraction? @mtamburrano can you give some feedback on convergence when training with this layer added to the proto?

@mtamburrano (Contributor)

Hi @qipeng,
nice work, I think this layer is very useful.
By the way, I have a couple of doubts:

  • To build your pull request, I needed to add this line

  case LayerParameter_LayerType_MVN:
    return new MVNLayer<Dtype>(param);

to layer_factory.cpp. I see your PR passes the Travis CI build, so I'm not sure how it built without it :)

  • I tried to train a network using MVN for data normalization. Without it my net reaches about 90% precision, but when I add your layer the net doesn't converge anymore...
    I connect an image_data_layer to your MVN layer like this:

layers {
  name: "data"
  type: IMAGE_DATA
  top: "data"
  top: "label"
  image_data_param {
    ...
  }
}
layers {
  name: "mvn"
  type: MVN
  bottom: "data"
  top: "data"
}

but this way the precision doesn't go above 0.001%.
Am I doing it wrong?

@qipeng (Contributor, Author) commented Aug 11, 2014

@jeffdonahue @bhack I've addressed your comments in the latest commit, thanks for the insightful feedback! Specifically, I've added options for mean-only normalization and per-sample normalization, added the corresponding unit tests, added the layer to the layer factory, and moved it to common_layers.
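
For anyone who wants to try the new behaviors from a net definition, the layer could be configured roughly like this (a sketch: across_channels and normalize_variance are the two switches, shown here with their defaults flipped; the name of the parameter block, mvn_param, is an assumption):

layers {
  name: "mvn"
  type: MVN
  bottom: "data"
  top: "data_mvn"
  mvn_param {                   # assumed name of the parameter block
    normalize_variance: false   # mean-only normalization (default: true)
    across_channels: true       # per-sample statistics over all channels (default: false)
  }
}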

@mtamburrano I'm not sure exactly what the problem could be; your application of the layer looks correct, and without knowing the image data you're using I can't tell what caused the accuracy drop... But I'll be happy to help with experiments.

@bhack (Contributor) commented Aug 12, 2014

@mtamburrano Do you have the same problem training with 1 channel (grayscale) images?

@mtamburrano (Contributor)

@bhack, it doesn't make a difference.
Training with 1 channel only: without MVN the net reaches about 80-85% precision in 21000 iterations; with MVN, after 435000 iterations the precision is still stuck around zero percent.

@mtamburrano (Contributor)

I did some tests, which I report here; I hope they are useful:

lr (learning rate)   across_channels    normalize_variance   result
0.0003               false (default)    true (default)       loss: NaN or doesn't converge - precision: ~0%
0.0003               true               true (default)       loss: NaN or doesn't converge - precision: ~0%
0.0003               true               false                loss: NaN or doesn't converge - precision: ~0%
0.0003               false (default)    false                loss: NaN or doesn't converge - precision: ~0%
0.0001               false (default)    true (default)       loss: NaN or doesn't converge - precision: ~0%
0.0001               true               true (default)       loss: NaN or doesn't converge - precision: ~0%
0.0001               true               false                loss: NaN or doesn't converge - precision: ~0%
0.0001               false (default)    false                loss: CONVERGES - precision: >85%
0.0002               false (default)    false                loss: NaN or doesn't converge - precision: ~0%

So the only way I managed to get the net to converge is using the per-channel mean, not using variance normalization, AND lowering the learning rate. I tried lowering the learning rate (even below 0.0001) in the other cases, but without success.
The aim of the net is to perform OCR on digits, so the training images contain digits under different lighting conditions, rotations, scales, and perspectives.

@jeffdonahue (Contributor)

Thanks for writing this layer and addressing my comments @qipeng! Based on the tests this seems to be correctly implemented so I'll merge despite the above results from @mtamburrano.

jeffdonahue added a commit that referenced this pull request on Aug 12, 2014: mean-variance normalization layer
@jeffdonahue merged commit c9e22ab into BVLC:dev on Aug 12, 2014
@shelhamer mentioned this pull request on Sep 18, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request on Sep 30, 2014: mean-variance normalization layer
RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request on Nov 4, 2014: mean-variance normalization layer