mean-variance normalization layer #846

Merged: 7 commits into BVLC:dev on Aug 12, 2014

Conversation

@qipeng (Contributor) commented Aug 4, 2014

Normalizes activations to zero-mean unit-variance for each channel of each datum.

Unit tests included.
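
For reference, the per-channel computation amounts to the sketch below: plain standalone C++ that only illustrates the math on an NCHW blob, not the actual layer code; the small epsilon guarding the division is an assumption of the sketch.

#include <cmath>
#include <cstdio>
#include <vector>

// Per-channel mean-variance normalization of a blob with shape
// (num, channels, height, width), stored contiguously in NCHW order.
// Statistics are computed over the spatial positions of each
// (datum, channel) pair, as described above.
void mvn_forward(const std::vector<float>& bottom, std::vector<float>* top,
                 int num, int channels, int height, int width) {
  const int dim = height * width;  // spatial size of one channel
  const float eps = 1e-9f;         // assumption: small constant to avoid division by zero
  for (int n = 0; n < num; ++n) {
    for (int c = 0; c < channels; ++c) {
      const int offset = (n * channels + c) * dim;
      // mean over this channel of this datum
      float mean = 0.f;
      for (int i = 0; i < dim; ++i) mean += bottom[offset + i];
      mean /= dim;
      // (population) variance over the same positions
      float var = 0.f;
      for (int i = 0; i < dim; ++i) {
        const float d = bottom[offset + i] - mean;
        var += d * d;
      }
      var /= dim;
      // subtract the mean and divide by the standard deviation
      const float inv_std = 1.f / std::sqrt(var + eps);
      for (int i = 0; i < dim; ++i)
        (*top)[offset + i] = (bottom[offset + i] - mean) * inv_std;
    }
  }
}

int main() {
  const int num = 1, channels = 2, height = 2, width = 2;
  std::vector<float> bottom = {1, 2, 3, 4, 10, 20, 30, 40};
  std::vector<float> top(bottom.size());
  mvn_forward(bottom, &top, num, channels, height, width);
  // each channel now has (approximately) zero mean and unit variance
  for (float v : top) std::printf("%.3f ", v);
  std::printf("\n");
  return 0;
}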

@bhack (Contributor) commented Aug 4, 2014

Will this support per-sample mean and variance normalization?

@qipeng (Contributor, Author) commented Aug 4, 2014

Hi @bhack, this implements per-sample and per-channel MVN :)

@bhack (Contributor) commented Aug 4, 2014

It seems there was a download problem with the NVIDIA packages on the Travis build. Can you push a commit so the build restarts?

@shelhamer (Member)

@jeffdonahue I've had a few builds time out due to CUDA downloads too. Any word from the Travis team on caching these packages? If not, perhaps we should have our own mirror of the NVIDIA dependencies at dl.caffe.berkeleyvision.org.

@jeffdonahue (Contributor)

They sent an initial response saying they would look into it, but I haven't heard anything since. Travis also has an option to cache arbitrary directories, but I played around with it a bit and couldn't figure out how to get it working... if anyone else wants to try, go for it.

Our own mirror would be a good idea, but I'm not sure what files need to be mirrored since it's all done with apt -- I guess it's just some deb packages? Anyway, go for it if you know what to do.

@bhack (Contributor) commented Aug 4, 2014

@jeffdonahue I see that directory caching is "currently only available for private repositories on travis-ci.com"

@jeffdonahue (Contributor)

@bhack ah! that explains it, thanks.

EXPECT_LE(sum, 0.001);
// expect unit variance
EXPECT_GE(var, 0.999);
EXPECT_LE(var, 1.001);
Contributor (review comment)

please use:

const Dtype kErrorBound = 0.001;
EXPECT_NEAR(0, sum, kErrorBound);
EXPECT_NEAR(1, var, kErrorBound);

Contributor Author (reply)

@jeffdonahue Done, thanks for the comment!

@jeffdonahue (Contributor)

> Hi @bhack, this implements per-sample and per-channel MVN :)

Hmm.. I don't see any switch parameter or anything to choose whether to normalize per-sample vs. per-channel -- it looks to me like it's always done per channel?

Could you add an entry to src/caffe/layer_factory.cpp so the layer can be used in a net?

@@ -236,6 +236,39 @@ class LRNLayer : public Layer<Dtype> {
vector<Blob<Dtype>*> product_bottom_vec_;
};

/* MVNLayer
Contributor (review comment)

should probably be in common_layers instead of vision_layers (unless it's really only going to compute a per-channel mean)

Contributor Author (reply)

Hmm, this was originally meant for vision tasks, so the MVN was done for each channel. I'm not sure how useful it would be for other tasks, but if you feel it's useful, I can try to implement both behaviors behind a switch and move it to common_layers. Also, for clarification: for general layers, should I just treat everything after num as one dimension?

Contributor (reply)

It's OK -- the per-channel normalization is useful on its own; it's not necessary to extend it to per-sample for this PR. I just thought you had also implemented per-sample normalization because of your comment quoted above. Anyway, it's up to you whether you want to also implement per-sample normalization (and then move the layer to common_layers.hpp if you do); otherwise I think we can merge this once you address my caffe_set nitpick.

@bhack (Contributor) commented Aug 5, 2014

@jeffdonahue if you still want to mirror CUDA on the Berkeley host, you could use apt-mirror with the repository listed in cuda.list (this text file is inside the cuda-repo package you install in the Travis script).

@bhack (Contributor) commented Aug 10, 2014

@qipeng can you add a parameter to perform only mean subtraction? @mtamburrano can you give some feedback on convergence when training with this layer added to the proto?

@mtamburrano (Contributor)

Hi @qipeng,
nice work, I think this layer is very useful.
By the way, I have a couple of doubts:

  • To build your pull request, I needed to add this line

  case LayerParameter_LayerType_MVN:
    return new MVNLayer<Dtype>(param);

to layer_factory.cpp. I see your PR passes the Travis CI build, so I'm not sure how it built without it :)

  • I tried to train a network using MVN for data normalization. Without it my net reaches about 90% precision, but when I add your layer the net doesn't converge anymore...
    I connect an image_data_layer to your MVN layer like this:

layers {
  name: "data"
  type: IMAGE_DATA
  top: "data"
  top: "label"
  image_data_param {
    ...
  }
}
layers {
  name: "mvn"
  type: MVN
  bottom: "data"
  top: "data"
}

but this way the precision doesn't go above 0.001%.
Am I doing it wrong?

@qipeng (Contributor, Author) commented Aug 11, 2014

@jeffdonahue @bhack I've addressed your comments in the latest commit, thanks for the insightful feedback! Specifically, I've added options for mean-only normalization and per-sample normalization, added the corresponding unit tests, added the layer to the layer factory, and moved it to common_layers.
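
For anyone who wants to try the new behaviors from a net definition, the layer could be configured roughly like this (a sketch: across_channels and normalize_variance are the two switches, shown here with their defaults flipped; the name of the parameter block, mvn_param, is an assumption):

layers {
  name: "mvn"
  type: MVN
  bottom: "data"
  top: "data_mvn"
  mvn_param {                   # assumed name of the parameter block
    normalize_variance: false   # mean-only normalization (default: true)
    across_channels: true       # per-sample statistics over all channels (default: false)
  }
}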

@mtamburrano I'm not sure exactly what the problem could be; your application of the layer looks correct, and without knowing the image data you're using I can't tell what caused the accuracy drop... But I'll be happy to help with experiments.

@bhack (Contributor) commented Aug 12, 2014

@mtamburrano Do you have the same problem training with 1 channel (grayscale) images?

@mtamburrano (Contributor)

@bhack, it doesn't make a difference.
Training with 1 channel only: without MVN the net reaches about 80-85% precision in 21000 iterations; with MVN, after 435000 iterations the precision is still stuck around zero percent.

@mtamburrano (Contributor)

I did some tests, which I report here; I hope they are useful:

lr (learning rate)   across_channels    normalize_variance   result
0.0003               false (default)    true (default)       loss: NaN or doesn't converge - precision: ~0%
0.0003               true               true (default)       loss: NaN or doesn't converge - precision: ~0%
0.0003               true               false                loss: NaN or doesn't converge - precision: ~0%
0.0003               false (default)    false                loss: NaN or doesn't converge - precision: ~0%
0.0001               false (default)    true (default)       loss: NaN or doesn't converge - precision: ~0%
0.0001               true               true (default)       loss: NaN or doesn't converge - precision: ~0%
0.0001               true               false                loss: NaN or doesn't converge - precision: ~0%
0.0001               false (default)    false                loss: CONVERGES - precision: >85%
0.0002               false (default)    false                loss: NaN or doesn't converge - precision: ~0%

So the only way I managed to get the net to converge is using the per-channel mean, not using variance normalization, AND lowering the learning rate. I tried lowering the learning rate (even below 0.0001) in the other cases, but without success.
The aim of the net is to perform OCR on digits, so the training images contain digits under different lighting conditions, rotations, scales, and perspectives.

@jeffdonahue (Contributor)

Thanks for writing this layer and addressing my comments @qipeng! Based on the tests this seems to be correctly implemented so I'll merge despite the above results from @mtamburrano.

jeffdonahue added a commit that referenced this pull request on Aug 12, 2014: mean-variance normalization layer
@jeffdonahue merged commit c9e22ab into BVLC:dev on Aug 12, 2014
@shelhamer mentioned this pull request on Sep 18, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request on Sep 30, 2014: mean-variance normalization layer
RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request on Nov 4, 2014: mean-variance normalization layer