
Channel softmax #940

Merged: longjon merged 4 commits into BVLC:dev from the channel-softmax branch on Aug 19, 2014
Conversation

ronghanghu (Member):

In this pull request, the behavior of SoftmaxLayer changes from a softmax over channels*height*width elements (all elements within a num) to a softmax over channels elements (the elements at each spatial position within a num). The purpose is to support running fully-connected layers as convolutions (see Net Surgery: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb). It won't break existing Caffe examples, since the top blob of a fully-connected layer has width == 1 and height == 1.

The CPU version was implemented by @longjon, and I implemented the GPU version, including the GPU backward pass.
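To make the new behavior concrete, here is a minimal CPU sketch of channel softmax over an N x C x H x W blob (illustrative code only, not the SoftmaxLayer implementation; the function name and plain-vector interface are made up for the example):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Softmax over the channel axis at every (n, h, w) position of a blob stored
// in (num, channels, height, width) order. Illustrative sketch only.
void channel_softmax(const std::vector<float>& bottom, std::vector<float>* top,
                     int num, int channels, int height, int width) {
  const int spatial = height * width;
  const int dim = channels * spatial;
  top->resize(bottom.size());
  for (int n = 0; n < num; ++n) {
    for (int s = 0; s < spatial; ++s) {
      // Subtract the per-position max over channels for numerical stability.
      float maxval = bottom[n * dim + s];
      for (int c = 1; c < channels; ++c) {
        maxval = std::max(maxval, bottom[n * dim + c * spatial + s]);
      }
      float sum = 0.0f;
      for (int c = 0; c < channels; ++c) {
        const float e = std::exp(bottom[n * dim + c * spatial + s] - maxval);
        (*top)[n * dim + c * spatial + s] = e;
        sum += e;
      }
      for (int c = 0; c < channels; ++c) {
        (*top)[n * dim + c * spatial + s] /= sum;
      }
    }
  }
}
```

The old behavior, by contrast, normalized over all channels*height*width values of each num; for fully-connected outputs (height == width == 1) the two coincide, which is why existing examples are unaffected.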

(Diff context for the review thread below: the size argument of kernel_exp is renamed from num to count.)

    template <typename Dtype>
    -__global__ void kernel_exp(const int num, const Dtype* data, Dtype* out) {
    -  CUDA_KERNEL_LOOP(index, num) {
    +__global__ void kernel_exp(const int count, const Dtype* data, Dtype* out) {
Contributor:
This kernel is the same as caffe_gpu_exp, isn't it? Let's remove it and replace it with caffe_gpu_exp, unless I'm misunderstanding somehow. (I know it wasn't added in this PR; I just noticed it from seeing the diff.)

Member (Author):

I can't find caffe_gpu_exp. I only found caffe_exp, which calls vsExp in MKL.

Contributor:

Whoops, my bad; I think I was thinking of caffe_gpu_powx. caffe_gpu_exp should probably exist, but the device abstraction (#610) will probably take care of this, so never mind. Sorry!

@jeffdonahue (Contributor):

Assigning to @longjon; go ahead and merge when you're happy with everything.

@longjon (Contributor) commented Aug 17, 2014:

@shelhamer suggested offline adding a switch to provide the original "normalize over everything" mode. So, @shelhamer, if you still want to do that, you can append to or rewrite this PR.

@shelhamer and others, which mode do we think should be the default? It seems like the channel normalization is usually what is desired, and I doubt anyone is relying on the current behavior, although it is a little jarring to change what layers do. If we do want the default to be the channel normalization, we could go ahead and merge this, and add a switch in a later PR.

(Diff context for the review thread below, from the CPU backward pass:)

    caffe_cpu_gemv<Dtype>(CblasTrans, channels, spatial_dim, 1,
        bottom_diff + i * dim, sum_multiplier_.cpu_data(), 0, scale_data);
    // restore the original top_diff in bottom_diff for subtraction
    caffe_copy(dim, top_diff + i * dim, bottom_diff + i * dim);
Member (Author):

Also note that the updated SoftmaxLayer CPU backward no longer allows in-place computation, since the CPU implementation first modifies bottom_diff and then restores it from top_diff before the final `caffe_mul(top[0]->count(), bottom_diff, top_data, bottom_diff);`. When top and bottom share the same blob, that restore would copy the already-modified data, so the result is wrong. The GPU implementation still allows in-place computation.

@jeffdonahue @shelhamer should we allow in-place computation in SoftmaxLayer?

Contributor:

Yeah, good catch. I did this to avoid an extra loop, but I've added it back now to allow in-place computation. There should be no performance regression in the 1x1 case, and probably not a noticeable one in the general case, and anyway the GPU implementation is available.

In order to do this, I had to add functions to math_functions for strided dot products (which cblas already supports, of course, but which we didn't previously have an interface for).
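For intuition, a strided dot product is all that is needed to reduce over channels in the (num, channels, height, width) layout. A rough sketch of such a wrapper (the name channel_dot is hypothetical; the actual math_functions signature may differ):

```cpp
#include <cblas.h>

// Values for a fixed (n, h, w) position are spaced spatial_dim = H * W apart
// in memory, so a dot product across channels is a cblas dot with stride
// spatial_dim. Sketch only; channel_dot is not the real Caffe helper name.
float channel_dot(const float* a, const float* b,
                  int channels, int spatial_dim) {
  return cblas_sdot(channels, a, spatial_dim, b, spatial_dim);
}
```

For example, channel_dot(top_diff + n * dim + s, top_data + n * dim + s, channels, spatial_dim) would give the per-position dot product that the softmax backward pass subtracts (with dim = channels * spatial_dim).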

@shelhamer (Member):

@longjon merge as you please, as the switch can follow. I agree channel is a reasonable default. Although it does change the default behavior, I imagine anyone who has adopted the fully-convolutional models wants channel softmax.


Commit note (on the strided-dot helpers): "This provides a more direct interface to the cblas_?dot functions. This is useful, for example, for taking dot products across channels."
@longjon (Contributor) commented Aug 18, 2014:

@ronghanghu I amended your commit with some aesthetic changes (making all the channel kernels have the form kernel_channel_[word], fixing some lint errors that were being masked by NOLINT, and fixing capitalization in comments). I think this will be ready for merge once Travis passes. The GPU implementation is a little heavy in that it introduces lots of kernels instead of calling gpu_gemm and so forth, but it does the right thing by parallelizing over both batch and spatial dims, so I'll take it. Thanks for getting this written!
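For readers skimming the diff, the kernel_channel_[word] kernels all follow roughly this pattern (a sketch in the spirit of the merged code, not an exact copy): one CUDA thread per (num, spatial) position, looping over channels inside the thread. CUDA_KERNEL_LOOP is Caffe's grid-stride loop macro, as in the hunk shown earlier.

```cpp
// Sketch of the kernel_channel_[word] pattern: parallelize over
// num * spatial_dim positions and reduce over channels within each thread.
// Illustrative only; the merged kernels may differ in detail.
template <typename Dtype>
__global__ void kernel_channel_max(const int num, const int channels,
    const int spatial_dim, const Dtype* data, Dtype* out) {
  CUDA_KERNEL_LOOP(index, num * spatial_dim) {
    const int n = index / spatial_dim;
    const int s = index % spatial_dim;
    Dtype maxval = data[n * channels * spatial_dim + s];
    for (int c = 1; c < channels; ++c) {
      maxval = max(data[(n * channels + c) * spatial_dim + s], maxval);
    }
    out[index] = maxval;  // one maximum per (n, s) position
  }
}
```

Launching one thread per (num, spatial) position rather than one per num is what keeps these kernels efficient in the fully-convolutional case, and it reduces to the old per-num parallelism when height == width == 1.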

longjon added a commit that referenced this pull request Aug 19, 2014
longjon merged commit 78eea24 into BVLC:dev on Aug 19, 2014
ronghanghu deleted the channel-softmax branch on August 19, 2014 at 15:50
@shelhamer (Member):

Fixed the order of specialization and instantiation for the clang++ build in ac64a7b. You can't call caffe_cpu_strided_dot() before its specializations are declared, as was done in caffe_cpu_dot().
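The fix is about declaration order: clang++ rejects using a function template before its explicit specializations are declared. A simplified sketch of the shape of the problem (signatures are approximations, not the exact math_functions code):

```cpp
#include <cblas.h>

// Generic template declaration (approximate signature).
template <typename Dtype>
Dtype caffe_cpu_strided_dot(const int n, const Dtype* x, const int incx,
                            const Dtype* y, const int incy);

// The explicit specializations have to appear first ...
template <>
float caffe_cpu_strided_dot<float>(const int n, const float* x,
    const int incx, const float* y, const int incy) {
  return cblas_sdot(n, x, incx, y, incy);
}

template <>
double caffe_cpu_strided_dot<double>(const int n, const double* x,
    const int incx, const double* y, const int incy) {
  return cblas_ddot(n, x, incx, y, incy);
}

// ... before the caller that uses them; ac64a7b reorders the code so that
// this holds.
template <typename Dtype>
Dtype caffe_cpu_dot(const int n, const Dtype* x, const Dtype* y) {
  return caffe_cpu_strided_dot(n, x, 1, y, 1);
}
```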

This was referenced Sep 18, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014