Question regarding paper #1

Closed
melgor opened this issue Jul 21, 2017 · 8 comments

melgor commented Jul 21, 2017

I am really grateful that @wy1iu released the code. You and @ydwen are really pushing face verification forward.

I have some questions regarding the paper and the code:

  1. What is the major change between L-Softmax and A-Softmax? From the equations it looks like L-Softmax works with the norms of the weights, while A-Softmax replaces the weights with normalized weights, right? If that is true, was the main motivation Section 3.3 of "Large-Margin Softmax Loss for Convolutional Neural Networks"?
  2. Could you explain how you chose the function ψ (which replaces cos(θ))?
  3. In both papers you use a series expansion of cos(mθ) (Eq. 7 in the Large-Margin paper), right? What was the idea behind using a different degree of expansion depending on the margin value? Why not use the same one for all margins?
  4. Here is my intuition about both papers: in effect we just scale the output of the linear layer by a matrix of ones that carries different values (< 1) at the target classes. Both papers propose a different scaling method (with a theoretical explanation). Maybe it is possible to write an implementation that just uses such a scale matrix. I need to think about it, since there are many non-linear operations involved.
  5. I was thinking about using center loss with cosine similarity, but then I realized that it is equivalent to a softmax layer without bias (and softmax also compares features to the other class centers, not only the target one, so it makes the features even better). Do you agree with my interpretation?

wy1iu commented Jul 21, 2017

Hi, thank you for your interest in our work. I am happy to answer your questions. :)

  1. A-Softmax normalizes the weights and zeros out the biases in the final FC layer, which makes the loss penalize only the angles. In contrast, L-Softmax does not necessarily normalize the weights and zero out the biases, so it does not necessarily penalize the angles, although in the toy examples it did. The main difference is clearly described in the SphereFace paper.

  2. It is natural to preserve the $[0,\frac{\pi}{m}]$ part of the function $\cos(m\theta)$. So all we need to do is design the $[\frac{\pi}{m},\pi]$ part. In fact, the design of this part is not very crucial as long as it is monotonically decreasing.

  3. It is simply a decomposition of $\cos(m\theta)$. When $m$ changes, the decomposition changes too (see the sketch after this list).

  4. Of course there can be a different interpretation (something like a matrix form). However, as you mentioned, the nonlinearity may be difficult to model.

  5. I kind of agree. Softmax without biases may do the same job as center loss, although the back-prop dynamics may be different. Thus adding center loss may help a lot at the beginning, but will improve things less and less as training goes on. However, combining center loss and softmax loss still makes sense to me.
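
For readers following along, here is a minimal NumPy sketch of how I read answers 2 and 3 (my own code, not from this repo): ψ keeps $\cos(m\theta)$ on $[0,\frac{\pi}{m}]$ and is extended piecewise as $\psi(\theta) = (-1)^k \cos(m\theta) - 2k$ for $\theta \in [\frac{k\pi}{m}, \frac{(k+1)\pi}{m}]$, which makes it monotonically decreasing on $[0,\pi]$; the per-margin "decomposition" simply rewrites $\cos(m\theta)$ as a polynomial in $\cos\theta$, a different polynomial for each $m$, which is why the degree changes with the margin.

```python
import numpy as np

def psi(theta, m):
    """Piecewise target function: psi(theta) = (-1)^k * cos(m*theta) - 2k
    on each segment theta in [k*pi/m, (k+1)*pi/m], k = 0, ..., m-1."""
    k = np.floor(theta * m / np.pi).astype(int)
    return (-1.0) ** k * np.cos(m * theta) - 2.0 * k

# cos(m*theta) rewritten as a polynomial in c = cos(theta); the polynomial
# (and hence the degree of the "decomposition") differs for each margin m.
cos_m_theta = {
    2: lambda c: 2 * c**2 - 1,
    3: lambda c: 4 * c**3 - 3 * c,
    4: lambda c: 8 * c**4 - 8 * c**2 + 1,
}

theta = np.linspace(0.0, np.pi, 10001)
for m, poly in cos_m_theta.items():
    # the polynomial form of cos(m*theta) is exact, not an approximation
    assert np.allclose(np.cos(m * theta), poly(np.cos(theta)))
    vals = psi(theta, m)
    # psi coincides with cos(m*theta) on [0, pi/m] ...
    head = theta <= np.pi / m
    assert np.allclose(vals[head], np.cos(m * theta[head]))
    # ... and is monotonically decreasing on the whole of [0, pi]
    assert np.all(np.diff(vals) <= 1e-9)
```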


melgor commented Jul 24, 2017

I have a question about your nice implementation of MarginInnerProductLayer.
It is very efficient, much more so than directly using the formulas from the paper.

I almost understand the idea behind it, but I still cannot figure out how you found the formulas for sign_1 and the others.
It is a very interesting way of replacing any for/while loop for finding the value of k. Could you explain how you found such formulas, or point me to what field I should study to get an intuition for them?

wy1iu closed this as completed Aug 10, 2017

melgor commented Aug 10, 2017

Could you explain how you got the approximation for this equation?


ydwen commented Aug 10, 2017

Hi melgor. I am not sure I have understood exactly what you are asking.
I guess you are confused by the implementation: why didn't we follow the equations in the paper exactly when implementing the layer?
The answer is efficiency. It is an alternative implementation and there is no approximation in our code. sign_1 and the others are intermediate variables, designed to avoid repeated computation. It may not be the optimal way, but it is a trade-off between speed and memory.


wy1iu commented Aug 10, 2017

Sorry for missing your question @melgor. As ydwen mentioned, our implementation is efficient in the sense that we store some intermediate computation results for subsequent reuse (similar to the idea of dynamic programming). It basically trades memory for speed. Most importantly, this implementation is entirely equivalent to the original formulation in the paper (no approximation happens).
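
As an illustration of that reuse, here is a minimal sketch of my own (not the repo's Caffe layer; the names `forward_cache` and `backward_from_cache` are hypothetical): for m = 4 the target value ψ(θ) can be built from cos θ and a few sign variables, all computed once in the forward pass and reused by the backward pass, and the result matches the direct piecewise formula exactly.

```python
import numpy as np

def forward_cache(c):
    """psi(theta) for m = 4, computed from c = cos(theta) alone (no arccos).
    Every intermediate is stored so the backward pass can reuse it."""
    c2 = c ** 2
    sign_0 = np.sign(c)
    sign_3 = sign_0 * np.sign(2 * c2 - 1)                 # = (-1)^k (see the check further below)
    sign_4 = 2 * sign_0 + sign_3 - 3                      # = -2k
    psi = sign_3 * (8 * c2 ** 2 - 8 * c2 + 1) + sign_4    # = (-1)^k cos(4*theta) - 2k
    return psi, {"c": c, "c2": c2, "sign_3": sign_3}

def backward_from_cache(cache):
    """d psi / d c, written purely in terms of cached quantities (nothing recomputed)."""
    c, c2, sign_3 = cache["c"], cache["c2"], cache["sign_3"]
    return sign_3 * (32 * c * c2 - 16 * c)

# sanity check at a few angles away from the segment boundaries k*pi/4
theta = np.array([0.1, 0.5, 1.0, 1.3, 1.8, 2.2, 2.7, 3.0])
c = np.cos(theta)
psi, cache = forward_cache(c)

# the no-arccos result matches the direct piecewise definition exactly
k = np.floor(4 * theta / np.pi).astype(int)
assert np.allclose(psi, (-1.0) ** k * np.cos(4 * theta) - 2 * k)

# the cached backward matches a numerical derivative w.r.t. cos(theta)
h = 1e-6
num = (forward_cache(c + h)[0] - forward_cache(c - h)[0]) / (2 * h)
assert np.allclose(backward_from_cache(cache), num, atol=1e-4)
```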

wy1iu reopened this Aug 10, 2017

melgor commented Aug 10, 2017

Thanks for the answer. I was just trying to derive your equations from the original ones in the paper and could not get exactly the same result. (I am doing it as an exercise, since your implementation is much faster than the naive one.)

nyyznyyz1991 commented

@wy1iu @melgor
Thanks for your discussion. The implementation of sign_3 and sign_4 (with m = 4) is impressive and elegant: it gets rid of computing theta via arccos and avoids repeated computation. How did you deduce the formulas?
sign_3 = sign_0 * sign(2 * cos_theta_quadratic_ - 1)
sign_4 = 2 * sign_0 + sign_3 - 3
Is there any explanation about it?
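
Not the authors' derivation, but here is how I convinced myself that the two lines above encode $k = \lfloor 4\theta/\pi \rfloor$: $\mathrm{sign}(\cos\theta)$ tells you whether $\theta$ lies in the left half $[0, \frac{\pi}{2})$, $\mathrm{sign}(2\cos^2\theta - 1) = \mathrm{sign}(\cos 2\theta)$ tells you whether $\theta$ lies in the outer quarters $[0, \frac{\pi}{4}) \cup (\frac{3\pi}{4}, \pi]$, and together they pin down the segment, so sign_3 reproduces the $(-1)^k$ factor and sign_4 the $-2k$ offset of $\psi(\theta) = (-1)^k \cos(4\theta) - 2k$. A quick check (my own code, not from the repo):

```python
import numpy as np

# one representative angle per segment [k*pi/4, (k+1)*pi/4), k = 0..3
for k, theta in enumerate([0.1, 1.0, 2.0, 2.8]):
    c = np.cos(theta)
    sign_0 = np.sign(c)                      # +1 iff theta < pi/2, i.e. k in {0, 1}
    sign_3 = sign_0 * np.sign(2 * c**2 - 1)  # sign(cos(theta)) * sign(cos(2*theta))
    sign_4 = 2 * sign_0 + sign_3 - 3
    assert sign_3 == (-1) ** k               # the (-1)^k factor of psi
    assert sign_4 == -2 * k                  # the -2k offset of psi
```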


amirhfarzaneh commented Aug 24, 2018

Can someone please explain why the psi function has to be monotonically decreasing?
@wy1iu , @melgor
