
cbow cbow_mean? #697

Closed

dragonxlwang opened this issue May 11, 2016 · 9 comments
@dragonxlwang

The code here:

    if not model.cbow_mean and input_word_indices:
        neu1e /= len(input_word_indices)

Why average when not model.cbow_mean? I thought the condition should be model.cbow_mean instead.

I'd appreciate it if someone could help me out.

@gojomo
Collaborator

gojomo commented May 11, 2016

This is the backprop step; in order for the correction to the summed vectors, in the non-mean case, to net-total to the right number, the error must be split over all the constituent summands. See also this recent thread: https://groups.google.com/d/msg/gensim/BtN7uB1vgpc/tlvkLXqzJwAJ (which would be a better place for further discussion/questions that aren't bugs/feature-requests).
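A minimal numpy sketch of this net-total point (illustrative only, with made-up arrays; not gensim's code):

    import numpy as np

    rng = np.random.default_rng(0)
    context = rng.normal(size=(5, 4))   # 5 context word-vectors, dimensionality 4
    neu1e = rng.normal(size=4)          # error backpropagated to the summed input

    # Applying the full error to every summand moves the sum by 5 * neu1e ...
    overshoot = (context + neu1e).sum(axis=0)
    assert np.allclose(overshoot, context.sum(axis=0) + 5 * neu1e)

    # ... while splitting it over the 5 summands moves the sum by exactly neu1e.
    corrected = (context + neu1e / len(context)).sum(axis=0)
    assert np.allclose(corrected, context.sum(axis=0) + neu1e)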

@gojomo gojomo closed this as completed May 11, 2016
@dragonxlwang
Author

Hi @gojomo, thanks for pointing out the thread.

I believe this is a bug in the code (at least it looks that way to me), and it also appears in the word2vec code. (Actually, I first found Mikolov's word2vec code suspicious, then thought it might help to check gensim's implementation.)

Since I think this is a bug, I put my reasoning below (sorry if you think I am abusing GitHub):

  1. neu1e (as at line 290) is the derivative of the objective w.r.t. the average of the context words' syn0. Do you agree?
  2. By the chain rule, the derivative of the objective w.r.t. one context word's syn0 is thus the above neu1e times its weight in the average (which is 1 / len).

Your argument in the Google group was mainly about matching the residual. However, the residual happens to be the gradient of the L2 loss; in word2vec, the objective is different.

In addition, even in your L2-loss (residual-matching) case from the Google group: when you perturb any one variable by a small amount x, the L2 loss becomes 0.5 * (true_average - current_average - x/5)^2, which is approximately 0.5 * (true_average - current_average)^2 - (true_average - current_average) * x/5. Given true_average - current_average = 5, the difference between the old and new L2 loss is x (ignoring the sign), so the gradient w.r.t. this specific variable is 1, not 5.
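A tiny numeric check of this perturbation argument (my own sketch of the scenario above, assuming 5 variables and a residual of 5):

    import numpy as np

    x = np.zeros(5)            # current variables, so current_average = 0
    true_average = 5.0         # hence true_average - current_average = 5

    def l2_loss(x):
        return 0.5 * (true_average - x.mean()) ** 2

    # Perturb a single variable and measure the change in loss numerically.
    eps = 1e-6
    grad = (l2_loss(x + eps * np.eye(5)[0]) - l2_loss(x)) / eps
    print(grad)                # ~ -1.0: magnitude 1, not 5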

Please let me know what you think.

@dragonxlwang
Author

Another line of argument (in your variable-averaging case): if we consider a linear combination of variables w1 * x1 + w2 * x2 + ... + w5 * x5, and not all of the w_i are equal, then your method of splitting no longer generalizes. Multiplying by each w_i, however, does make sense, as it gives the gradient direction.
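A sketch of this point (my own illustration; for s = w1*x1 + ... + w5*x5, the chain rule gives d(loss)/d(x_i) = w_i * d(loss)/d(s)):

    import numpy as np

    rng = np.random.default_rng(1)
    w = rng.uniform(0.5, 2.0, size=5)   # unequal weights: uniform splitting no longer applies
    x = rng.normal(size=5)

    def loss(x):
        s = np.dot(w, x)                # weighted linear combination
        return 0.5 * s ** 2             # any differentiable loss of s would do

    analytic = w * np.dot(w, x)         # chain rule: d(loss)/d(x_i) = w_i * s

    eps = 1e-6
    numeric = np.array([(loss(x + eps * np.eye(5)[i]) - loss(x)) / eps
                        for i in range(5)])
    assert np.allclose(analytic, numeric, atol=1e-4)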

@gojomo
Collaborator

gojomo commented May 11, 2016

Reopening to discuss.

My main thought is: when model.cbow_mean is False, neu1e is not "w.r.t the average of context words' syn0". The word-vectors were summed during forward-propagation, so neu1e is with regard to their sum, and any reasoning which discusses the 'average' is wrong (it only applies to the case where model.cbow_mean is True).
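A minimal sketch of the two forward modes (my own illustration of the distinction, not gensim's code):

    import numpy as np

    rng = np.random.default_rng(2)
    context = rng.normal(size=(5, 100))  # 5 context word-vectors (rows of syn0)

    cbow_mean = False
    l1 = context.sum(axis=0)             # non-mean mode: the hidden input is the sum
    if cbow_mean:
        l1 /= len(context)               # mean mode: the hidden input is the average

    # neu1e computed downstream is the error w.r.t. l1 -- the sum in one mode, the
    # mean in the other -- so arguments about "the average" only apply when
    # cbow_mean is True.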

I would suggest:

  • compose the patch that you believe corrects this calculation and test it: does it produce better results in that mode?
  • if you also see this bug in word2vec.c, post about it to the word2vec-toolkit forum (https://groups.google.com/forum/#!forum/word2vec-toolkit) so the users of that code can comment

@gojomo gojomo reopened this May 11, 2016
@dragonxlwang
Author

Thanks for reopening.

Well, what I said above is mainly for the case when model.cbow_mean is true.

If model.cbow_mean is false, however, the code currently splits neu1e; IMHO, this is not needed (nor should it be done).

I haven't played with gensim yet. I will try it and see if it makes a difference.

@gojomo
Collaborator

gojomo commented Sep 7, 2016

(As just a note for others who come across this issue, I still believe there's no bug here – the code matches the gist of the algorithm and the behavior of the original word2vec.c implementation on which it was patterned.)

@tmylk
Contributor

tmylk commented Sep 28, 2016

Hi @dragonxlwang Did you get a response in the word2vec-toolkit forum on this issue?
We would like to keep the implementations in sync.

@tmylk
Contributor

tmylk commented Oct 5, 2016

Closing as "won't fix"

@gojomo
Collaborator

gojomo commented Nov 15, 2021

See also #1873 for more discussion of this & related issues that come up from time to time.
