
cbow cbow_mean? #697

Closed

dragonxlwang opened this issue May 11, 2016 · 9 comments
@dragonxlwang

The code here:

    if not model.cbow_mean and input_word_indices:
        neu1e /= len(input_word_indices)

Why average when not model.cbow_mean? I thought the condition should be model.cbow_mean instead.

I'd appreciate it if someone could help me out.

@gojomo
Collaborator

gojomo commented May 11, 2016

This is the backprop step; in order for the correction to the summed vectors, in the non-mean case, to net-total to the right number, the error must be split over all the constituent summands. See also this recent thread: https://groups.google.com/d/msg/gensim/BtN7uB1vgpc/tlvkLXqzJwAJ (which would be a better place for further discussion/questions that aren't bugs/feature-requests).
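A minimal numpy sketch of this net-total point (illustrative only, with made-up arrays; not gensim's code):

    import numpy as np

    rng = np.random.default_rng(0)
    context = rng.normal(size=(5, 4))   # 5 context word-vectors, dimensionality 4
    neu1e = rng.normal(size=4)          # error backpropagated to the summed input

    # Applying the full error to every summand moves the sum by 5 * neu1e ...
    overshoot = (context + neu1e).sum(axis=0)
    assert np.allclose(overshoot, context.sum(axis=0) + 5 * neu1e)

    # ... while splitting it over the 5 summands moves the sum by exactly neu1e.
    corrected = (context + neu1e / len(context)).sum(axis=0)
    assert np.allclose(corrected, context.sum(axis=0) + neu1e)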

@gojomo gojomo closed this as completed May 11, 2016
@dragonxlwang
Author

Hi @gojomo, thanks for pointing out the thread.

I believe this is a bug in the code (at least it looks that way to me), and it also appears in the word2vec code. (Actually, I first found Mikolov's word2vec code suspicious, then thought it might help to check gensim's implementation.)

Since I think this is a bug, I put my reasoning below (sorry if you think I am abusing GitHub):

  1. neu1e (as at line 290) is the derivative of the objective w.r.t. the average of the context words' syn0. Do you agree?
  2. By the chain rule, the derivative of the objective w.r.t. one context word's syn0 is thus the above neu1e times its weight in the average (which is 1 / len).

Your argument in the Google group was mainly about matching the residual. However, the residual happens to be the gradient of the L2 loss; in word2vec, the objective is different.

In addition, even in your L2-loss (residual-matching) case from the Google group: when you perturb any one variable by a small amount x, the L2 loss becomes 0.5 * (true_average - current_average - x/5)^2, which is approximately 0.5 * (true_average - current_average)^2 - (true_average - current_average) * x/5. Given true_average - current_average = 5, the difference between the old and new L2 loss is x (ignoring the sign), so the gradient w.r.t. this specific variable is 1, not 5.
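A tiny numeric check of this perturbation argument (my own sketch of the scenario above, assuming 5 variables and a residual of 5):

    import numpy as np

    x = np.zeros(5)            # current variables, so current_average = 0
    true_average = 5.0         # hence true_average - current_average = 5

    def l2_loss(x):
        return 0.5 * (true_average - x.mean()) ** 2

    # Perturb a single variable and measure the change in loss numerically.
    eps = 1e-6
    grad = (l2_loss(x + eps * np.eye(5)[0]) - l2_loss(x)) / eps
    print(grad)                # ~ -1.0: magnitude 1, not 5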

Please let me know what you think.

@dragonxlwang
Author

Another line of argument (in your variable-averaging case): if we consider a linear combination of variables w1 * x1 + w2 * x2 + ... + w5 * x5, and not all of the w_i are equal, then your method of splitting no longer generalizes. Multiplying by each w_i, however, does make sense, as it gives the gradient direction.
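A sketch of this point (my own illustration; for s = w1*x1 + ... + w5*x5, the chain rule gives d(loss)/d(x_i) = w_i * d(loss)/d(s)):

    import numpy as np

    rng = np.random.default_rng(1)
    w = rng.uniform(0.5, 2.0, size=5)   # unequal weights: uniform splitting no longer applies
    x = rng.normal(size=5)

    def loss(x):
        s = np.dot(w, x)                # weighted linear combination
        return 0.5 * s ** 2             # any differentiable loss of s would do

    analytic = w * np.dot(w, x)         # chain rule: d(loss)/d(x_i) = w_i * s

    eps = 1e-6
    numeric = np.array([(loss(x + eps * np.eye(5)[i]) - loss(x)) / eps
                        for i in range(5)])
    assert np.allclose(analytic, numeric, atol=1e-4)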

@gojomo
Collaborator

gojomo commented May 11, 2016

Reopening to discuss.

My main thought is: when model.cbow_mean is False, neu1e is not "w.r.t the average of context words' syn0". The word-vectors were summed during forward-propagation, so neu1e is with regard to their sum, and any reasoning which discusses the 'average' is wrong (it only applies to the case where model.cbow_mean is True).
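A minimal sketch of the two forward modes (my own illustration of the distinction, not gensim's code):

    import numpy as np

    rng = np.random.default_rng(2)
    context = rng.normal(size=(5, 100))  # 5 context word-vectors (rows of syn0)

    cbow_mean = False
    l1 = context.sum(axis=0)             # non-mean mode: the hidden input is the sum
    if cbow_mean:
        l1 /= len(context)               # mean mode: the hidden input is the average

    # neu1e computed downstream is the error w.r.t. l1 -- the sum in one mode, the
    # mean in the other -- so arguments about "the average" only apply when
    # cbow_mean is True.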

I would suggest:

  • compose the patch that you believe corrects this calculation and test it: does it produce better results in that mode?
  • if you also see this bug in word2vec.c, post about it to the word2vec-toolkit forum (https://groups.google.com/forum/#!forum/word2vec-toolkit) so the users of that code can comment

@gojomo gojomo reopened this May 11, 2016
@dragonxlwang
Author

Thanks for reopening.

Well, what I said above is mainly for the case when model.cbow_mean is true.

If model.cbow_mean is false, however, the code currently splits neu1e; IMHO, this is not needed (nor should it be done).

I haven't played with gensim yet. I will try it and see if it makes a difference.

@gojomo
Collaborator

gojomo commented Sep 7, 2016

(As just a note for others who come across this issue, I still believe there's no bug here – the code matches the gist of the algorithm and the behavior of the original word2vec.c implementation on which it was patterned.)

@tmylk
Contributor

tmylk commented Sep 28, 2016

Hi @dragonxlwang Did you get a response in the word2vec-toolkit forum on this issue?
We would like to keep the implementations in sync.

@tmylk
Contributor

tmylk commented Oct 5, 2016

Closing as "won't fix"

@gojomo
Collaborator

gojomo commented Nov 15, 2021

See also #1873 for more discussion of this & related issues that come up from time to time.
