Gensim doesn't allow changing negative sampling distribution parameter #2090
Comments
Thanks for the pointer to that interesting paper! This sounds like a small, low-risk/high-benefit change, so a PR would be welcome.
Hello, @gojomo. I'm preparing a PR, but I'm having trouble with some failing tests. The following tests failed:
Both had the same error:
I guess it's because of the new attribute I've added to Word2Vec. What am I supposed to do to make it compatible with old saved models? Here is my branch:
Feel free to create the PR from your branch to gensim/develop even before it's fully ready. That'll make it easier for others to see the changes, and to see the unit-test results in the project's own continuous-integration setup. You can mark it "[WIP]" to be clear it's a work in progress.

In general, if a class gets a new (necessary) parameter, older saved (Python-pickled) objects of that type will need to be patched up, upon

The various "if" fixups there either rebuild things that intentionally weren't saved (because they can be fully rebuilt from other saved state), or patch up missing state that newer source requires and older source didn't save. I'm not sure why missing this parameter creates exactly the error you've reported (maybe an earlier error is being silently suppressed), but one of the

Without a full review yet, a few other thoughts:
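To illustrate the general idea of patching up older saved objects, here is a minimal sketch. It is not gensim's actual loading code: `ToyModel`, `patch_loaded_model`, and the `exponent` attribute name are all hypothetical, chosen only to show how an object restored without a newer attribute can be given a default after loading.

```python
DEFAULT_EXPONENT = 0.75  # assumed default, matching the value previously hard-coded


class ToyModel:
    """Hypothetical stand-in for a model class that gained a new attribute."""

    def __init__(self, size=100, exponent=DEFAULT_EXPONENT):
        self.size = size
        self.exponent = exponent  # the newly added attribute


def patch_loaded_model(model):
    """Add attributes that newer code expects but older saved objects lack."""
    if not hasattr(model, "exponent"):
        model.exponent = DEFAULT_EXPONENT
    return model


# Simulate an object saved by older code: unpickling restores the instance
# __dict__ without calling __init__, so the new attribute is simply absent.
old_model = ToyModel.__new__(ToyModel)
old_model.__dict__.update({"size": 50})

patched = patch_loaded_model(old_model)
print(patched.size, patched.exponent)
```

A fixup like this only fills in the attribute when it is missing, so objects saved by newer code keep whatever value they were saved with.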
@gojomo
There definitely is. I don't have the syntax handy, but IIRC it involves invoking the test_*.py file itself with command-line arguments naming the exact test(s)/test-suites to run.
@gojomo, I fixed the problems and created the PR.
Description
As pointed out in the following article, the negative sampling distribution parameter, which is fixed at 0.75 in Gensim, is worth tuning, especially for applications beyond NLP. So it'd be very helpful to make it a parameter of Word2Vec instead of hard-coding it.
https://arxiv.org/abs/1804.04212
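For context, a minimal sketch of what this parameter controls, assuming the usual word2vec formulation in which negative samples are drawn with probability proportional to each item's count raised to an exponent. The function name `negative_sampling_probs` and the toy counts are made up for illustration and are not part of Gensim.

```python
def negative_sampling_probs(counts, exponent=0.75):
    """Return sampling probabilities proportional to count ** exponent."""
    weights = [count ** exponent for count in counts]
    total = sum(weights)
    return [weight / total for weight in weights]


# Toy counts for three items: one frequent, one moderately common, one rare.
counts = [100, 10, 1]

# exponent=1.0 samples in proportion to raw frequency, exponent=0.0 samples
# uniformly, and negative exponents favor rare items over frequent ones.
for exponent in (1.0, 0.75, 0.0, -0.5):
    probs = negative_sampling_probs(counts, exponent)
    print(exponent, [round(p, 3) for p in probs])
```

With the exponent hard-coded, only the 0.75 row of this comparison is reachable, which is the limitation this issue asks to lift.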