Scoring function in Phrases model is hardcoded #1635
Comments
@michaelwsherman thoughts?
I have no opinion on which should merge first.
I'm of the opinion that #1573 should be merged first :), but I'll wait for #1568 -- just be aware that it could easily be a few weeks after #1568 until I merge the code -- looks like it will be a meaty merge. If it's important that the merge happen quickly then maybe #1573 should merge first since #1568 looks more active right now. But happy to make whatever work if y'all don't mind the wait.
Ok, let's merge #1573 first, thanks for the clarification @michaelwsherman.
* initial commit of fixes in comments of #1423
* removed unnecessary space in logger
* added support for custom Phrases scorers
* fixed Phrases.__getitem__ to support pluggable scoring #1533
* travisCI style fixes
* fixed __next__() to next() for python 3 compatibility
* misc fixes
* spacing fixes for style
* custom scorer support in sklearn api
* Phrases scikit interface tests for pluggable scoring
* missing line breaks
* style, clarity, and robustness fixes requested by @piskvorky
* check in Phrases init to make sure scorer is pickleable
* backwards scoring compatibility when loading a Phrases class
* removal of pickle testing objects in Phrases init
* switched to six for python 2/3 compatibility
* fix docstring
Resolved in #1573
The Phrases model is based on word and bigram counts, and it scores candidate phrases with a scoring function that can be selected via the `scoring` parameter of the `Phrases` constructor. However, the stored scoring setting is used only in the `export_phrases` method. Have a look here:
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/phrases.py#L269
and here:
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/phrases.py#L284
In the `__getitem__` method, however, the default scoring is always used:
https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/phrases.py#L334
This looks like a bug to me: we always use the default scoring there, even if we explicitly requested npmi in the constructor. Is it okay if I open a pull request fixing this one?
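To make the expected behaviour concrete, here is a minimal, self-contained sketch of what a fix could look like. It is not gensim's code: `TinyPhrases`, its constructor arguments, and the `score` helper are hypothetical names invented for illustration, and the two formulas are written out under the assumption that "default" means the Mikolov-style score and "npmi" means normalized pointwise mutual information. The point is only that `__getitem__` goes through the same scorer dispatch as any other method, instead of hardcoding the default.

```python
import math


def default_score(count_a, count_b, count_ab, vocab_size, min_count):
    # Mikolov-style score (assumed to be what "default" refers to).
    return (count_ab - min_count) / count_a / count_b * vocab_size


def npmi_score(count_a, count_b, count_ab, corpus_word_count):
    # Normalized pointwise mutual information, in the range [-1, 1].
    pa = count_a / corpus_word_count
    pb = count_b / corpus_word_count
    pab = count_ab / corpus_word_count
    return math.log(pab / (pa * pb)) / -math.log(pab)


class TinyPhrases:
    """Hypothetical stand-in for a phrase model; not gensim's Phrases."""

    def __init__(self, counts, vocab_size, corpus_word_count,
                 min_count=1, threshold=10.0, scoring="default"):
        if scoring not in ("default", "npmi"):
            raise ValueError("unknown scoring: %r" % scoring)
        self.counts = counts
        self.vocab_size = vocab_size
        self.corpus_word_count = corpus_word_count
        self.min_count = min_count
        self.threshold = threshold
        self.scoring = scoring

    def score(self, a, b):
        # Single dispatch point: every caller honours self.scoring.
        count_a = self.counts.get(a, 0)
        count_b = self.counts.get(b, 0)
        count_ab = self.counts.get(a + "_" + b, 0)
        if not (count_a and count_b and count_ab):
            return float("-inf")
        if self.scoring == "npmi":
            return npmi_score(count_a, count_b, count_ab,
                              self.corpus_word_count)
        return default_score(count_a, count_b, count_ab,
                             self.vocab_size, self.min_count)

    def __getitem__(self, sentence):
        # Greedily join adjacent tokens whose score exceeds the threshold,
        # using self.score rather than a hardcoded default formula.
        out, i = [], 0
        while i < len(sentence):
            if i + 1 < len(sentence):
                a, b = sentence[i], sentence[i + 1]
                if self.score(a, b) > self.threshold:
                    out.append(a + "_" + b)
                    i += 2
                    continue
            out.append(sentence[i])
            i += 1
        return out


# Toy counts, made up for illustration only.
toy_counts = {"new": 10, "york": 8, "new_york": 7}
npmi_model = TinyPhrases(toy_counts, vocab_size=50, corpus_word_count=100,
                         threshold=0.5, scoring="npmi")
print(npmi_model[["new", "york", "city"]])
```

With a single `score` method, the threshold passed to the constructor is interpreted on the scale of the chosen scorer in both code paths, which is what a caller who asked for npmi would expect.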