[WIP] Sklearn wrapper for RandomProjections Model #1395
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,57 @@ | ||
#!/usr/bin/env python | ||
# -*- coding: utf-8 -*- | ||
# | ||
# Copyright (C) 2011 Radim Rehurek <[email protected]> | ||
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html | ||
# | ||
Review comment: Code style: remove Reply: Thanks! Updated now. |
||
""" | ||
Scikit learn interface for gensim for easy use of gensim with scikit-learn | ||
Follows scikit-learn API conventions | ||
""" | ||
from gensim import models | ||
Review comment: Blank line before imports. Also, block the imports: built-in first, 3rd party second, local package imports last. Reply: Thanks! Updated now. |
||
from gensim.sklearn_integration import base_sklearn_wrapper | ||
from sklearn.base import TransformerMixin, BaseEstimator | ||
|
||
|
||
class SklearnWrapperRpModel(models.RpModel, base_sklearn_wrapper.BaseSklearnWrapper, TransformerMixin, BaseEstimator): | ||
""" | ||
Base RP module | ||
""" | ||
|
||
def __init__(self, corpus, id2word=None, num_topics=300): | ||
Review comment: Please remove |
||
""" | ||
Sklearn wrapper for RP model. Class derived from gensim.models.RpModel. | ||
""" | ||
self.corpus = corpus | ||
self.id2word = id2word | ||
self.num_topics = num_topics | ||
|
||
def get_params(self, deep=True): | ||
""" | ||
Returns all parameters as dictionary. | ||
""" | ||
return {"corpus": self.corpus, "id2word": self.id2word, "num_topics": self.num_topics} | ||
|
||
def set_params(self, **parameters): | ||
""" | ||
Set all parameters. | ||
""" | ||
super(SklearnWrapperRpModel, self).set_params(**parameters) | ||
|
||
def fit(self, X, y=None): | ||
""" | ||
For fitting corpus into class object. | ||
Calls gensim.models.RpModel | ||
Review comment: Replace doc-string to |
||
>>>gensim.models.RpModel(corpus=self.corpus, id2word=self.id2word, num_topics=self.num_topics) | ||
""" | ||
super(SklearnWrapperRpModel, self).__init__(corpus=self.corpus, id2word=self.id2word, num_topics=self.num_topics) | ||
|
||
def transform(self, doc): | ||
""" | ||
Take document/corpus as input. | ||
Return RP representation of the input document/corpus. | ||
""" | ||
return self[doc] | ||
|
||
def partial_fit(self, X): | ||
raise NotImplementedError("'partial_fit' has not been implemented for the RandomProjections model") |
Review comment: This documentation line doesn't seem to help. What are these undefined variables like id2word, chunksize etc.?
Reply: These params (id2word, chunksize, etc.) are associated with the LSI model. This change is in the file sklearn_wrapper_gensim_lsimodel.py. Since the change was so small (literally one word in a docstring), I included it in this PR (the one for the RP model wrapper) itself. There is also a similar change for the LDA model here. Should I remove these changes from this PR?
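For readers following the wrapper in the diff above, here is a minimal sketch of the scikit-learn-style contract it targets (get_params, set_params, fit, transform). It is not the PR's code and does not use gensim: ToyRandomProjection, its seed parameter, and the projection_ attribute are hypothetical names invented for illustration, and the random sign matrix is a toy stand-in rather than gensim's RpModel. Unlike the diff, the sketch takes the corpus in fit instead of the constructor and returns self from fit, which is the usual scikit-learn convention.

```python
import math
import random


class ToyRandomProjection:
    """Hypothetical stand-in for a wrapped model; NOT gensim's RpModel."""

    def __init__(self, id2word=None, num_topics=300, seed=0):
        # Store hyperparameters only; no training happens in the constructor.
        self.id2word = id2word
        self.num_topics = num_topics
        self.seed = seed  # assumption: added here so the sketch is reproducible

    def get_params(self, deep=True):
        """Return all hyperparameters as a dictionary."""
        return {"id2word": self.id2word, "num_topics": self.num_topics, "seed": self.seed}

    def set_params(self, **parameters):
        """Set hyperparameters and return self so calls can be chained."""
        for name, value in parameters.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y=None):
        """Build a random sign matrix sized to the largest term id in X.

        X is a list of bag-of-words documents, each a list of (term_id, weight).
        """
        num_terms = 1 + max((term_id for doc in X for term_id, _ in doc), default=-1)
        rng = random.Random(self.seed)
        scale = 1.0 / math.sqrt(self.num_topics)
        self.projection_ = [
            [rng.choice((-scale, scale)) for _ in range(num_terms)]
            for _ in range(self.num_topics)
        ]
        return self

    def transform(self, docs):
        """Project each bag-of-words document to num_topics dimensions."""
        if not hasattr(self, "projection_"):
            raise RuntimeError("call fit before transform")
        return [
            [
                # Term ids unseen during fit are skipped.
                sum(row[term_id] * weight for term_id, weight in doc if term_id < len(row))
                for row in self.projection_
            ]
            for doc in docs
        ]


if __name__ == "__main__":
    corpus = [[(0, 1.0), (2, 2.0)], [(1, 3.0)]]
    model = ToyRandomProjection(num_topics=4, seed=1).fit(corpus)
    print(model.transform(corpus))
```

Keeping the constructor free of data means get_params can rebuild an equivalent unfitted estimator, which is what scikit-learn utilities such as clone rely on.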