Bug in sklearn_api.hdp and in sklearn_api.ldamodel #1676
Labels
bug
Issue described a bug
difficulty easy
Easy issue: required small fix
good first issue
Issue for new contributors (not required gensim understanding + very simple)
Description
The new sklearn_api.hdp (and also ldamodel) modules, and/or their combination with matutils.Sparse2Corpus, yield wrong results when fitting models from sklearn vectorizers or other sklearn-style sparse matrices, since these are stored in CSR format with one document per row. This error might occur in other sklearn_api classes, too.
I believe that either the default value of the documents_columns parameter of the Sparse2Corpus constructor should be changed, or the wrappers' call to Sparse2Corpus should pass documents_columns=False.
Steps/Code/Corpus to Reproduce
Example: The number of processed documents corresponds to the number of features.
Expected Results
The gensim HDP model should have processed the 100 documents in samples.
Actual Results
HDP processed 6547 documents
Workaround, for the record
To make it work correctly, the sparse matrix x should be transposed.
Versions