Make termsim matrix positive definite even with negative similarities #2397
Note that this set of parameters is unusual, since it makes sense to threshold the similarities at zero (that is why we are using a sparse matrix, after all). So although this bugfix changes the produced matrices, few users should be affected.
Thank you for this PR. Unfortunately, I'm not very familiar with the underlying code, so it's difficult for me to comment on the substance of your PR. I do notice that it appears to be a bugfix. So, I have a few questions:
It is possible to unit-test this specific bug. This will be useful if someone reverts the fix in the future, or reimplements the
If this bug is not fixed, the term similarity matrix produced by the Gensim does not depend on the positive definiteness of a term similarity matrix at the moment. However, a user may produce word embeddings from the matrix using Cholesky factorization. This will fail for matrices that are not symmetric positive definite. As I noted above, this bug manifests itself rarely. The users affected by the bug will be mainly those who use grid search or a similar technique to optimize the
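To illustrate the Cholesky failure mentioned above, here is a minimal NumPy-only sketch. It does not use Gensim, and the helper names `similarity_matrix` and `try_cholesky` are made up for this example. It builds two small symmetric matrices with negative off-diagonal similarities: one that is not positive definite and one that is.

```python
import numpy as np


def similarity_matrix(off_diagonal, size=3):
    """Hypothetical helper: symmetric matrix with ones on the diagonal
    and the same similarity value everywhere else."""
    matrix = np.full((size, size), off_diagonal, dtype=float)
    np.fill_diagonal(matrix, 1.0)
    return matrix


def try_cholesky(matrix):
    """Return the lower-triangular Cholesky factor, or None when NumPy
    rejects the matrix as not positive definite."""
    try:
        return np.linalg.cholesky(matrix)
    except np.linalg.LinAlgError:
        return None


# Off-diagonal similarities of -0.8: the eigenvalues are 1.8, 1.8 and -0.6,
# so the matrix is symmetric but not positive definite.
indefinite = similarity_matrix(-0.8)
print(try_cholesky(indefinite) is None)

# Off-diagonal similarities of -0.4: the eigenvalues are 1.4, 1.4 and 0.2,
# so the factorization succeeds. The rows of the factor act as embeddings
# whose dot products reproduce the similarities.
definite = similarity_matrix(-0.4)
factor = try_cholesky(definite)
print(np.allclose(factor @ factor.T, definite))
```

Both matrices have the same shape and sign pattern; only the magnitude of the negative similarities decides whether the factorization works.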
Ok, could you please add a test to this PR?
I have just added the unit test in c87aa31.
OK, this looks good to me. Thank you for your contribution.
When a SparseTermSimilarityMatrix object is constructed with non-default parameters that satisfy the following constraints: symmetric and positive_definite and threshold < 0, the matrix embedded in the constructed object may not be positive definite. This PR hardens the code against negative term similarities, making sure that the term similarity matrix is always positive definite.
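The following sketch shows one way negative similarities can defeat a positive definiteness guarantee. It is not taken from this PR's diff. It assumes, purely for illustration, that positive definiteness is enforced through strict diagonal dominance, which is a sufficient condition for a symmetric matrix with a positive diagonal. Both function names are hypothetical.

```python
import numpy as np


def signed_row_sum_check(matrix):
    """Flawed variant (illustration only): compares each diagonal entry
    with the signed sum of the other entries in its row, so negative
    similarities cancel out instead of counting against dominance."""
    matrix = np.asarray(matrix, dtype=float)
    diagonal = np.diag(matrix)
    off_diagonal = matrix.sum(axis=1) - diagonal
    return bool(np.all(diagonal > off_diagonal))


def is_strictly_diagonally_dominant(matrix):
    """Compares each diagonal entry with the sum of the absolute values
    of the other entries in its row."""
    matrix = np.asarray(matrix, dtype=float)
    diagonal = np.abs(np.diag(matrix))
    off_diagonal = np.abs(matrix).sum(axis=1) - diagonal
    return bool(np.all(diagonal > off_diagonal))


# Symmetric matrix with negative similarities below a threshold of zero.
matrix = np.array([
    [1.0, -0.8, -0.8],
    [-0.8, 1.0, -0.8],
    [-0.8, -0.8, 1.0],
])

print(signed_row_sum_check(matrix))             # accepts the matrix
print(is_strictly_diagonally_dominant(matrix))  # rejects the matrix
print(np.linalg.eigvalsh(matrix).min() < 0)     # it has a negative eigenvalue
```

With non-negative similarities the two checks agree, which is consistent with the bug only showing up for threshold < 0.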