-
Notifications
You must be signed in to change notification settings - Fork 540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements of UMAP/TSNE precomputed KNN feature #4865
Improvements of UMAP/TSNE precomputed KNN feature #4865
Conversation
@viclafargue Can we also do this for tSNE? |
We discussed this offline a little while back, but the user here is really requesting that we accept a pairwise distance matrix directly. While looking into the changes in this PR, I discovered that UMAP actually does support a knn graph as input now, however it's configured at the estimator level and not passed in on the |
ccc1b50
to
e3aa91b
Compare
This PR does the following :
Tried to keep C++ changes to a minimum, but some changes had to be made though. |
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good so far! Mostly minor things at this point.
python/cuml/manifold/t_sne.pyx
Outdated
@@ -199,6 +199,15 @@ class TSNE(Base, | |||
'sqeuclidean' metric, the distances will still be squared when True. | |||
Note: This argument should likely be set to False for distance metrics | |||
other than 'euclidean' and 'l2'. | |||
precomputed_knn : array / sparse array / tuple, optional (device or host) | |||
Either one of : | |||
- Tuple (distances, indices) of arrays of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider also supporting the knn_search_index
option (third tuple element) which is supported in the reference implementation: https://github.com/lmcinnes/umap/blob/3f19ce19584de4cf99e3d0ae779ba13a57472cd9/umap/umap_.py#L1626. We can push that feature off but we should at least create an issue for it to keep it on our radar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I think it would be better to work on this in a follow-up PR. Just opened an issue over here : #5118.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM. Thanks for adding this feature, @viclafargue!
Will merge pending conflicts/CI
Codecov ReportBase: 67.12% // Head: 67.26% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## branch-23.02 #4865 +/- ##
================================================
+ Coverage 67.12% 67.26% +0.13%
================================================
Files 192 192
Lines 12396 12394 -2
================================================
+ Hits 8321 8337 +16
+ Misses 4075 4057 -18
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
/merge |
~~The X input in `fit` and `fit_transform` functions is unnecessary when a KNN graph is provided. This PR adds a `precomputed` boolean parameter. It specifies whether X would serve as a classic input or as a precomputed KNN graph. Additional the `transform` function is modified so that it cannot take a KNN graph anymore.~~ This PR does the following : 1) Provides a `precomputed_knn` parameter to UMAP and tSNE constructor. It can be provided in the form of a : - tuple (distances, indices) - pairwise distance matrix of shape (n_samples, n_samples) - KNN graph in CSR/COO/CSC format 2) Makes the legacy `knn_graph` parameter of the UMAP and tSNE fit method capable of taking in all the forms aforementioned. The `knn_graph` parameter when provided would take precedence over the `precomputed_knn` parameter. 3) Removes `knn_graph` parameter from UMAP transform method as it wasn't actually doing anything. 4) Adds precomputed KNN capabilities for the UMAP fit method in the sparse case. Authors: - Victor Lafargue (https://github.com/viclafargue) - Dante Gama Dessavre (https://github.com/dantegd) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: rapidsai#4865
The X input infit
andfit_transform
functions is unnecessary when a KNN graph is provided. This PR adds aprecomputed
boolean parameter. It specifies whether X would serve as a classic input or as a precomputed KNN graph. Additional thetransform
function is modified so that it cannot take a KNN graph anymore.This PR does the following :
precomputed_knn
parameter to UMAP and tSNE constructor.It can be provided in the form of a :
knn_graph
parameter of the UMAP and tSNE fit method capable of taking in all the forms aforementioned. Theknn_graph
parameter when provided would take precedence over theprecomputed_knn
parameter.knn_graph
parameter from UMAP transform method as it wasn't actually doing anything.