TSNE and UMAP allow several distance types #4779
Linking #1799.
Changes are looking good. Mostly minor things in my review. It looks like there's a failed tsne gtest, which is likely from these changes.
python/cuml/manifold/umap.pyx
Outdated
Metrics that take arguments (such as minkowski) can have arguments
passed via the metric_kwds dictionary. At this time care must
be taken and dictionary elements must be ordered appropriately;
this will hopefully be fixed in the future.
I think this reads better without the word "hopefully"
@cjnolet this documentation was borrowed directly from UMAP, which allows more than one metric_kwds argument. Currently, the minkowski p value is our only parameter, so I believe we can simply remove the sentence beginning "At this time...".
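To illustrate the point about `metric_kwds`, here is a minimal pure-Python sketch of how a metric name and a `metric_kwds` dictionary could be resolved into a distance function. The helpers `minkowski` and `resolve_metric` are hypothetical names for illustration only and are not cuML's implementation; the sketch assumes `p` is the only keyword consulted, which is why the ordering of dictionary elements does not matter here.

```python
# Hypothetical illustration only; this is not cuML's actual code.

def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length sequences."""
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def resolve_metric(metric="euclidean", metric_kwds=None):
    """Return a two-argument distance function for the requested metric."""
    metric_kwds = dict(metric_kwds or {})
    if metric == "euclidean":
        return lambda x, y: minkowski(x, y, p=2.0)
    if metric == "manhattan":
        return lambda x, y: minkowski(x, y, p=1.0)
    if metric == "minkowski":
        # Only "p" is read, and it is looked up by key, so the order of
        # entries in metric_kwds is irrelevant in this sketch.
        p = metric_kwds.get("p", 2.0)
        return lambda x, y: minkowski(x, y, p=p)
    raise ValueError(f"unsupported metric: {metric!r}")


dist = resolve_metric("minkowski", {"p": 3})
d = dist([0.0, 0.0], [1.0, 1.0])  # (1**3 + 1**3) ** (1/3)
```

With a single keyword looked up by name, the caveat about ordering dictionary elements has nothing left to apply to, which supports dropping that sentence from the docstring.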
metric : str 'euclidean' only (default 'euclidean')
Currently only supports euclidean distance. Will support cosine in
a future release.
metric : str (default='euclidean').
We should add a disclaimer here and explicitly point out the square_distances argument. The math in the base TSNE algorithm assumes the distances can be squared during the loss computation (e.g., Euclidean is used by default, which then becomes sqeuclidean). We want to make sure users know that if they are using a different distance, they will likely want to set the square_distances argument to False.
We probably want to document this and also emit a warning when a distance other than Euclidean is used, so that users know to turn it off. We probably also want to turn it off in the pytests for all distances other than Euclidean.
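The warning suggested above could look roughly like the following sketch. The function name `check_square_distances`, the message text, and the boolean return value are assumptions made for illustration; only the `metric` and `square_distances` parameter names come from this thread, and the actual check in cuML may differ.

```python
import warnings


def check_square_distances(metric, square_distances=True):
    """Warn when distances would be squared for a non-Euclidean metric.

    Hypothetical helper: returns True when the combination is consistent
    with the squared-Euclidean assumption, False (after warning) otherwise.
    """
    if metric != "euclidean" and square_distances:
        warnings.warn(
            f"metric={metric!r} is used with square_distances=True; "
            "consider square_distances=False for non-Euclidean metrics.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True


# Capture warnings so the behaviour can be inspected without printing.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ok_euclidean = check_square_distances("euclidean", square_distances=True)
    ok_cosine = check_square_distances("cosine", square_distances=True)
    ok_cosine_off = check_square_distances("cosine", square_distances=False)
```

Only the middle call warns: Euclidean with squaring matches the default assumption, and a non-Euclidean metric with squaring turned off is what the review recommends for the pytests.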
…-umap-user-configured-metric
rerun tests
…-umap-user-configured-metric
c99f104 to 43b2de5
Close in favor of #4851
…ap-user-configured-metric
rerun tests
…ap-user-configured-metric
Codecov Report
@@ Coverage Diff @@
## branch-22.10 #4779 +/- ##
=============================================
Coverage 78.02% 78.02%
=============================================
Files 180 180
Lines 11385 11385
=============================================
Hits 8883 8883
Misses 2502 2502
@gpucibot merge
- [x] TSNE allow different distance metrics to be passed to KNN
- [x] TSNE distance metric pytests
- [x] UMAP allow different distance metrics to be passed to KNN
- [x] UMAP distance metric pytests

closes rapidsai#1653

Authors:
- Tarang Jain (https://github.com/tarang-jain)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4779
closes UMAP & T-SNE to pass user-configured metrics to KNN #1653