Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSNE and UMAP allow several distance types #4779

Conversation

tarang-jain
Copy link
Contributor

@tarang-jain tarang-jain commented Jun 15, 2022

@github-actions github-actions bot added CUDA/C++ Cython / Python Cython or Python issue labels Jun 15, 2022
@tarang-jain tarang-jain changed the title TSNE allow distance types TSNE and UMAP allow several distance types Jun 23, 2022
@lowener
Copy link
Contributor

lowener commented Jun 23, 2022

Linking #1799.

@cjnolet
Copy link
Member

cjnolet commented Jun 23, 2022

@lowener, that issue is a little different than this one. This PR actually closes #1653.

@tarang-jain tarang-jain marked this pull request as ready for review June 23, 2022 23:59
@tarang-jain tarang-jain requested review from a team as code owners June 23, 2022 23:59
Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes are looking good. Mostly minor things in my review. It looks like there's a failed tsne gtest, which is likely from these changes.

Metrics that take arguments (such as minkowski) can have arguments
passed via the metric_kwds dictionary. At this time care must
be taken and dictionary elements must be ordered appropriately;
this will hopefully be fixed in the future.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this reads better without the word "hopefully"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjnolet this documentation was borrowed directly from UMAP, which allows more than one metric kwds arguments. Currently, we just have the minkowski p value as the only parameter. So I believe we can simply get rid of the line "At this time..."

metric : str 'euclidean' only (default 'euclidean')
Currently only supports euclidean distance. Will support cosine in
a future release.
metric : str (default='euclidean').
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a disclaimer here and explicitly point out the square_distances argument. The math in the base TSNE algorithm itself assumes the distances can be squared (eg that Euclidean is used by default, which then becomes sqeuclidean) during the loss computation. We want to make sure users know that if they are using a different distance, they will likely want to set the square_distance argument to false.

We probably want to document this but also provide a warning when a distance other Euclidean is used so that the users know to turn it off. We probably also want to turn this off in the pytests for all distances other than Euclidean.

Copy link
Contributor Author

@tarang-jain tarang-jain Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjnolet sklearn have deprecated their square_distances argument and in the docs, they state that distances are always squared. Should we do something similar?

@cjnolet
Copy link
Member

cjnolet commented Jul 14, 2022

rerun tests

@cjnolet cjnolet added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jul 14, 2022
@tarang-jain tarang-jain force-pushed the fea-tsne-umap-user-configured-metric branch from c99f104 to 43b2de5 Compare August 3, 2022 23:42
@tarang-jain
Copy link
Contributor Author

Close in favor of #4851

@tarang-jain tarang-jain closed this Aug 4, 2022
@tarang-jain tarang-jain reopened this Aug 4, 2022
@tarang-jain tarang-jain changed the base branch from branch-22.08 to branch-22.10 August 4, 2022 15:52
@tarang-jain tarang-jain requested review from a team as code owners August 4, 2022 15:52
@github-actions github-actions bot added CMake conda conda issue gpuCI gpuCI issue labels Aug 4, 2022
@cjnolet
Copy link
Member

cjnolet commented Aug 4, 2022

rerun tests

@github-actions github-actions bot removed conda conda issue gpuCI gpuCI issue CMake labels Aug 9, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #4779 (5a13a36) into branch-22.10 (7a0ab85) will not change coverage.
The diff coverage is n/a.

@@              Coverage Diff              @@
##           branch-22.10    #4779   +/-   ##
=============================================
  Coverage         78.02%   78.02%           
=============================================
  Files               180      180           
  Lines             11385    11385           
=============================================
  Hits               8883     8883           
  Misses             2502     2502           
Flag Coverage Δ
dask 46.21% <ø> (ø)
non-dask 67.27% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@cjnolet cjnolet removed the request for review from a team August 9, 2022 19:50
@cjnolet
Copy link
Member

cjnolet commented Aug 9, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 56590f2 into rapidsai:branch-22.10 Aug 9, 2022
jakirkham pushed a commit to jakirkham/cuml that referenced this pull request Feb 27, 2023
- [x] TSNE allow different distance metrics to be passed to KNN
- [x] TSNE distance metric pytests
- [x] UMAP allow different distance metrics to be passed to KNN
- [x] UMAP distance metric pytests
closes rapidsai#1653

Authors:
  - Tarang Jain (https://github.com/tarang-jain)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#4779
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CUDA/C++ Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UMAP & T-SNE to pass user-configured metrics to KNN
4 participants