Sparse TSNE #3293

Merged
merged 25 commits into from
Jan 7, 2021

Member

@divyegala divyegala commented Dec 11, 2020

This PR allows TSNE to accept sparse inputs.

It also removes the following long-standing warnings from cuML builds, which were caused by invalid parameters to __launch_bounds__ in TSNE kernels:

ptxas warning : Value of threads per SM for entry _ZN2ML4TSNE17IntegrationKernelEfffPfS1_PKfS3_S3_S3_S1_S1_S1_S1_S3_i is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _ZN2ML4TSNE15RepulsionKernelEffPKiS2_PKfS4_S4_PfS5_S5_fiiiS4_S2_ is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _ZN2ML4TSNE18TreeBuildingKernelEPiPKfS3_iiS1_S1_S3_ is out of range. .minnctapersm will be ignored
ptxas warning : Value of threads per SM for entry _ZN2ML4TSNE17BoundingBoxKernelEPiS1_PfS2_S2_S2_S2_S2_S2_iiiPjS2_ is out of range. .minnctapersm will be ignored
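
For readers unfamiliar with this ptxas warning: __launch_bounds__ takes a maximum block size and, optionally, a minimum number of resident blocks per SM, and the warning typically appears when the product of the two exceeds the number of threads an SM can hold on the target architecture. The sketch below is plain C++, not CUDA, and is not taken from this PR. The helper name `launch_bounds_fit` and the 2048-thread limit are assumptions for illustration; the real limit depends on the compute capability being compiled for.

```cpp
#include <cassert>

// Hypothetical helper (not part of cuML or the CUDA toolkit).
//   max_threads_per_block: first argument to __launch_bounds__
//   min_blocks_per_sm:     second argument to __launch_bounds__
//   max_threads_per_sm:    hardware limit of the target architecture
// Returns true when the requested minimum number of resident blocks can
// actually fit on one SM at the stated block size.
constexpr bool launch_bounds_fit(int max_threads_per_block,
                                 int min_blocks_per_sm,
                                 int max_threads_per_sm) {
  return max_threads_per_block > 0 && min_blocks_per_sm > 0 &&
         max_threads_per_block * min_blocks_per_sm <= max_threads_per_sm;
}

// Assumed limit for illustration only; some architectures allow fewer
// resident threads per SM than this.
constexpr int kAssumedMaxThreadsPerSM = 2048;

// 1024 threads x 2 blocks = 2048 threads: fits under the assumed limit.
static_assert(launch_bounds_fit(1024, 2, kAssumedMaxThreadsPerSM),
              "bounds should fit");

// 1024 threads x 3 blocks = 3072 threads: does not fit, which is the kind
// of combination that would make ptxas ignore .minnctapersm.
static_assert(!launch_bounds_fit(1024, 3, kAssumedMaxThreadsPerSM),
              "bounds should not fit");
```

A check like this could sit next to the constants that feed __launch_bounds__, so that an out-of-range pair fails the build rather than producing a warning that is easy to overlook.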

I also added a class, TSNE_runner, that runs the separate components of the algorithm and ensures that RAII buffers are used properly: each buffer is deallocated once it is no longer needed, without being explicitly deleted.
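
As a rough illustration of the RAII idea described here (not the actual TSNE_runner code), the sketch below uses host memory and two hypothetical classes, `ScratchBuffer` and `RunnerSketch`. Long-lived state is a class member, while per-phase scratch space is declared inside a block scope, so it is released when that scope ends and no explicit delete is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for a device buffer (not a cuML or RMM type).
// It counts live instances so the effect of scoping can be observed.
class ScratchBuffer {
 public:
  explicit ScratchBuffer(std::size_t n) : data_(new float[n]()), size_(n) {
    ++live();
  }
  ~ScratchBuffer() { --live(); }
  ScratchBuffer(const ScratchBuffer&) = delete;
  ScratchBuffer& operator=(const ScratchBuffer&) = delete;

  float* data() { return data_.get(); }
  std::size_t size() const { return size_; }

  // Number of buffers currently alive.
  static int& live() {
    static int count = 0;
    return count;
  }

 private:
  std::unique_ptr<float[]> data_;  // memory is freed by unique_ptr
  std::size_t size_;
};

// Hypothetical runner: the embedding lives as long as the runner, while
// each phase allocates its scratch space in its own block scope.
class RunnerSketch {
 public:
  explicit RunnerSketch(std::size_t n_points)
      : n_(n_points), embedding_(n_points * 2) {}

  // Runs two phases and returns the largest number of buffers alive at once.
  int run() {
    int peak = ScratchBuffer::live();
    {
      ScratchBuffer distances(n_ * n_);  // needed by phase 1 only
      distances.data()[0] = 1.0f;
      if (ScratchBuffer::live() > peak) peak = ScratchBuffer::live();
    }  // distances is freed here
    {
      ScratchBuffer gradients(n_ * 2);  // needed by phase 2 only
      embedding_.data()[0] += gradients.data()[0];
      if (ScratchBuffer::live() > peak) peak = ScratchBuffer::live();
    }  // gradients is freed here
    return peak;
  }

 private:
  std::size_t n_;
  ScratchBuffer embedding_;
};
```

Because the two scratch buffers never coexist, at most two buffers (the embedding plus one scratch buffer) are alive at any time, and every buffer is gone once the runner itself goes out of scope.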

closes #2751

@divyegala divyegala requested review from a team as code owners December 11, 2020 06:42
@divyegala divyegala marked this pull request as draft December 11, 2020 06:42
@divyegala divyegala added 2 - In Progress Currently a work in progress CUDA / C++ CUDA issue Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change labels Dec 11, 2020
@codecov-io

codecov-io commented Dec 11, 2020

Codecov Report

Merging #3293 (383eca4) into branch-0.18 (ae7e444) will increase coverage by 0.06%.
The diff coverage is 98.41%.

@@               Coverage Diff               @@
##           branch-0.18    #3293      +/-   ##
===============================================
+ Coverage        71.48%   71.55%   +0.06%     
===============================================
  Files              207      207              
  Lines            16750    16787      +37     
===============================================
+ Hits             11974    12012      +38     
+ Misses            4776     4775       -1     
Impacted Files                                          Coverage Δ
python/cuml/manifold/t_sne.pyx                          79.42% <98.30%> (+3.34%) ⬆️
python/cuml/common/sparsefuncs.py                       91.95% <100.00%> (+0.28%) ⬆️
...l/_thirdparty/sklearn/preprocessing/_imputation.py   62.50% <0.00%> (+0.40%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ae7e444...383eca4. Read the comment docs.

@divyegala
Member Author

rerun tests

@divyegala divyegala marked this pull request as ready for review December 11, 2020 19:26
@divyegala divyegala added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Dec 11, 2020
Member

@cjnolet cjnolet left a comment

Really glad to see this change coming in!

Most of the feedback is minor. However, adding this feature required that I also parametrize the remaining functions in UMAP so that updating from int64_t and float is straightforward. We should use a parametrized type instead of value_t where possible.

cpp/src/tsne/bh_kernels.cuh (outdated, resolved)
cpp/src/tsne/bh_kernels.cuh (outdated, resolved)
cpp/src/tsne/tsne_runner.cuh (resolved)
cpp/src/tsne/distances.cuh (outdated, resolved)
cpp/src/tsne/tsne_runner.cuh (outdated, resolved)
cpp/src_prims/sparse/coo.cuh (outdated, resolved)
cpp/src/tsne/bh_kernels.cuh (outdated, resolved)
@divyegala divyegala added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 3 - Ready for Review Ready for review by team labels Dec 13, 2020
Member

@cjnolet cjnolet left a comment

This looks really good and it's almost there. My main concern is that I don't think all the non-pointer arguments need to be coupled to the 64-bit template types.
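
One way to read this concern, sketched under assumptions rather than taken from the PR: the array pointers carry the template types, while scalar arguments such as a row number or neighbor count stay plain `int`. The function `row_distance_sum` is hypothetical; only the type names `value_idx` and `value_t` echo the naming used in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical function, not part of cuML. Only the pointer arguments depend
// on the template types; the scalar arguments remain plain int, so widening
// the index type to 64 bits does not force 64-bit scalars on every caller.
template <typename value_idx, typename value_t>
value_t row_distance_sum(const value_idx* indices,  // [n_rows * n_neighbors]
                         const value_t* dists,      // [n_rows * n_neighbors]
                         int n_neighbors, int row) {
  value_t total = 0;
  for (int k = 0; k < n_neighbors; ++k) {
    // Compute the flat offset in size_t so large inputs do not overflow int.
    const std::size_t pos =
        static_cast<std::size_t>(row) * static_cast<std::size_t>(n_neighbors) +
        static_cast<std::size_t>(k);
    // Skip the entry where a point is listed as its own neighbor.
    if (indices[pos] != static_cast<value_idx>(row)) total += dists[pos];
  }
  return total;
}
```

The same function then works for `int64_t` indices with `float` distances and for `int` indices with `double` distances, with no change at the call site beyond the array types.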

cpp/src/tsne/bh_kernels.cuh (outdated, resolved)
cpp/src/tsne/bh_kernels.cuh (outdated, resolved)
cpp/src/tsne/distances.cuh (outdated, resolved)
cpp/src/tsne/exact_kernels.cuh (resolved)
cpp/src/tsne/tsne_runner.cuh (resolved)
@divyegala
Member Author

@cjnolet leaving this comment as a reference for after vacation: I have updated this PR with your latest review feedback.

@divyegala
Member Author

rerun tests

@@ -208,14 +208,16 @@ def extract_knn_graph(knn_graph, convert_dtype=True):
        knn_indices = knn_graph.col

    if knn_indices is not None:
        convert_to_dtype = None
        if convert_dtype:
            convert_to_dtype = np.int32 if sparse else np.int64
Member

It's going to be important to change this when FAISS is updated (and the indices are 32-bit). Referencing relevant issue: #2821
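
The conversion under discussion happens in Python, but the underlying concern (moving neighbor indices between 64-bit and 32-bit types) can be sketched in C++ as a checked narrowing. The function `narrow_indices` is hypothetical and not part of cuML or FAISS; it only shows that a 64-bit index must fit in 32 bits before being converted.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: converts 64-bit indices to 32-bit, and throws
// std::out_of_range instead of silently truncating a value that does not fit.
std::vector<std::int32_t> narrow_indices(const std::vector<std::int64_t>& in) {
  std::vector<std::int32_t> out;
  out.reserve(in.size());
  for (std::int64_t v : in) {
    if (v < std::numeric_limits<std::int32_t>::min() ||
        v > std::numeric_limits<std::int32_t>::max()) {
      throw std::out_of_range("index does not fit in 32 bits");
    }
    out.push_back(static_cast<std::int32_t>(v));
  }
  return out;
}
```

Failing loudly here is a design choice for the sketch: a truncated index would still point at a valid but wrong neighbor, which is much harder to notice than an exception.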

Member

@cjnolet cjnolet left a comment

LGTM!

@cjnolet cjnolet added 6 - Okay to Auto-Merge and removed 4 - Waiting on Reviewer Waiting for reviewer to review or respond labels Jan 7, 2021
@rapids-bot rapids-bot bot merged commit 4d2de05 into rapidsai:branch-0.18 Jan 7, 2021
Labels
CUDA / C++ CUDA issue Cython / Python Cython or Python issue feature request New feature or request non-breaking Non-breaking change
Development

Successfully merging this pull request may close these issues.

[FEA] Sparse input support for tSNE
3 participants