Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix compile time explosion for minkowski distance #1254

Merged

Conversation

ahendriksen
Copy link
Contributor

@ahendriksen ahendriksen commented Feb 7, 2023

As explained in #1246 (comment), ptxas chokes on the minkowski distance when VecLen==4 and IdxT==uint32_t.

This PR removes the veclen == 4 specialization for the minkowski distance.

Follow up to: #1239

@ahendriksen ahendriksen marked this pull request as ready for review February 7, 2023 14:24
@ahendriksen ahendriksen requested review from a team as code owners February 7, 2023 14:24
@ahendriksen ahendriksen added non-breaking Non-breaking change and removed CMake python ci labels Feb 7, 2023
@ahendriksen ahendriksen requested a review from cjnolet February 7, 2023 14:25
@ahendriksen ahendriksen added improvement Improvement / enhancement to an existing function ci Build Time Improvement labels Feb 7, 2023
@codecov-commenter
Copy link

codecov-commenter commented Feb 8, 2023

Codecov Report

Base: 87.99% // Head: 87.99% // No change to project coverage 👍

Coverage data is based on head (12dd68a) compared to base (bd0d86b).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-23.04    #1254   +/-   ##
=============================================
  Coverage         87.99%   87.99%           
=============================================
  Files                21       21           
  Lines               483      483           
=============================================
  Hits                425      425           
  Misses               58       58           
Impacted Files Coverage Δ
python/pylibraft/pylibraft/cluster/__init__.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how I missed this one originally but I'd like to keep the (modern) naming conventions more consistent across the APIs.

Also, I want to make sure the seemingly lower compile times aren't the result of sccache (which is suspect they are) and giving us false hope here. This is what happened before the release (we saw 45 minute compile times because of sccache, got excited, and then realized after the sccsche reset that the compile times were still taking >6 hours. I'd like to do some more profiling of this before merging.

cpp/include/raft/distance/masked_nn.cuh Outdated Show resolved Hide resolved
@ahendriksen
Copy link
Contributor Author

Also, I want to make sure the seemingly lower compile times aren't the result of sccache

Did you reset sccache in commit 585fc47 or do we still have to test this?

@cjnolet
Copy link
Member

cjnolet commented Feb 8, 2023

I tried to but that didn't seem to work. I'm going to build locally without sccache in the meantime since my workstation specs are similar to CI.

@cjnolet
Copy link
Member

cjnolet commented Feb 8, 2023

Here's the ninja trace log:

Screenshot from 2023-02-08 16-18-03

The trace above is from building RAFT on my local machine with the --allgpuarch flag.

Looks like the high offenders are now (in no particular order):

  1. linkage.cu googletest
  2. ivfpq_search runtime wrapper (this is weird because it means it's not properly using the specialization)
  3. All of the ivf-flat googletests
  4. All of the ivf-flat benchmarks
  5. eigen_solvers googletest
  6. refine specializations (expected)
  7. ball_cover_knn_query.cu specialization
  8. canberra specialization (expected.. but we should still figure this out)
  9. fused_l2_knn googletest

I notice a lot of the source files that take a long time to compile are taking in upwards of 1.5 hours or more.

Copy link
Member

@cjnolet cjnolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going ahead and approving this guy because I think the build issues related to these changes have been fixed. The other build issues predate these changes, I believe.

@cjnolet
Copy link
Member

cjnolet commented Feb 9, 2023

/merge

@cjnolet
Copy link
Member

cjnolet commented Feb 9, 2023

@ahendriksen just leaving a note for later that we should pull the masked_l2_1nn_params out from masked_nn.cuh and put it in a file called mask_nn_types.hpp. This allows users to expose that through their own APIs (knowing it doesn't require nvcc). We can do this in a follow-on.

Copy link
Member

@sean-frye sean-frye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rapids-bot rapids-bot bot merged commit cd32d6f into rapidsai:branch-23.04 Feb 9, 2023
achirkin pushed a commit to achirkin/raft that referenced this pull request Feb 9, 2023
As explained in rapidsai#1246 (comment), ptxas chokes on the minkowski distance when `VecLen==4` and `IdxT==uint32_t`.

This PR removes the veclen == 4 specialization for the minkowski distance.

Follow up to: rapidsai#1239

Authors:
  - Allard Hendriksen (https://github.com/ahendriksen)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Sean Frye (https://github.com/sean-frye)

URL: rapidsai#1254
@ahendriksen ahendriksen deleted the fix-compile-time-explosion branch March 17, 2023 09:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Time Improvement ci CMake cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python
Projects
Development

Successfully merging this pull request may close these issues.

4 participants