Create cub-based argmin primitive and replace argmin_along_rows in ANN kmeans #912

Merged
merged 6 commits into from
Nov 9, 2022

Conversation

Contributor

@Nyrio Nyrio commented Oct 11, 2022

This PR follows up on a suggestion from @cjnolet. The new argmin primitive is up to 5x faster than argmin_along_rows for dimensions relevant to ANN kmeans, and removes code duplication.

The reasons why it is faster are:

  • argmin_along_rows often misses out on doing a sequential reduction before the tree reduction, especially since it uses large block sizes (up to 1024 threads).
  • CUB has a better reduction algorithm than the basic shared-mem reduction used in argmin_along_rows.
  • If we switch the argmin prim to using the cub::BLOCK_REDUCE_RAKING_COMMUTATIVE_ONLY algorithm, we can get up to 30% further speedup! (I believe it's safe to use the commutative algorithm here since the offset is contained in the key-value pair so the reduction operation is commutative).
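To make the first and third points concrete, here is a minimal host-side C++ sketch. It is a CPU model written for illustration, not the CUB or RAFT code: `KeyValue`, `argmin_op` and `block_argmin` are hypothetical names, and the tie-break on the key is an assumption of this sketch. Each simulated thread first reduces a strided slice of the row sequentially, then the per-thread partials are combined by a tree reduction. Because the offset travels inside the pair, the operator gives the same answer whichever operand comes first.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical key-value pair: `key` is the element's offset within the row.
struct KeyValue {
  int key;
  float value;
};

// Argmin operator. Ties are broken by the smaller key (an assumption of this
// sketch), so the result does not depend on operand order.
inline KeyValue argmin_op(const KeyValue& a, const KeyValue& b)
{
  if (b.value < a.value || (b.value == a.value && b.key < a.key)) { return b; }
  return a;
}

// CPU model of one block of `block_size` threads reducing one row of length d.
// block_size must be a power of two for the tree stage below.
inline int block_argmin(const float* row, int d, int block_size)
{
  assert(d > 0 && block_size > 0 && (block_size & (block_size - 1)) == 0);
  const KeyValue identity{std::numeric_limits<int>::max(),
                          std::numeric_limits<float>::infinity()};
  std::vector<KeyValue> partial(static_cast<std::size_t>(block_size), identity);

  // Stage 1: sequential reduction. Thread tid visits tid, tid + block_size, ...
  for (int tid = 0; tid < block_size; ++tid) {
    for (int i = tid; i < d; i += block_size) {
      partial[tid] = argmin_op(partial[tid], KeyValue{i, row[i]});
    }
  }

  // Stage 2: tree reduction over the per-thread partial results.
  for (int stride = block_size / 2; stride > 0; stride /= 2) {
    for (int tid = 0; tid < stride; ++tid) {
      partial[tid] = argmin_op(partial[tid], partial[tid + stride]);
    }
  }
  return partial[0].key;
}
```

In this model, when block_size is at least as large as d, stage 1 handles at most one element per thread and all the work lands on the tree stage, which is the situation the first bullet describes.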

The speedup that I have measured for IVF-Flat build with the InnerProduct metric is around 15%.

@Nyrio Nyrio requested review from a team as code owners October 11, 2022 17:05
@Nyrio Nyrio added 3 - Ready for Review improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Oct 11, 2022
Member

@cjnolet cjnolet left a comment

Thanks for these changes, @Nyrio! Definitely happy to see prims for both argmin and argmax and more consolidation being done on the new ANN algos. Mostly minor things.

cpp/include/raft/matrix/math.cuh (resolved)
cpp/include/raft/matrix/math.cuh (resolved)
Contributor

@tfeher tfeher left a comment

Thanks Louis for the PR, I just have a few small comments.

cpp/bench/matrix/argmin.cu (outdated, resolved)
cpp/include/raft/matrix/math.cuh (outdated, resolved)
cpp/include/raft/matrix/math.cuh (outdated, resolved)
@tfeher tfeher mentioned this pull request Oct 27, 2022
10 tasks
@Nyrio Nyrio added 2 - In Progress Currently a work in progress and removed 3 - Ready for Review labels Oct 27, 2022
Contributor Author

Nyrio commented Oct 31, 2022

@lowener @cjnolet The new argmax header was slightly wrong, and the test was incorrect as well (see explanation below). Please have a detailed look at my latest change to see if you agree with the way I solved this.

The wrapper takes a row-major matrix view, but the implementation is in column-major semantics. As a result:

  • The documentation of the wrapper was incorrect. In row-major semantics, we're finding the column with the minimum index for each row (in column-major it's the other way around).
  • The dimension of the output was also incorrect. The wrapper checked that the size of the output was the same as the number of rows of the matrix, but the size of the output is N, which here was the number of columns. Note that I switched the arguments D and N passed to the implementation.
  • The test passed mostly by chance: it created a 4x3 row-major matrix but checked against the expected outputs for a 3x4 row-major matrix, and the errors in the test and the wrapper canceled each other out. The one exception is that devArrMatch compared 4 rows even though both the actual and expected outputs only had 3 rows.

I've kept the column-major convention in the deprecated header, fixed the new ones to follow the row-major convention, and switched the implementation to drop rows and columns in favor of N (number of reductions) and D (number of elements per reduction), and documented that.
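
The N/D convention can be sketched with a small host-side reference in C++. This is an illustration under assumptions, not the RAFT implementation: `argmin_reference` is a hypothetical name, and the input is assumed to hold N contiguous groups of D elements (a row-major N x D matrix, or equivalently a column-major D x N one). The output always has N entries, one per reduction, so swapping N and D on the same buffer changes both the output size and its contents.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference: `in` holds n contiguous groups of d elements.
// Returns, for each of the n groups, the offset of its smallest element
// (the first one in case of ties).
inline std::vector<int> argmin_reference(const std::vector<float>& in, int n, int d)
{
  assert(n > 0 && d > 0);
  assert(in.size() == static_cast<std::size_t>(n) * static_cast<std::size_t>(d));
  std::vector<int> out(static_cast<std::size_t>(n));
  for (int i = 0; i < n; ++i) {
    int best = 0;
    for (int j = 1; j < d; ++j) {
      if (in[i * d + j] < in[i * d + best]) { best = j; }
    }
    out[i] = best;
  }
  return out;
}
```

Calling it on the same 12-element buffer once with (n = 3, d = 4) and once with (n = 4, d = 3) yields outputs of length 3 and 4 respectively, which is the kind of mismatch described above. An argmax variant would only flip the comparison.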

@Nyrio Nyrio requested review from cjnolet and tfeher October 31, 2022 12:43
@Nyrio Nyrio removed their assignment Nov 2, 2022
@Nyrio Nyrio added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 2 - In Progress Currently a work in progress labels Nov 2, 2022
Contributor

@lowener lowener left a comment

Good catch, your changes on argmax make sense to me!

@Nyrio Nyrio assigned tfeher and unassigned lowener Nov 8, 2022
Contributor Author

Nyrio commented Nov 8, 2022

@tfeher and @cjnolet Can you update your review status? Thanks!

Member

@cjnolet cjnolet left a comment

LGTM. Thanks for fixing the intermediate issues as well!

Contributor

@tfeher tfeher left a comment

Thanks Louis for addressing the issues, it looks good to me!

@Nyrio Nyrio added 5 - Ready to Merge and removed 4 - Waiting on Reviewer Waiting for reviewer to review or respond labels Nov 9, 2022
Member

cjnolet commented Nov 9, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 836bb58 into rapidsai:branch-22.12 Nov 9, 2022
Labels
5 - Ready to Merge CMake cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
4 participants