
Remove Index template parameter from distance specializations #1220

Closed

Conversation

benfred
Member

@benfred benfred commented Feb 1, 2023

Remove the index template parameter from the pairwise distance specializations. Since it is only used for scalar values describing the dimensionality of the input/output matrices, having multiple different types here doesn't benefit us much; it just causes us to compile more distance specializations (or miss them entirely, as happens with int64_t indices).

This just uses a size_t for the m/n/k parameters of the distance metrics. Since int and uint32_t values are implicitly convertible to size_t, and we are only updating the detail namespace and not the public API, this isn't a breaking change.
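For illustration, here is a minimal sketch of the kind of signature change being described (names simplified and hypothetical, not the exact raft::distance::detail API):

#include <cstddef>

// Before: the index type is a template parameter, so every IdxT used by a
// caller (int, uint32_t, int64_t, ...) spawns its own set of specializations.
template <typename DataT, typename AccT, typename OutT, typename IdxT>
void pairwise_distance_impl(const DataT* x, const DataT* y, OutT* out,
                            IdxT m, IdxT n, IdxT k);

// After: m/n/k are plain size_t scalars, so only one set of specializations
// per value-type combination is compiled; int and uint32_t arguments convert
// implicitly at the call site.
template <typename DataT, typename AccT, typename OutT>
void pairwise_distance_impl(const DataT* x, const DataT* y, OutT* out,
                            std::size_t m, std::size_t n, std::size_t k);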

@benfred benfred requested review from a team as code owners February 1, 2023 06:44
@benfred benfred added the non-breaking (Non-breaking change) and improvement (Improvement / enhancement to an existing function) labels Feb 1, 2023
@benfred benfred marked this pull request as draft February 1, 2023 06:44
@benfred benfred mentioned this pull request Feb 1, 2023
@codecov-commenter

Codecov Report

Base: 87.99% // Head: 87.99% // No change to project coverage 👍

Coverage data is based on head (5d6bc6b) compared to base (dc4e8f0).
Patch has no changes to coverable lines.

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-23.02    #1220   +/-   ##
=============================================
  Coverage         87.99%   87.99%           
=============================================
  Files                21       21           
  Lines               483      483           
=============================================
  Hits                425      425           
  Misses               58       58           



@benfred benfred marked this pull request as ready for review February 1, 2023 16:36
@benfred
Member Author

benfred commented Feb 1, 2023

This PR seems to significantly improve the build times on my dev box.

Running a clean build on my dev box with time ./build.sh libraft pylibraft tests --compile-libs on this branch reports:

real    36m13.881s
user    359m3.551s
sys     6m30.971s

Compared to the branch-23.02 clean build times of

real    51m17.140s
user    451m32.702s
sys     6m40.730s

(Note that on branch-23.02, I'm seeing the lp_unexpanded_float_float_float_uint32.cu specialization take almost 50 minutes to compile on its own, for reasons I can't quite figure out; that doesn't seem to happen with this PR.)

@benfred benfred changed the base branch from branch-23.02 to branch-23.04 February 1, 2023 21:14
@ahendriksen
Contributor

Does the pairwiseDistanceMatKernel get instantiated with size_t as the IdxT parameter? Do you know what the impact is on runtime? Using 64-bit indexing can have a non-negligible impact on performance in CUDA kernels.

@@ -252,65 +252,64 @@ void pairwise_distance(raft::device_resources const& handle,
{
switch (metric) {
case raft::distance::DistanceType::L2Expanded:
detail::pairwise_distance_impl<Type, Index_, raft::distance::DistanceType::L2Expanded>(
Member

I do really like the idea of removing this template argument altogether because it completely consolidates the number of possible specializations. I wonder, though, is there a way we can be smart about the index precision without necessarily having to force everything to be 64-bit? I can imagine there are only a select few places that would actually require 64-bit indexing, right? Such as when we need to multiply m*n in order to get the index into the output tiles? Otherwise, I can't imagine either m or n being in the billions. Tens of millions maybe (which is why their product could overflow in 32-bit).
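(Illustrative only, not raft code: a tiny standalone example of the 32-bit product overflow being described, with made-up m/n values.)

#include <cstdint>
#include <cstdio>

int main()
{
  // m and n each fit comfortably in 32 bits...
  std::uint32_t m = 30'000'000;  // tens of millions of rows in x
  std::uint32_t n = 1'000;       // rows in y
  // ...but the number of output elements, m * n = 30'000'000'000, does not,
  // so the 32-bit product silently wraps modulo 2^32.
  std::uint32_t narrow = m * n;
  // Widening one operand before the multiply gives the correct value.
  std::uint64_t wide = static_cast<std::uint64_t>(m) * n;
  std::printf("32-bit product: %u\n64-bit product: %llu\n",
              static_cast<unsigned>(narrow),
              static_cast<unsigned long long>(wide));
  return 0;
}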

Member

cc @ahendriksen as the three of us were discussing this yesterday

Contributor

Currently when instantiated with IdxT=int, the pairwise_distance kernels are intended to be correct when

  • n, m, k are less than 2 billion (i.e., representable by positive signed int).
  • The input arrays have less than 2 billion elements. The indexing here is prone to overflow.
  • The output array can be any int64_t indexable size, as writing to the output array is done with care.

There are a couple of options to consolidate the index types:

  1. Standardize on typeof(m, n, k) = int and use careful indexing on the input arrays, so that we can support input arrays with more than 2B elements. Throw a runtime error when n, m, k > 2B.
  2. Standardize on typeof(m, n, k) = int and use careful indexing on the input arrays, so that we can support input arrays with more than 2B elements. In addition, support int64 strides. At runtime, when one of n, m, or k > 2B, we can batch the calls to the pairwise distance kernel with appropriate subsets of the input/output arrays (changing base pointer, m, n, k and strides).
  3. Standardize on typeof(m, n, k) = int64.

I expect that option 1 will have a negligible impact on performance. Option 2 might have a slightly bigger impact, and option 3 will have up to 15% performance degradation.

My preference would be option 1, but I am not sure if that is feasible. On an 80GB card, we can store 20B floats. So values of m, n, k for which we throw a run time error could be for instance:

  • k = 1K, m = 3M, n = 1K: leading to x = m × k = 3B, y = n × k = 1M, and out = m × n = 3B elements.
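As a purely illustrative sketch (not the actual raft kernels) of what the "careful indexing" in options 1 and 2 could look like: m, n, and k stay 32-bit, but any flat offset that could exceed 2^31 is widened to 64-bit before the multiply.

#include <cstdint>

// In the real code these would be __device__ helpers inside the pairwise
// distance kernels; written as plain C++ here for illustration.
inline float load_x(const float* x, int row, int col, int k)
{
  // row * k can exceed INT32_MAX even when row and k individually fit in an
  // int, so promote to 64-bit before multiplying.
  return x[static_cast<std::int64_t>(row) * k + col];
}

inline void store_out(float* out, float val, int row, int col, int n)
{
  // The output offset likewise uses 64-bit arithmetic, so the output array
  // can hold more than 2B elements even though m and n remain plain ints.
  out[static_cast<std::int64_t>(row) * n + col] = val;
}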

@benfred
Member Author

benfred commented Feb 7, 2023

I'm closing this PR, and instead just using the int32 specializations in the bfknn replacement e870eb3.

@benfred benfred closed this Feb 7, 2023
Labels
Build Time Improvement, CMake, cpp, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
4 participants