
Remove Index template parameter from distance specializations #1220

Closed

Conversation

benfred
Member

@benfred benfred commented Feb 1, 2023

Remove the index template parameter from the pairwise distance specializations. Since it is only used for scalar values describing the dimensionality of the input/output matrices, having multiple different types here doesn't benefit us much; it just causes us to compile more distance specializations (or miss them entirely, as happens with int64_t indices).

This just uses a size_t for the m/n/k parameters of the distance metrics. Since int and uint32_t values are implicitly convertible to size_t, and we are only updating the detail namespace and not the public API, this isn't a breaking change.
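For illustration, here is a minimal sketch of the kind of signature change being described (names simplified and hypothetical, not the exact raft::distance::detail API):

#include <cstddef>

// Before: the index type is a template parameter, so every IdxT used by a
// caller (int, uint32_t, int64_t, ...) spawns its own set of specializations.
template <typename DataT, typename AccT, typename OutT, typename IdxT>
void pairwise_distance_impl(const DataT* x, const DataT* y, OutT* out,
                            IdxT m, IdxT n, IdxT k);

// After: m/n/k are plain size_t scalars, so only one set of specializations
// per value-type combination is compiled; int and uint32_t arguments convert
// implicitly at the call site.
template <typename DataT, typename AccT, typename OutT>
void pairwise_distance_impl(const DataT* x, const DataT* y, OutT* out,
                            std::size_t m, std::size_t n, std::size_t k);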

@benfred benfred requested review from a team as code owners February 1, 2023 06:44
@benfred benfred added the non-breaking (Non-breaking change) and improvement (Improvement / enhancement to an existing function) labels Feb 1, 2023
@benfred benfred marked this pull request as draft February 1, 2023 06:44
@benfred benfred mentioned this pull request Feb 1, 2023
@codecov-commenter

Codecov Report

Base: 87.99% // Head: 87.99% // No change to project coverage 👍

Coverage data is based on head (5d6bc6b) compared to base (dc4e8f0).
Patch has no changes to coverable lines.

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-23.02    #1220   +/-   ##
=============================================
  Coverage         87.99%   87.99%           
=============================================
  Files                21       21           
  Lines               483      483           
=============================================
  Hits                425      425           
  Misses               58       58           



@benfred benfred marked this pull request as ready for review February 1, 2023 16:36
@benfred
Member Author

benfred commented Feb 1, 2023

This PR seems to significantly improve the build times on my dev box.

Running a clean build on my dev box with time ./build.sh libraft pylibraft tests --compile-libs on this branch reports:

real    36m13.881s
user    359m3.551s
sys     6m30.971s

Compared to the branch-23.02 clean build times of

real    51m17.140s
user    451m32.702s
sys     6m40.730s

(Note that on branch-23.02, I'm seeing the lp_unexpanded_float_float_float_uint32.cu specialization take almost 50 minutes to compile on its own, for reasons I can't quite figure out; that doesn't seem to happen with this PR.)

@benfred benfred changed the base branch from branch-23.02 to branch-23.04 February 1, 2023 21:14
@ahendriksen
Contributor

Does the pairwiseDistanceMatKernel get instantiated with size_t as the IdxT parameter? Do you know what the impact is on runtime? Using 64-bit indexing can have a non-negligible impact on performance in CUDA kernels.

@@ -252,65 +252,64 @@ void pairwise_distance(raft::device_resources const& handle,
{
switch (metric) {
case raft::distance::DistanceType::L2Expanded:
detail::pairwise_distance_impl<Type, Index_, raft::distance::DistanceType::L2Expanded>(
Member

I do really like the idea of removing this template argument altogether because it completely consolidates the number of possible specializations. I wonder, though, is there a way we can be smart about the index precision without necessarily having to force everything to be 64-bit? I can imagine there are only a select few places that would actually require 64-bit indexing, right? Such as when we need to multiply m*n in order to get the index into the output tiles? Otherwise, I can't imagine either m or n being in the billions. Tens of millions maybe (which is why their product could overflow in 32-bit).
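(Illustrative only, not raft code: a tiny standalone example of the 32-bit product overflow being described, with made-up m/n values.)

#include <cstdint>
#include <cstdio>

int main()
{
  // m and n each fit comfortably in 32 bits...
  std::uint32_t m = 30'000'000;  // tens of millions of rows in x
  std::uint32_t n = 1'000;       // rows in y
  // ...but the number of output elements, m * n = 30'000'000'000, does not,
  // so the 32-bit product silently wraps modulo 2^32.
  std::uint32_t narrow = m * n;
  // Widening one operand before the multiply gives the correct value.
  std::uint64_t wide = static_cast<std::uint64_t>(m) * n;
  std::printf("32-bit product: %u\n64-bit product: %llu\n",
              static_cast<unsigned>(narrow),
              static_cast<unsigned long long>(wide));
  return 0;
}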

Member

cc @ahendriksen as the three of us were discussing this yesterday

Contributor

Currently when instantiated with IdxT=int, the pairwise_distance kernels are intended to be correct when

  • n, m, k are less than 2 billion (i.e., representable by positive signed int).
  • The input arrays have less than 2 billion elements. The indexing here is prone to overflow.
  • The output array can be any int64_t indexable size, as writing to the output array is done with care.

There are a couple of options to consolidate the index types:

  1. Standardize on typeof(m, n, k) = int and use careful indexing on the input arrays, so that we can support input arrays with more than 2B elements. Throw a runtime error when n, m, k > 2B.
  2. Standardize on typeof(m, n, k) = int and use careful indexing on the input arrays, so that we can support input arrays with more than 2B elements. In addition, support int64 strides. At runtime, when one of n, m, or k > 2B, we can batch the calls to the pairwise distance kernel with appropriate subsets of the input/output arrays (changing base pointer, m, n, k and strides).
  3. Standardize on typeof(m, n, k) = int64.

I expect that option 1 will have a negligible impact on performance. Option 2 might have a slightly bigger impact, and option 3 will have up to 15% performance degradation.

My preference would be option 1, but I am not sure if that is feasible. On an 80GB card, we can store 20B floats. So values of m, n, k for which we throw a run time error could be for instance:

  • k = 1K, m = 3M, n = 1K: leading to x = m × k = 3B, y = n × k = 1M, and out = m × n = 3B elements.
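As a purely illustrative sketch (not the actual raft kernels) of what the "careful indexing" in options 1 and 2 could look like: m, n, and k stay 32-bit, but any flat offset that could exceed 2^31 is widened to 64-bit before the multiply.

#include <cstdint>

// In the real code these would be __device__ helpers inside the pairwise
// distance kernels; written as plain C++ here for illustration.
inline float load_x(const float* x, int row, int col, int k)
{
  // row * k can exceed INT32_MAX even when row and k individually fit in an
  // int, so promote to 64-bit before multiplying.
  return x[static_cast<std::int64_t>(row) * k + col];
}

inline void store_out(float* out, float val, int row, int col, int n)
{
  // The output offset likewise uses 64-bit arithmetic, so the output array
  // can hold more than 2B elements even though m and n remain plain ints.
  out[static_cast<std::int64_t>(row) * n + col] = val;
}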

@benfred
Member Author

benfred commented Feb 7, 2023

I'm closing this PR, and instead just using the int32 specializations in the bfknn replacement e870eb3.

@benfred benfred closed this Feb 7, 2023
Labels
Build Time Improvement, CMake, cpp, improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change)
4 participants