Remove Index template parameter from distance specializations #1220
Conversation
Remove the index template parameter from the pairwise distance specializations. Since this is only used for scalar values with the dimensionality of the input/output matrices - having multiple different types here doesn't benefit us much, and just causes us to compile more distance specializations (or miss them in the case of int64_t indices).
Codecov Report — Base: 87.99% // Head: 87.99% // No change to project coverage 👍

```
@@            Coverage Diff             @@
##           branch-23.02    #1220   +/- ##
=============================================
  Coverage         87.99%   87.99%
=============================================
  Files                21       21
  Lines               483      483
=============================================
  Hits                425      425
  Misses               58       58
```

☔ View full report at Codecov.
This PR seems to significantly improve the build times on my dev box. Running a clean build with
Compared to the
(Note that in branch-23.02, I'm seeing the
Does the
```diff
@@ -252,65 +252,64 @@ void pairwise_distance(raft::device_resources const& handle,
 {
   switch (metric) {
     case raft::distance::DistanceType::L2Expanded:
       detail::pairwise_distance_impl<Type, Index_, raft::distance::DistanceType::L2Expanded>(
```
I do really like the idea of removing this template argument altogether because it completely consolidates the number of possible specializations. I wonder, though, is there a way we can be smart about the index precision without necessarily having to force everything to be 64-bit? I can imagine there are only a select few places that would actually require 64-bit indexing, right? Such as when we need to multiply `m * n` in order to get the index into the output tiles? Otherwise, I can't imagine either `m` or `n` being into the billions. Tens of millions maybe (which is why their product could overflow in 32-bit).
cc @ahendriksen as the three of us were discussing this yesterday
Currently, when instantiated with `IdxT=int`, the pairwise_distance kernels are intended to be correct when:

- `n, m, k` are less than 2 billion (i.e., representable by a positive signed int).
- The input arrays have fewer than 2 billion elements. The indexing here is prone to overflow.
- The output array can be any int64_t-indexable size, as writing to the output array is done with care.

There are a couple of options to consolidate the index types:

1. Standardize on `typeof(m, n, k) = int` and use careful indexing on the input arrays, so that we can support input arrays with more than 2B elements. Throw a runtime error when `n, m, k` > 2B.
2. Standardize on `typeof(m, n, k) = int` and use careful indexing on the input arrays, so that we can support input arrays with more than 2B elements. In addition, support `int64` strides. At runtime, when one of `n, m, k` > 2B, we can batch the calls to the pairwise distance kernel with appropriate subsets of the input/output arrays (changing the base pointer, m, n, k, and strides).
3. Standardize on `typeof(m, n, k) = int64`.

I expect that option 1 will have a negligible impact on performance. Option 2 might have a slightly bigger impact, and option 3 could have up to 15% performance degradation.

My preference would be option 1, but I am not sure if that is feasible. On an 80GB card, we can store 20B floats. So values of `m, n, k` for which we would throw a runtime error could be, for instance, `k=1K, m=3M, n=1K`: leading to `x = m x k = 3B`, `y = n x k = 1M`, `out = m x n = 3B` elements.
I'm closing this PR, and instead just using the int32 specializations in the bfknn replacement: e870eb3.
This just uses a `size_t` for the m/n/k parameters of the distance metrics. Since `int` and `uint32_t` can be implicitly converted to `size_t`, and we are only updating the `detail` namespace and not the public API, this isn't a breaking change.