Replace normalize_rows in ann_utils.cuh by a new rowNormalize prim and improve performance for thin matrices (small n_cols) #979
Conversation
Hi, @Nyrio, I'd like to have a look at this when I'm back from vacation on Monday.
Is that a realistic use case for this primitive? Do we have examples where the feature space is very large and the number of rows small? I'm asking because we will need to write another kernel for thick matrices, and I don't think it's worth doing if we don't have any use for it yet. We can always do it when the need arises.
Note on thick matrices: we need to all-reduce the norm between blocks collaborating on the same row, so it's probably worth first using …
Ideally, I think, it would be nice to have a public-facing generic row/col-normalize function somewhere in the distance namespace, which would also take the distance type as an argument. What do you think about this, @cjnolet? Maybe even with a python interface? I remember we had some issues with normalization being slow when done by cudf in cuml/svm. @tfeher, do you remember if we could potentially have extreme matrix dimensions there (m >> n and n >> m)?
That could potentially be the case for linear methods when one wants to solve the dual problem (e.g. dual coordinate descent)?
Totally agree. I'll see about making the non-coalesced kernel support row/col normalization with row/col-major layouts, and might add it to this PR or a separate one. If we take the norm type as an argument, would you prefer to systematically apply the square root for L2, or provide an option? I'm not sure there is any case where one would want to divide by the sum of squares rather than the real L2 norm. Regarding thick matrices, I'm implementing that for the coalesced reduction in a separate branch; if we think it's important for this PR too, I'll see about adding it here as well.
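To make the two options concrete, here is a minimal host-side sketch (plain C++, illustrative names only; a real kernel would also guard against zero norms):

```cpp
#include <cmath>
#include <vector>

// Option A: divide by the real L2 norm (sqrt of the sum of squares).
void normalize_by_l2_norm(std::vector<float>& row)
{
  float sq_sum = 0.f;
  for (float v : row) sq_sum += v * v;
  const float norm = std::sqrt(sq_sum);
  for (float& v : row) v /= norm;
}

// Option B: divide by the sum of squares itself (no square root).
void normalize_by_squared_l2_norm(std::vector<float>& row)
{
  float sq_sum = 0.f;
  for (float v : row) sq_sum += v * v;
  for (float& v : row) v /= sq_sum;
}
```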
Hmm, although we have both squared and normal L2 enum values in the …
@achirkin Please note that the enum currently reads enum NormType { L1Norm = 0, L2Norm }; …
In general, we expect metric spaces to start breaking down around dimension 1024 and above, and this phenomenon has been published in a lot of literature. What happens is that the variance of the data points begins to decrease rapidly and they all end up converging into a single blob, lowering their overall discriminative capability in that space. That's true for distances, anyway; but for normalizing a set of vectors, I've seen the dimensionality go up into the 10s of thousands. As @achirkin points out, this might be used as a preprocessing step before a larger feature selection step is performed using something like lasso, for example. I do suggest that we consider moderately tall and wide datasets (thousands, not millions).
I suggest that we keep the normalization in the linalg namespace because its use for distance computations is more of an implementation detail and not the primary goal of performing the normalization. I could see an argument for putting it in the matrix namespace, but I think its representation as a standard norm followed by a matrix/vector division across rows makes it more suitable for linalg.
Oh, I missed that Artem suggested the distance namespace. If …
@cjnolet I see, that makes sense. On a separate branch, I have a 3-kernel approach for the coalesced reduction (thin: shuffle-based reduction with multiple rows per block; medium: the current cub-based reduction with 1 block = 1 row; thick: multiple blocks per row and atomics). I am benchmarking it and writing heuristics, and then I can adapt this for …
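For illustration, the thin case boils down to something like the following sketch (one warp per row, warp-shuffle all-reduce of the sum of squares); this is only a rough outline of the strategy, not the actual raft kernel:

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Rough sketch of the "thin" strategy: one warp per row, warp-shuffle all-reduce
// of the sum of squares, then an in-place scale. Not the actual raft kernel.
__global__ void row_normalize_thin(float* out, const float* in, int n_rows, int n_cols)
{
  const int warp_id = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
  const int lane    = threadIdx.x % 32;
  if (warp_id >= n_rows) return;

  const float* row = in + static_cast<size_t>(warp_id) * n_cols;

  // Each lane accumulates a strided partial sum of squares over the row.
  float sq_sum = 0.f;
  for (int i = lane; i < n_cols; i += 32) {
    sq_sum += row[i] * row[i];
  }
  // All-reduce within the warp using shuffles so every lane holds the full sum.
  for (int offset = 16; offset > 0; offset /= 2) {
    sq_sum += __shfl_xor_sync(0xffffffff, sq_sum, offset);
  }
  const float norm = sqrtf(sq_sum);  // a real implementation would guard against 0

  float* out_row = out + static_cast<size_t>(warp_id) * n_cols;
  for (int i = lane; i < n_cols; i += 32) {
    out_row[i] = row[i] / norm;
  }
}
```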
I propose we eventually remove the enum and just accept an integral "order" argument directly. This is what scipy does, for example. Another option would be to add all the most widely used norm computations (L0, L1, L2, Linf) to the enum. L1 and L2 are widely used, but as pointed out in the semirings paper I referenced earlier, L0 (essentially the number of nonzero matrix elements), Linf and Lmax are also important when used to compute a multitude of general distance measures (not necessarily just metrics). They also boil down to simple reduction/accumulation functors in the end.
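To illustrate the point, the accumulation/combination ops for those norms really are tiny functors along these lines (a sketch, not raft's actual definitions):

```cpp
// Sketch of per-element / combination ops for a few common norms; the row reduction
// is then a plain sum (or max for Linf), optionally followed by a sqrt for L2.
struct L0Op {  // counts nonzero elements
  template <typename T>
  constexpr T operator()(T in) const { return in != T(0) ? T(1) : T(0); }
};
struct AbsOp {  // L1: sum of absolute values
  template <typename T>
  constexpr T operator()(T in) const { return in < T(0) ? -in : in; }
};
struct SqOp {  // L2: sum of squares (sqrt applied as a final op)
  template <typename T>
  constexpr T operator()(T in) const { return in * in; }
};
struct MaxOp {  // Linf: combine partial results by taking the max
  template <typename T>
  constexpr T operator()(T a, T b) const { return a > b ? a : b; }
};
```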
@cjnolet @achirkin I have consolidated the interface with arbitrary norm types, following Corey's suggestion to have both functor- and enum-based APIs. I have also improved the performance for thick matrices with a cub-based kernel. Unlike what I did in #1011, I think two code paths are enough for this one rather than three. My work on …; cf. the perf chart in the updated description.
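From the caller's side, the two flavours would look roughly like this; the header name, argument order, and the eps parameter below are assumptions based on this discussion rather than the authoritative API (see the diff for the real signatures):

```cpp
// Hedged sketch only: header names, argument order and the eps guard are assumptions.
#include <raft/core/device_mdspan.hpp>
#include <raft/core/handle.hpp>
#include <raft/linalg/normalize.cuh>

void normalize_example(const raft::handle_t& handle,
                       raft::device_matrix_view<const float, int> in,
                       raft::device_matrix_view<float, int> out)
{
  // Enum-based API: pick one of the predefined norm types.
  raft::linalg::row_normalize(handle, in, out, raft::linalg::L2Norm);

  // Functor-based API: user-provided main/reduce/final ops, here spelling out
  // the L2 behaviour explicitly (extended device lambdas for brevity).
  raft::linalg::row_normalize(
    handle, in, out, 0.0f,
    [] __device__(float x) { return x * x; },           // main op (per element)
    [] __device__(float a, float b) { return a + b; },  // reduce op
    [] __device__(float x) { return sqrtf(x); },        // final op on the row norm
    1e-8f);                                             // assumed eps / zero-norm guard
}
```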
Thanks @louis for the PR! I have some questions, please see below
Thanks Louis for addressing the issues, the PR looks good to me.
LGTM. And thank you for using the new mdspan API!
@achirkin just waiting on your input here before I merge.
Sorry for holding this off! LGTM as well, I'm very glad to see prims like this being thoroughly optimized in raft!
Also, just a couple of small comments below.
@@ -516,6 +516,16 @@ struct Nop {
  HDI Type operator()(Type in, IdxType i = 0) { return in; }
};

template <typename Type, typename IdxType = int>
struct SqrtOp {
Here, and in other "Functor" structs: would you consider moving the template parameters onto the operator() where possible? This way, the template parameter will be inferred automatically - less typing and fewer opportunities to introduce bugs on the user side.
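For example, SqrtOp could then look roughly like this (a sketch, assuming it stays next to the existing functors where HDI and mySqrt are defined):

```cpp
// Sketch: the struct is untemplated and operator() carries the template parameters,
// so the element type is deduced at the call site, e.g. SqrtOp{}(x) for float or double.
struct SqrtOp {
  template <typename Type, typename IdxType = int>
  HDI Type operator()(Type in, IdxType = 0) const
  {
    return raft::mySqrt(in);
  }
};
```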
template <typename Type>
struct Max {
  HDI Type operator()(Type a, Type b) { return myMax(a, b); }
For some reason, I had the impression that we wanted to deprecate myXxx-style functions in raft. Is that the case, @cjnolet? Here, we could use std::max, for it being constexpr.
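Something along these lines, assuming the build already passes --expt-relaxed-constexpr so that constexpr host functions are callable from device code:

```cpp
#include <algorithm>

// Sketch: std::max is constexpr since C++14, so with --expt-relaxed-constexpr it can
// be called from device code as well, avoiding the raft-specific myMax.
struct Max {
  template <typename Type>
  constexpr Type operator()(Type a, Type b) const
  {
    return std::max(a, b);
  }
};
```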
@Nyrio do you want to create an issue for these and do them as a follow-on? I think since this PR has already been approved and run through CI (and since burndown starts tomorrow), we can go ahead and merge this as-is. What do you guys think?
I have opened two issues since both remarks are outside of the scope of this PR and shouldn't delay merging it.
@gpucibot merge
This follows up on a discussion at #652 (comment). The main goal of this PR is to make this helper accessible as a raft primitive.
I also used the opportunity to look at the performance of this primitive, and have improved it, in particular for thin matrices (small n_cols).
Here is an overview of the before/after performance on A100:
[before/after benchmark chart]