Gram matrix support for sparse input #1296

Merged 34 commits on Apr 25, 2023

Conversation

@mfoerste4 (Collaborator) commented Feb 22, 2023

This PR adds sparse input support (CSR) for GramMatrix kernel computation. This is a requirement to enable SVM support for sparse input in cuML issue 2197.

It also adds row norm computation for CSR input, which is used for the expanded L2 norm computation inside the RBF kernel.
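For reference, the "expanded" form is the standard identity that lets the squared L2 distance be assembled from precomputed row norms plus a Gram (inner product) term, which is then fed into the RBF kernel (gamma being the usual RBF parameter):

```latex
\|x - y\|_2^2 \;=\; \|x\|_2^2 + \|y\|_2^2 - 2\,\langle x, y\rangle,
\qquad
K_{\mathrm{RBF}}(x, y) \;=\; \exp\!\bigl(-\gamma\,\|x - y\|_2^2\bigr)
```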

Although this branch introduces a new API, it remains backwards compatible with the old GramMatrix API (which is now marked as deprecated).

CC @cjnolet @tfeher

@mfoerste4 mfoerste4 requested review from a team as code owners February 22, 2023 20:50
@rapids-bot (bot) commented Feb 22, 2023

Pull requests from external contributors require approval from a rapidsai organization member with write or admin permissions before CI can begin.

@tfeher added the "5 - DO NOT MERGE", "feature request", "non-breaking", and "breaking" labels and removed the "non-breaking" label on Mar 8, 2023
@tfeher (Contributor) left a comment

Hi Malte, thanks for the PR! I have looked at the changes of the Gram matrices. Overall it looks nice, below you will find my comments.

I have added the "breaking" label to remind us that we have code in cuML that is affected by the interface changes (even though it is in the detail namespace, so technically it should not be breaking).

I will share my comments about the new norm method separately.

Inline review comments on cpp/include/raft/distance/detail/kernels/gram_matrix.cuh and cpp/test/sparse/gram.cu (resolved).
template <typename math_t>
class DenseMatrix : public Matrix<math_t> {
public:
DenseMatrix(math_t* data, int rows, int cols, bool row_major = false, int ld_in = 0)
@tfeher (Contributor) commented

Ideally we would like to use mdspan for representing the dense matrix. Could you list the pain points that motivate keeping a pointer-based API around?

@mfoerste4 (Collaborator, Author) commented Mar 13, 2023

I thought about storing the dense data pointer as raft::device_aligned_matrix_view<math_t, std::uint32_t, layout_left_padded<math_t>>, which would combine the information of data/n_rows/n_cols/is_row_major/ld.
This would require getters for all members, since they need to be extracted from the layout, which is fine.

The reasons I did not choose to do it, though, are

  • The Python interface in cuML, which this wrapper was originally built for, starts with raw pointer(s) from (Sparse)CumlArray, which is why a simple container holding these pointers was an easy and sufficient approach. Using mdspan would add additional code here to construct the layouts.
  • Only raw data pointers need to be extracted and forwarded to third-party libs (like cuBLAS/cuSPARSE); no element access is needed that could benefit from mdspan layouts.
  • The matrix container for sparse data would still have raw data pointers and individual members for rows/cols/nnz, which feels inconsistent.
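A rough sketch of the mdspan wrapping discussed above, assuming raft's make_device_matrix_view factory and col_major layout tag (template arguments and names are illustrative, not taken from this PR):

```cpp
#include <raft/core/device_mdspan.hpp>

// Wraps an existing column-major device pointer in a non-owning view.
// data/n_rows/n_cols would come from the cuML side (e.g. a CumlArray);
// a padded/strided layout would additionally be needed to carry the ld.
auto make_dense_view(const float* data, int n_rows, int n_cols)
{
  return raft::make_device_matrix_view<const float, int, raft::col_major>(
    data, n_rows, n_cols);
}
```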

@cjnolet (Member) commented

I agree with @tfeher: rather than introducing more abstractions for this, we can unify the dense and sparse APIs using overloads of the existing abstractions (with templates if needed).

We didn't have abstractions like mdspan available when we were building cuML, so we used pointers for everything. Now that we have the proper abstractions and infrastructure in place, we should use them.

Please note that the sparse API has been merged. We should be able to add overloads for both dense and sparse without having an explicit class to unify them.

@mfoerste4 (Collaborator, Author) commented

@cjnolet, @tfeher, I have updated the PR to utilize both device_mdspan and device_csr_matrix_view as input for GramMatrixBase. I have not yet adapted the cuML branch as I would like to hear your feedback first.

I had to explicitly define the combinations of inputs for the evaluate method, as it is virtual and needs to be overridden by the deriving kernels. I did not find an elegant way to express this via templates, which results in some duplicated code. If you have a better suggestion for designing this that avoids code duplication without introducing other pain points, I'd be happy to hear it.

Note that many of the code lines are marked as deprecated and only exist for backwards compatibility with cuML. We can remove them as soon as the cuML PR is merged as well.
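A minimal illustration of the constraint described above (placeholder types, not the actual raft classes): virtual member functions cannot be templates, so each dense/sparse input combination needs its own virtual overload that the deriving kernels override.

```cpp
// Placeholder view types standing in for device_matrix_view / device_csr_matrix_view.
struct dense_view { /* pointer, extents, layout ... */ };
struct csr_view   { /* values, indptr, indices, extents ... */ };

struct gram_base {
  virtual ~gram_base() = default;
  // One virtual overload per input combination; a template cannot be virtual.
  virtual void evaluate(dense_view x1, dense_view x2, float* out) = 0;
  virtual void evaluate(csr_view   x1, dense_view x2, float* out) = 0;
  virtual void evaluate(csr_view   x1, csr_view   x2, float* out) = 0;
};

struct rbf_like_kernel : gram_base {
  void evaluate(dense_view x1, dense_view x2, float* out) override { /* ... */ }
  void evaluate(csr_view   x1, dense_view x2, float* out) override { /* ... */ }
  void evaluate(csr_view   x1, csr_view   x2, float* out) override { /* ... */ }
};
```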

@cjnolet (Member) commented Mar 22, 2023

My apologies @tfeher and @mfoerste4, I meant to revisit this discussion myself, but things have been pretty crazy the past couple of weeks. Please understand that my scrutiny over adding new types is only because

  1. I want to make sure that we're not exploding the number of types that a user needs to work with, and
  2. I want to make sure that we aren't creating new types each time we encounter a new pattern, unless said types can be at least moderately reusable across the codebase.

It sounds like the goal here is to have a trivial unified base class just so we can accept a single type and dispatch to the specific types. So far, the cython APIs in RAFT have gotten pretty tedious to code, and that's just for host/device mdspan. Adding sparse to the mix and accepting that on host and device is going to make things even more tedious. I definitely understand the reasoning here.

Currently, the SimpleMat contains linear algebra implementations. If the goal here is just to have a unified base class that can accept read-only inputs, could we maybe just create a very simple class called something like read_only_matrix_view which we extend for csr_read_only_matrix_view and dense_read_only_matrix_view?

@mfoerste4 (Collaborator, Author) commented Mar 22, 2023

No need to apologize. As far as I am concerned we can keep the GramMatrix raft interface as is, with 3 different APIs for Dense, Sparse & Mixed input. This would keep the RAFT API clean and we would not need another wrapper definition, at least not in raft.

As for cuML, in addition to the read-only views we also need a data/structure-owning matrix wrapper that allows for resizing internal data structures (ResizableCsrMatrix). As this is very specific to this use case, I would propose to re-use the old simple unified wrapper (with a resizable add-on) within cuML. It has simple constructors for creation from (Sparse)CumlArray within Cython, and can be modified to internally contain/provide either mdspan or csr_matrix_views for calls into RAFT.

@cjnolet (Member) commented Mar 22, 2023

As for cuML, in addition to the read-only views we also need a data/structure-owning matrix wrapper that allows for resizing internal data structures (ResizableCsrMatrix). As this is very specific to this use case, I would propose to re-use the old simple unified wrapper (with a resizable add-on) within cuML.

It sounds like what you need there is indeed a structure-owning CSR matrix. Why not use the one that's already in RAFT? That's actually not a very specific use case at all, and that structure/pattern is used heavily throughout the sparse APIs. Again, it looks like we're duplicating types, which means increased maintenance burden.

Eventually, all of RAFT's sparse APIs (and cuML) will be using those sparse types, but these are baby steps. The purpose of those structure-owning types is to allow you to own the underlying data arrays and initialize the sparsity (resize) once you know it.
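A generic illustration of the owning/view distinction (plain C++, not the raft types): the owning container holds the three CSR arrays and can initialize/resize the sparsity once nnz is known, while a view only references memory owned elsewhere.

```cpp
#include <vector>

struct owning_csr {
  int n_rows = 0, n_cols = 0;
  std::vector<int>   indptr;   // size n_rows + 1
  std::vector<int>   indices;  // size nnz
  std::vector<float> values;   // size nnz
  // Resize the structure once the number of non-zeros is known.
  void initialize_sparsity(int nnz) { indices.resize(nnz); values.resize(nnz); }
};

struct csr_matrix_view {
  const int*   indptr;
  const int*   indices;
  const float* values;
  int n_rows, n_cols, nnz;     // non-owning: just references arrays owned elsewhere
};
```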

@mfoerste4 (Collaborator, Author) commented Mar 22, 2023

Why not use the one that's already in RAFT?

You are right, it seems that the data-owning csr_matrix would be a perfect fit once we switch to mdspan/csr_matrix_(view) representation within the unified wrapper.

@mfoerste4 (Collaborator, Author) commented Mar 24, 2023

As far as I am concerned we can keep the GramMatrix raft interface as is with 3 different APIs for Dense, Sparse & Mixed input.

@cjnolet, are you ok with the current API of the GramMatrix? Further discussion on the cuML APIs can take place in the cuML PR.

@mfoerste4 (Collaborator, Author) commented

Hi Malte, thanks for the PR! I have looked at the changes of the Gram matrices. Overall it looks nice, below you will find my comments.

I have added the "breaking" label to remind us that we have code in cuML that is affected by the interface changes (even though it is in the detail namespace, so technically it should not be breaking).

I will share my comments about the new norm method separately.

Thanks @tfeher for reviewing. I have applied your suggestions and pushed an update.

@tfeher (Contributor) left a comment

Thanks Malte for updating the PR! Here is my second batch of comments (related to the CSR norm).

We need to find a way not to break cuML with this PR. Can we just keep the old implementation as an overload, and remove it once the corresponding cuML PR is accepted?

Inline review comments on cpp/include/raft/sparse/linalg/norm.cuh and cpp/test/sparse/normalize.cu (resolved).
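For reference, a host-side sketch of what a CSR row norm boils down to (plain C++, not the raft API): each row's squared L2 norm is the sum of squares of its stored values, so only the value array and the row offsets are needed.

```cpp
// out_sq_norms[i] = sum of values[k]^2 over the non-zeros k of row i.
// These squared norms feed the expanded L2 distance used by the RBF kernel.
void csr_row_sq_norm_l2(const int* indptr, const float* values, int n_rows,
                        float* out_sq_norms)
{
  for (int row = 0; row < n_rows; ++row) {
    float acc = 0.f;
    for (int idx = indptr[row]; idx < indptr[row + 1]; ++idx) {
      acc += values[idx] * values[idx];
    }
    out_sq_norms[row] = acc;
  }
}
```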
@mfoerste4 (Collaborator, Author) commented

Thanks Malte for updating the PR! Here is my second batch of comments (related to the CSR norm).

We need to find a way not to break cuML with this PR. Can we just keep the old implementation as an overload, and remove it once the corresponding cuML PR is accepted?

I could add the old interface, but it would be a bit more intrusive, as

  • we need the operator() AND evaluate interfaces (the latter is called directly by prediction)
  • we need a second set of constructors to allow instantiation with a cublas_handle instead of a raft handle

Should I proceed anyway?

@mfoerste4 (Collaborator, Author) commented

Thanks Malte for updating the PR! Here is my second batch of comments (related to the CSR norm).
We need to find a way not to break cuML with this PR. Can we just keep the old implementation as an overload, and remove it once the corresponding cuML PR is accepted?

I could add the old interface, but it would be a bit more intrusive, as

  • we need the operator() AND evaluate interfaces (the latter is called directly by prediction)
  • we need a second set of constructors to allow instantiation with a cublas_handle instead of a raft handle

Should I proceed anyway?

@tfeher, I added the old interface for backwards compatibility and marked it as deprecated. In addition to that, I followed your suggestion and modified the new API to pass the handle at runtime to the operator/evaluate function instead.

@mfoerste4 mfoerste4 requested a review from cjnolet March 30, 2023 22:23
@cjnolet cjnolet changed the base branch from branch-23.04 to branch-23.06 April 12, 2023 17:51
@cjnolet (Member) commented Apr 12, 2023

@mfoerste4 just a note: I bumped this to 23.06. I'll re-review so we can get this in quickly.

@cjnolet (Member) left a comment

Thanks for the updates, @mfoerste4. The new norm and sparse::spmm APIs look great. I think this is almost there.

Inline review comment on cpp/include/raft/distance/kernels/kernel_matrices.cuh (resolved).
typename NZType,
typename LayoutPolicyY,
typename LayoutPolicyZ>
void spmm(raft::device_resources const& handle,
@cjnolet (Member) commented

Tagging @divyegala for awareness/thoughts, since he's starting on exposing the sparse APIs with our new core vocabulary.
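For context, a reference (host-side, plain C++) version of the operation an SpMM performs, with A in CSR and B, Z dense row-major; the actual raft/cusparse implementation differs, this only illustrates the computation Z = alpha * A * B + beta * Z.

```cpp
void spmm_reference(int m, int n,                      // A is m x k, B is k x n
                    const int* indptr, const int* indices, const float* values,
                    const float* B, float alpha, float beta, float* Z)
{
  for (int i = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      float acc = 0.f;
      for (int idx = indptr[i]; idx < indptr[i + 1]; ++idx) {
        acc += values[idx] * B[indices[idx] * n + j];  // B is row-major
      }
      Z[i * n + j] = alpha * acc + beta * Z[i * n + j];
    }
  }
}
```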

@cjnolet (Member) left a comment

LGTM. Thanks Malte! Before we merge this, we should probably open a blank PR in cuml that pins the RAFT fork/tag in cpp/cmake/thirdparty/get_raft.cmake to your branch just to make sure we didn't overlook anything that might break cuml when we merge this.

@mfoerste4 (Collaborator, Author) commented Apr 20, 2023

LGTM. Thanks Malte! Before we merge this, we should probably open a blank PR in cuml that pins the RAFT fork/tag in cpp/cmake/thirdparty/get_raft.cmake to your branch just to make sure we didn't overlook anything that might break cuml when we merge this.

Thanks @cjnolet. I have created a test PR here.
UPDATE: cuML tests are green except for test_simpl_set.py, which is unrelated.

@cjnolet (Member) commented Apr 24, 2023

@mfoerste4 a few recent updates to the sparse API broke your changes here. I can make these updates and push to your branch if it's easier (since I know what changed).

@mfoerste4 (Collaborator, Author) commented

@mfoerste4 a few recent updates to the sparse API broke your changes here. I can make these updates and push to your branch if it's easier (since I know what changed).

@cjnolet, that would be great, thanks. I have been trying to keep it green but did not confirm after the last merge.

@cjnolet (Member) commented Apr 24, 2023

/merge

@rapids-bot rapids-bot bot merged commit 1defccc into rapidsai:branch-23.06 Apr 25, 2023
ahendriksen pushed a commit to ahendriksen/raft that referenced this pull request Apr 27, 2023
Authors:
  - Malte Förster (https://github.com/mfoerste4)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#1296
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request Jun 1, 2023
This PR adds support for sparse input to SVR and SVC. 'fit' as well as 'predict' can be called with sparse data compatible/convertible to SparseCumlArray. Support vectors in the model might also be stored as sparse data and can be retrieved as such.
This PR requires rapidsai/raft#1296 to provide sparse kernel computations.
Corresponding issue: #2197

Authors:
  - Malte Förster (https://github.com/mfoerste4)
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: #5273
Labels: CMake, cpp, feature request, non-breaking
Participants: 4