[ENH] Refactor spectral clustering and transfer the backend to RAFT #868
Comments
Can I get a bit more details about requirements, specifically:
|
There is a strong dependency of functions in |
Yes, your understanding is correct.
They are indeed tightly connected to this code. The dilemma is that, from what I remember of these matrix classes, they are not in good enough shape to be the standard sparse matrix classes for the rest of RAFT to build on. So it seems two paths can be taken:
|
How about |
Also, @cjnolet, could you please advise on the specific needs for a sparse matrix abstraction in cuML? Thanks in advance. |
It would be nice if we had an RAII abstraction that uses device buffers under the hood to manage the device memory for the underlying arrays. We have something very simple in cuML that has cleaned up our code tremendously by removing the need to manage and pass three arrays everywhere. Here's an example of some of the constructors we use; the port from cuML to RAFT should be straightforward, but I'm also very much open to an abstraction that serves cuGraph's needs as well. https://github.com/rapidsai/cuml/blob/branch-0.15/cpp/src_prims/sparse/csr.cuh#L117 |
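As a rough illustration of the RAII idea above, the sketch below bundles the three CSR arrays into one owning object, so a function takes a single argument instead of three pointers. It is a host-only stand-in: `std::vector` plays the role of a device buffer, and `CsrMatrix` and `row_nnz` are hypothetical names, not cuML or RAFT APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical owning CSR container. std::vector stands in for a device
// buffer; a real version would hold device allocations instead.
template <typename index_t, typename value_t>
class CsrMatrix {
 public:
  CsrMatrix(std::vector<index_t> row_offsets,
            std::vector<index_t> col_indices,
            std::vector<value_t> values)
    : row_offsets_(std::move(row_offsets)),
      col_indices_(std::move(col_indices)),
      values_(std::move(values))
  {
    // CSR invariants: n_rows + 1 offsets, one column index per stored value.
    assert(!row_offsets_.empty());
    assert(col_indices_.size() == values_.size());
    assert(static_cast<std::size_t>(row_offsets_.back()) == values_.size());
  }

  std::size_t n_rows() const { return row_offsets_.size() - 1; }
  std::size_t nnz() const { return values_.size(); }

  const std::vector<index_t>& row_offsets() const { return row_offsets_; }
  const std::vector<index_t>& col_indices() const { return col_indices_; }
  const std::vector<value_t>& values() const { return values_; }

 private:
  // Storage is released automatically when the object goes out of scope.
  std::vector<index_t> row_offsets_;
  std::vector<index_t> col_indices_;
  std::vector<value_t> values_;
};

// Example consumer: one argument instead of three raw arrays.
template <typename index_t, typename value_t>
std::vector<index_t> row_nnz(const CsrMatrix<index_t, value_t>& m)
{
  std::vector<index_t> counts(m.n_rows());
  for (std::size_t r = 0; r < m.n_rows(); ++r) {
    counts[r] = m.row_offsets()[r + 1] - m.row_offsets()[r];
  }
  return counts;
}
```

With a container like this, the size checks live in one place (the constructor) rather than at every call site.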
Looks like So, we have:
Then, what could be done in RAFT to satisfy requirements above in both Can we do that? |
@cjnolet This is provided by the GraphCSR/COO classes here. The GraphCSR classes own device buffers internally and can be used at the Python level by releasing ownership of the buffers to the Python object; in cudf terms, this is the column. A GraphCSRView can be created from a GraphCSR; it only takes the pointers to the device buffers, which gives a non-owning abstraction for clean code. In cudf terms, this is the column_view. Whenever we allocate a graph on the C++ side and want to pass it to the Python side, we return a unique_ptr to a GraphCSR. |
Regarding the RAII class to store sparse matrices, I'm inclined to create a bare-minimum class to store CSR/CSC-like data structures. cuGraph can build a Graph class that provides more abstraction and graph-specific features on top of this bare-minimum class. We may later move this higher-level graph class to RAFT if it turns out to be valuable, but I think it is too early to make that decision.
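The owning/non-owning split described above can be sketched roughly as follows. This is not the actual GraphCSR/GraphCSRView code: `CsrOwner`, `CsrView`, `release`, and `make_triangle` are hypothetical names, and `std::vector` stands in for a device buffer so the example stays host-only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical non-owning view: raw pointers plus sizes, cheap to copy.
template <typename vertex_t, typename edge_t>
struct CsrView {
  const edge_t* offsets;
  const vertex_t* indices;
  std::size_t n_vertices;
  std::size_t n_edges;
};

// Hypothetical owning class. std::vector stands in for a device buffer.
template <typename vertex_t, typename edge_t>
class CsrOwner {
 public:
  CsrOwner(std::vector<edge_t> offsets, std::vector<vertex_t> indices)
    : offsets_(std::move(offsets)), indices_(std::move(indices))
  {
  }

  // Non-owning view; valid only while this object is alive and has not
  // been released.
  CsrView<vertex_t, edge_t> view() const
  {
    return {offsets_.data(), indices_.data(), offsets_.size() - 1, indices_.size()};
  }

  // Hand the buffers to the caller (for example a Python-side wrapper).
  // Do not call view() on this object afterwards.
  std::pair<std::vector<edge_t>, std::vector<vertex_t>> release()
  {
    return {std::move(offsets_), std::move(indices_)};
  }

 private:
  std::vector<edge_t> offsets_;
  std::vector<vertex_t> indices_;
};

// Factory mirroring "return a unique_ptr to the owning object":
// a 3-vertex graph in which every vertex links to the other two.
inline std::unique_ptr<CsrOwner<int, int>> make_triangle()
{
  return std::make_unique<CsrOwner<int, int>>(std::vector<int>{0, 2, 4, 6},
                                              std::vector<int>{1, 2, 0, 2, 0, 1});
}
```

Algorithms would take the view by value, while only the code that allocates or hands off the graph touches the owning class.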
Then a higher level cuGraph graph class can have this class object as a member variable.
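A minimal sketch of that layering, under the assumption that the storage class is little more than three arrays: `SparseStorage` and `Graph` are hypothetical names, and `std::vector` again stands in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical bare-minimum storage class (the part that could live in RAFT).
template <typename index_t, typename value_t>
struct SparseStorage {
  std::vector<index_t> offsets;  // size = number of rows (or columns) + 1
  std::vector<index_t> indices;  // size = number of nonzeros
  std::vector<value_t> values;   // size = number of nonzeros
};

// Hypothetical higher-level graph class (the part that would stay in cuGraph).
// It holds the storage as a member and adds graph-specific accessors on top.
template <typename index_t, typename value_t>
class Graph {
 public:
  explicit Graph(SparseStorage<index_t, value_t> adj) : adj_(std::move(adj)) {}

  std::size_t number_of_vertices() const { return adj_.offsets.size() - 1; }
  std::size_t number_of_edges() const { return adj_.indices.size(); }

  // Out-degree of vertex v, read directly from the CSR offsets.
  index_t out_degree(std::size_t v) const { return adj_.offsets[v + 1] - adj_.offsets[v]; }

  // Expose the plain storage to code that only needs a sparse matrix.
  const SparseStorage<index_t, value_t>& storage() const { return adj_; }

 private:
  SparseStorage<index_t, value_t> adj_;
};
```

Keeping graph-only features in the outer class means the storage class can stay small for users that only need a sparse matrix.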
|
And once the device_uvector PR (rapidsai/rmm#364) is merged, we can replace rmm::device_buffer with rmm::device_uvector wherever the type is known at compile time. |
Why not just use Actually, on 2nd thought @seunghwak 's suggestion of a bare minimum class in RAFT, which can be built on on both But that's probably beyond the scope of this project (spectral clustering exposure in RAFT and cleanup in cugraph of some nvgraph artifacts) |
and if we want to make our data structure ready for OPG, we may add two additional member variables cuGraph's graph class may not be complete/stable enough to go to raft yet. Once we move this to raft and cuML gets dependent on our graph class, future updates become much more difficult. And we may add very graph specific features to this class and this may make the class unnecessarily bulky for cuML use. |
Closing. |
Motivation
Spectral clustering is used by cuML and should be in RAFT to avoid a circular dependency. Many of the backend building blocks (such as the Lanczos solver and k-means) can be used independently.
However, this is legacy code from nvgraph that can't be transferred as-is.
The following items should be taken care of:
matrix
class.