Define k-core API and tests #2712
Codecov Report
Base: 60.04% // Head: 60.04% // No change to project coverage

Coverage Diff
            branch-22.10    #2712    +/-
Coverage    60.04%          60.04%
Files       111             111
Lines       6184            6184
Hits        3713            3713
Misses      2471            2471
 *
 * @return edge list for the graph
 */
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
Would it be a better idea to pass bool transpose=false as a template parameter and then use transpose as a named parameter to the function, instead of the implicit false?
We try to avoid excess compilation of different templated functions, as this increases compile time.
Most of our functions are implemented based on an assumption about whether the graph is stored transposed or not. For example, pagerank assumes that store_transposed=true. Implementing pagerank with store_transposed=false would result in a great deal more synchronization (and communication in a multi-GPU environment). There are a few examples that will work on a graph in either orientation.
Supporting this would require providing an implementation that works with either orientation of the graph. As long as the orientation matches the requirement for core_number (which has to be called first), just one orientation seems sufficient.
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>,
Same here as mentioned in my previous comment.
See above reaction.
cpp/include/cugraph/algorithms.hpp
Outdated
 * @tparam edge_t Type of edge identifiers. Needs to be an integral type.
 * @tparam weight_t Type of edge weights. Needs to be a floating point type.
 * @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
 * @param graph cuGraph graph in coordinate format
I assume this comment is outdated.
Fixed
std::tuple<rmm::device_uvector<vertex_t>,
           rmm::device_uvector<vertex_t>,
           std::optional<rmm::device_uvector<weight_t>>>
k_core(raft::handle_t const& handle,
Something to think about...
We're passing core numbers here. In that case, should we really have both this function and induced subgraph?
Given core numbers, we can easily extract a vertex list in a single thrust call (e.g. thrust::copy_if). Then we can pass that vertex list to induced subgraph.
Not sure this function adds enough convenience to justify the increase in compile time/binary size.
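As a rough illustration of the filtering step mentioned above, here is a host-side sketch that uses std::copy_if as a stand-in for thrust::copy_if. The helper name vertices_in_k_core is hypothetical, and the sketch assumes vertex IDs are 0..n-1 and index directly into the core-number array:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of cuGraph): returns the IDs of all vertices
// whose core number is at least k. Assumes vertex v has its core number at
// core_numbers[v].
std::vector<int> vertices_in_k_core(std::vector<int> const& core_numbers, int k)
{
  // Build the candidate vertex IDs 0, 1, ..., n-1.
  std::vector<int> vertex_ids(core_numbers.size());
  std::iota(vertex_ids.begin(), vertex_ids.end(), 0);

  // Keep only the vertices that belong to the k-core.
  std::vector<int> result;
  std::copy_if(vertex_ids.begin(),
               vertex_ids.end(),
               std::back_inserter(result),
               [&core_numbers, k](int v) { return core_numbers[v] >= k; });
  return result;
}
```

The resulting vertex list is what would then be handed to an induced-subgraph extraction.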
Maybe OK: if this calls an explicitly instantiated induced_subgraph, the increase in compile time/binary size might be minimal...
Still debating whether this function should take core_numbers or just call core_numbers internally... (to maximize user convenience; if not, a user can just separately call core_numbers, something like thrust::copy_if, and induced_subgraph).
I had that debate internally. Ultimately I concluded that if we're going to expose a complete k-core implementation at the python level we should probably expose it at the lower levels as well. It seems like something that would be useful.
I do wonder if we should make passing in core_numbers optional, and if they are not passed then also call core_numbers. This would allow a simple caller that just wants to extract a 3-core from the graph to be able to call the function directly and get the complete answer, while a more sophisticated case might call core_number once and reuse the result to extract the different subgraphs as required.
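The optional-argument idea can be sketched on the host with toy types. Everything here is an assumption for illustration: toy_core_number and toy_k_core are hypothetical names, the graph is a plain adjacency list for a simple undirected graph (each edge listed in both directions), and the peeling loop is a simple quadratic one rather than anything cuGraph uses:

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

using adjacency_t = std::vector<std::vector<int>>;

// Toy core-number computation: repeatedly remove a minimum-degree vertex.
std::vector<int> toy_core_number(adjacency_t const& adj)
{
  int const n = static_cast<int>(adj.size());
  std::vector<int> degree(n), core(n, 0);
  std::vector<bool> removed(n, false);
  for (int v = 0; v < n; ++v) { degree[v] = static_cast<int>(adj[v].size()); }

  int current = 0;
  for (int step = 0; step < n; ++step) {
    int best = -1;
    for (int v = 0; v < n; ++v) {
      if (!removed[v] && (best < 0 || degree[v] < degree[best])) { best = v; }
    }
    current       = std::max(current, degree[best]);
    core[best]    = current;
    removed[best] = true;
    for (int u : adj[best]) {
      if (!removed[u]) { --degree[u]; }
    }
  }
  return core;
}

// Toy k-core extraction: core numbers are optional. If the caller does not
// supply them, they are computed here; otherwise the supplied values are
// reused (copied, for simplicity). Returns each undirected edge once as (u, v)
// with u < v.
std::vector<std::pair<int, int>> toy_k_core(
  adjacency_t const& adj,
  int k,
  std::optional<std::vector<int>> const& core_numbers = std::nullopt)
{
  std::vector<int> const cores =
    core_numbers.has_value() ? *core_numbers : toy_core_number(adj);

  std::vector<std::pair<int, int>> edges;
  for (int u = 0; u < static_cast<int>(adj.size()); ++u) {
    for (int v : adj[u]) {
      if (u < v && cores[u] >= k && cores[v] >= k) { edges.emplace_back(u, v); }
    }
  }
  return edges;
}
```

A simple caller would write toy_k_core(adj, 3) and get the full answer, while a caller extracting several cores would call toy_core_number(adj) once and pass the result to each toy_k_core call.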
Yeah... +1 for taking core_numbers as std::optional.
Still not quite sure about "Ultimately I concluded that if we're going to expose a complete k-core implementation at the python level we should probably expose it at the lower levels as well."
At the python layer, user convenience might be more important than anything else. And providing an additional python function that internally calls core_number, finds the vertex list, and calls induced subgraph takes no toll on compile time/binary size, as python is an interpreted language.
Not sure we should provide the same level of convenience for C++ users, as we assume C++ users are more advanced and might be OK with finding the k-core by composing core_numbers and induced subgraph.
But this is a much bigger topic, and we may have this discussion again in the future.
Another thing to consider is that this may introduce a dependency within the same algorithm level in our C++ software hierarchy. Most algorithms are composed of primitives and thrust calls. But now algorithms would be composed of primitives, thrust calls, and other algorithms. I am not sure what kind of complications this could cause in the future.
Composing C++ algorithms in the C layer might be OK, but I am not sure whether we should support, at the C++ level, use cases that can be handled simply by composing a few existing algorithms.
Fair point.
I will change core_numbers to be optional.
We can explore dropping k_core from the C++ layer if having algorithms depend on other algorithms becomes an issue. I see the point that adding it at the C layer may be sufficient.
… user deletes the original but needs to recreate
I don't see where these
Good catch, Joseph. Adding the missing functions.
I see another issue, need to fix something else also.
Ok thanks. It looks like
rerun tests
@gpucibot merge
Define k-core C++ API and tests.
Closes #2631
Closes #2632
Closes #2633
Closes #2635