Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define k-core API and tests #2712

Merged
merged 7 commits into from
Sep 26, 2022

Conversation

ChuckHastings
Copy link
Collaborator

Define k-core C++ API and tests.

Closes #2631
Closes #2632
Closes #2633
Closes #2635

@ChuckHastings ChuckHastings requested review from a team as code owners September 21, 2022 17:23
@ChuckHastings ChuckHastings self-assigned this Sep 21, 2022
@ChuckHastings ChuckHastings added this to the 22.10 milestone Sep 21, 2022
@ChuckHastings ChuckHastings added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 21, 2022
@codecov-commenter
Copy link

codecov-commenter commented Sep 22, 2022

Codecov Report

Base: 60.04% // Head: 60.04% // No change to project coverage 👍

Coverage data is based on head (59bce1d) compared to base (3eb2b40).
Patch has no changes to coverable lines.

❗ Current head 59bce1d differs from pull request most recent head 3f7db87. Consider uploading reports for the commit 3f7db87 to get more accurate results

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-22.10    #2712   +/-   ##
=============================================
  Coverage         60.04%   60.04%           
=============================================
  Files               111      111           
  Lines              6184     6184           
=============================================
  Hits               3713     3713           
  Misses             2471     2471           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

*
* @return edge list for the graph
*/
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a better idea to pass bool transpose=false as template parameter and then use transpose as a named parameter to the function instead of implicit false?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try to avoid excess compilation of different templated functions, as this increased compile time.

Most of our functions are implemented based on an assumption about whether the graph is stored transposed or not. For example, pagerank assumes that store_transposed=true. Implementing pagerank if store_transposed=false would result in a great deal more synchronization (and communication in a multi-gpu environment). There are a few examples that will work on a graph in either orientation.

Supporting this would require providing an implementation that would work with either orientation of the graph. It seems like as long as this matches the requirement for core_number (which has to be called first) then just one orientation is sufficient.

Comment on lines +24 to +25
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here as mentioned in my previous comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above reaction.

* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* @param graph cuGraph graph in coordinate format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this comment is outdated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

std::tuple<rmm::device_uvector<vertex_t>,
rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<weight_t>>>
k_core(raft::handle_t const& handle,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to think about...

So, we're passing core numbers here. So in this case, should we really have both this function and induced subgraph?

seeing core numbers, we can easily extract a vertex list in a single thrust call (e.g. thrust::copy_if). Then, we can pass a vertex list to induced subgraph.

Not sure this function adds enough convenience to justify increase in compile time/binary size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe OK, if this calls explicitly instantiated induced_subgraph, increases in compile time/binary size might be minimal...

Still debating between whether this function should take core_numbers or just call core_numbers internally.... (to maximize user convenience, if not, a user can just separately call core_numbers, something like thurst::copy_if, and induced_subgraph).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that debate internally. Ultimately I concluded that if we're going to expose a complete k-core implementation at the python level we should probably expose it at the lower levels as well. It seems like something that would be useful.

I do wonder if we should make passing in core_numbers optional, and if they are not passed then also call core_numbers. This would allow a simple caller that just wants to extract a 3-core from the graph to be able to call the function directly and get the complete answer, while a more sophisticated case might call core_number once and reuse the result to extract the different subgraphs as required.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... +1 for taking core_numbers as std::optional.

Still a bit not-sure about "Ultimately I concluded that if we're going to expose a complete k-core implementation at the python level we should probably expose it at the lower levels as well."

At the python layer, user convenience might be more important than anything else. And providing an additional python function that internally calls core_number, find vertex list, and induced subgraph has no toll on compile time/binary size as python is an interpreted language.

Not sure we should provide the same level of convenience for C++ users, as we assume C++ users are more advanced and they might be OK about finding k-core by composing core_numbers and induced subgraph.

But this is a much bigger topic, and we may have this discussion again in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And another thing to consider is that this may incur some sort of dependency within a same algorithm level in our C++ software hierarchy. Most algorithms are composed of primitives and thrust calls. But now algorithms are composed of primitives, thrust calls, and other algorithms. I am not sure what kind of complications this can incur in the future.

Composing C++ algorithms in the C layer might be OK, but I am not sure whether we should support use cases that can be supported simply by composing few existing algorithms in the C++ level.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point.

I will change core_numbers to be optional.

We can explore dropping the k_core from the C++ layer if having the algorithms depend on other algorithms becomes an issue. I see the point that adding it at the C layer may be sufficient.

… user deletes the original but needs to recreate
@jnke2016
Copy link
Contributor

jnke2016 commented Sep 24, 2022

I don't see where these k-core functions are implemented: cugraph_core_result_get_src_vertices, cugraph_core_result_get_dst_vertices, cugraph_core_result_get_weights. Please can you point me to those? I am getting the error below in pylibcugraph

cugraph_core_result_get_src_vertices' is not a constant, variable or function identifier

@ChuckHastings
Copy link
Collaborator Author

Good catch, Joseph. Adding the missing functions.

@ChuckHastings ChuckHastings added the DO NOT MERGE Hold off on merging; see PR for details label Sep 24, 2022
@ChuckHastings
Copy link
Collaborator Author

I see another issue, need to fix something else also.

@jnke2016
Copy link
Contributor

Ok thanks. It looks like cugraph_k_core too is missing

@ChuckHastings
Copy link
Collaborator Author

rerun tests

@ChuckHastings ChuckHastings removed the DO NOT MERGE Hold off on merging; see PR for details label Sep 26, 2022
@ChuckHastings
Copy link
Collaborator Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit f0c1e99 into rapidsai:branch-22.10 Sep 26, 2022
@ChuckHastings ChuckHastings deleted the fea_k_core_api branch December 2, 2022 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
6 participants