Implement induced subgraph extraction (SG C++) #1354
Conversation
[REVIEW] enable multigraph
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Would be great to see some perf comparison against the COO one and profile.
I am particularly curious about how parts like the big thrust::for_each and its nested thrust calls play out.
subgraph_edge_offsets.begin());

CUDA_TRY(
  cudaStreamSynchronize(handle.get_stream()));  // subgraph_vertex_output_offsets will become
Why are we doing cudaStreamSynchronize here?
I initially thought subgraph_vertex_output_offsets (
https://github.com/rapidsai/cugraph/pull/1354/files/038d2d782363c87ede499d0e8d441e4d091a3dac#diff-dfa26d738fa79bcd814644900db1536e42986eaebed75714da3dad8f993381e4R128)
would go out of scope once this function returns; its memory could then be reclaimed and reused elsewhere while operations using subgraph_vertex_output_offsets had not yet finished on the stream. If that happened, it could lead to undefined behavior.
I need to double-check, though; this may not actually be true.
deallocate() is submitted on a stream (
https://github.com/rapidsai/rmm/blob/branch-0.18/include/rmm/device_buffer.hpp#L422), so the actual memory reclamation may not happen until all previous operations on that stream finish. I will double-check this with the RMM folks and make a fix if this adds unnecessary stream synchronization.
OK, this is not necessary; I will remove it (and I did something like this in many places, so I need to remove all of those).
// explicit instantiation

template std::tuple<rmm::device_uvector<int32_t>,
Shall we add 64b edges and double weights instantiations? I already accounted for these in egonet.
It should be just about instantiating them right?
Yes. So what type combinations do you need? We need to include all the types we actually use (for the obvious reason), but we should avoid instantiating for any types we don't use, as that increases compile time and binary size.
For (vertex_t, edge_t, weight_t) triplets, I guess (int32_t, int32_t, float), (int32_t, int64_t, float), and (int64_t, int64_t, float) must be instantiated. Are we actually using double weights?
Yes, I have instantiated these in Egonet: https://github.com/afender/cugraph/blob/7844fa4c35500fb85d9f38a9b2f74d640684fc9b/cpp/src/community/egonet.cu#L128
For double weights, that is a good discussion point. I don't think it is motivated by this algorithm, but since the graph and other algorithms accept it, we should probably instantiate it here as well.
@@ -243,5 +243,49 @@ void relabel(raft::handle_t const& handle,
             vertex_t num_labels,
             bool do_expensive_check = false);

/**
How do you pick what goes in algorithms.hpp and what goes in graph_functions.hpp?
So, the current guideline (for myself) is to place graph analytics (e.g., PageRank, BFS, ...) in the existing algorithms.hpp, while operations on graphs that do not modify the graph object go in graph_functions.hpp (operations that modify the graph object need to be member functions).
Do you have better suggestions for header file namings and where to put which?
I can see how there's a thin and somewhat subjective boundary between graph analytics algorithms and operations on a graph that do not modify it. We should identify whether there's a strong benefit to C++ API users (since it is an exposed header) and go from there.
matrix_partition_device_t<graph_view_t<vertex_t, edge_t, weight_t, store_transposed, multi_gpu>>
  matrix_partition(graph_view, 0);
thrust::transform(
The two large thrust::transform calls in this file could use some more explanation to facilitate future maintenance.
Codecov Report

@@            Coverage Diff             @@
##           branch-0.18    #1354      +/-   ##
===============================================
+ Coverage        60.38%   60.71%    +0.33%
===============================================
  Files               67       67
  Lines             3029     3060       +31
===============================================
+ Hits              1829     1858       +29
- Misses            1200     1202        +2

Continue to review the full report at Codecov.
|
I haven't rigorously compared performance against the approach that scans the entire edge list, but extracting three unweighted subgraphs of (300, 20, 400) vertices from ljournal-2008.mtx took 1.7 ms, and three weighted subgraphs of (9130, 1200, 300) vertices from ljournal-2008.mtx took 4.5 ms, so this runs faster for smaller subgraphs (I assume scanning the entire set of edges would take longer). The biggest performance issue with the current implementation is handling power-law graphs with wide variation in vertex degrees, but this is a recurring issue across many implementations in the experimental space, and I plan to address them all at once in a separate PR. Let me know if this becomes a performance bottleneck in your egonet testing.
@afender I think I addressed all your comments, but let me know if you have any remaining concerns.
Force-pushed from 20d2a5b to 4f35bcb
Close #1323