Implement induced subgraph extraction (SG C++) #1354
Conversation
[REVIEW] enable multigraph
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Would be great to see some perf comparison against the COO one and profile.
I am particularly curious about how parts like the big thrust::for_each and its nested thrust calls play out.
subgraph_edge_offsets.begin());

CUDA_TRY(
  cudaStreamSynchronize(handle.get_stream()));  // subgraph_vertex_output_offsets will become
Why are we doing cudaStreamSynchronize here?
I initially thought subgraph_vertex_output_offsets (
https://github.com/rapidsai/cugraph/pull/1354/files/038d2d782363c87ede499d0e8d441e4d091a3dac#diff-dfa26d738fa79bcd814644900db1536e42986eaebed75714da3dad8f993381e4R128)
would go out of scope once this function returns; its memory could then be reclaimed and reused elsewhere while operations using subgraph_vertex_output_offsets had not yet finished on the stream. If that happened, it could lead to undefined behavior.
I need to double-check, though; this may not actually be true.
deallocate() is submitted on a stream (
https://github.com/rapidsai/rmm/blob/branch-0.18/include/rmm/device_buffer.hpp#L422), so the actual memory reclamation may not happen until all previous operations on that stream finish. I will double-check this with the RMM folks and make a fix if this adds unnecessary stream synchronization.
OK, this is not necessary; I will remove it (and I did something like this in many places, so I need to remove all of those).
// explicit instantiation

template std::tuple<rmm::device_uvector<int32_t>,
Shall we add 64b edges and double weights instantiations? I already accounted for these in egonet.
It should be just about instantiating them right?
Yes. So what type combinations do you need? We need to include all the types we actually use (for the obvious reason), but we should avoid instantiating for any types we don't use, as that increases compile time and binary size.
For (vertex_t, edge_t, weight_t) triplets, I guess (int32_t, int32_t, float), (int32_t, int64_t, float), and (int64_t, int64_t, float) must be instantiated. Are we actually using double weights?
Yes, I have instantiated these in Egonet: https://github.com/afender/cugraph/blob/7844fa4c35500fb85d9f38a9b2f74d640684fc9b/cpp/src/community/egonet.cu#L128
For double weights, that is a good discussion point. I don't think it is motivated by this algorithm, but since the graph and other algorithms accept it, we should probably instantiate it here as well.
@@ -243,5 +243,49 @@ void relabel(raft::handle_t const& handle,
             vertex_t num_labels,
             bool do_expensive_check = false);

/**
How do you pick what goes in algorithms.hpp and what goes in graph_functions.hpp?
So, the current guideline (for myself) is to place graph analytics (e.g., PageRank, BFS, ...) in the existing algorithms.hpp, while operations on graphs that do not modify the graph object go in graph_functions.hpp (operations that modify the graph object need to be member functions).
Do you have better suggestions for header file namings and where to put which?
I can see how there's a thin and somewhat subjective boundary between graph analytics algorithms and operations on a graph that do not modify it. We should identify whether there's a strong benefit to C++ API users (since it is an exposed header) and go from there.
matrix_partition_device_t<graph_view_t<vertex_t, edge_t, weight_t, store_transposed, multi_gpu>>
  matrix_partition(graph_view, 0);
thrust::transform(
The two large thrust::transform calls in this file could use some more explanation to facilitate future maintenance.
Codecov Report

@@            Coverage Diff             @@
##           branch-0.18    #1354      +/-   ##
===============================================
+ Coverage        60.38%   60.71%    +0.33%
===============================================
  Files               67       67
  Lines             3029     3060       +31
===============================================
+ Hits              1829     1858       +29
- Misses            1200     1202        +2

Continue to review the full report at Codecov.
|
I haven't rigorously compared performance against the approach that scans the entire edge list, but extracting three unweighted subgraphs of (300, 20, 400) vertices from ljournal-2008.mtx took 1.7 ms, and three weighted subgraphs of (9130, 1200, 300) vertices from ljournal-2008.mtx took 4.5 ms, so this runs faster for smaller subgraphs (I assume scanning the entire set of edges would take longer). The biggest performance issue with the current implementation is handling power-law graphs with wide variation in vertex degrees, but this is a recurring issue across many implementations in the experimental space, and I plan to address them all at once in a separate PR. Let me know if this becomes a performance bottleneck in your egonet testing.
@afender I think I addressed all your comments, but let me know if you have any remaining concerns.
Force-pushed from 20d2a5b to 4f35bcb
Close #1323