
[REVIEW]Optimize cugraph-DGL csc codepath #3977

Merged
merged 12 commits into from
Nov 8, 2023

Conversation

VibhuJawa
Member

@VibhuJawa VibhuJawa commented Nov 4, 2023

This PR optimizes the cugraph-DGL CSC codepath and adds an end-to-end benchmark using cugraph-dgl.

sampled_dir = "/raid/vjawa/nov_1_bulksampling_benchmarks/ogbn_papers100M[2]_b512_f[10, 10, 10]"
dataset = HomogenousBulkSamplerDataset(
    meta_json_d["total_num_nodes"],
    edge_dir=edge_dir,
    sparse_format="csc",
    return_type="cugraph_dgl.nn.SparseGraph",
)
dataset.set_input_files(input_directory=sampled_dir + "/samples")
dataloader = torch.utils.data.DataLoader(
    dataset, collate_fn=lambda x: x, shuffle=False, num_workers=0, batch_size=None
)

def run(dataloader):
    for input_nodes, output_nodes, blocks in dataloader:
        pass

%%timeit
run(dataloader)
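Outside a notebook, the same loop can be timed with the standard library. A minimal sketch (the list below is a self-contained stand-in for the real cugraph-dgl DataLoader, which is not constructed here):

```python
# Minimal non-notebook stand-in for the %%timeit cell above. The real
# benchmark iterates the cugraph-dgl DataLoader; a plain list of dummy
# (input_nodes, output_nodes, blocks) tuples keeps this sketch runnable.
import time

dataloader = [(None, None, None)] * 1000  # stand-in for the real DataLoader

def run(dataloader):
    for input_nodes, output_nodes, blocks in dataloader:
        pass

start = time.perf_counter()
run(dataloader)
elapsed = time.perf_counter() - start
print(f"run(dataloader) took {elapsed:.4f} s")
```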

With PR:

2.48 s ± 14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

MAIN:

%%timeit
9.52 s ± 151 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

E2E Benchmarks:

 python3 cugraph_dgl_benchmark.py

PR:

...
Epoch time = 70.41 seconds
Time to create MFG = 3.37 seconds
Time analysis for fanout = [10, 10, 10], batch_size = 512
mfg_creation_time_per_epoch = 3.37 seconds
feature_time_per_epoch = 44.09 seconds
m_fwd_time_per_epoch = 7.31 seconds
m_bkwd_time_per_epoch = 15.60 seconds

MAIN:

....
Epoch time = 84.72 seconds
Time to create MFG = 10.79 seconds
Time analysis for fanout = [10, 10, 10], batch_size = 512
mfg_creation_time_per_epoch = 10.79 seconds
feature_time_per_epoch = 47.09 seconds
m_fwd_time_per_epoch = 8.24 seconds
m_bkwd_time_per_epoch = 18.58 seconds


copy-pr-bot bot commented Nov 4, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@VibhuJawa VibhuJawa added this to the 23.12 milestone Nov 4, 2023
@VibhuJawa VibhuJawa added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 4, 2023
@VibhuJawa
Member Author

/ok to test

@VibhuJawa VibhuJawa requested a review from tingyu66 November 4, 2023 22:18
@VibhuJawa VibhuJawa changed the title [WIP]Optimize cugraph-DGL csc codepath [REVIEW]Optimize cugraph-DGL csc codepath Nov 4, 2023
@VibhuJawa VibhuJawa marked this pull request as ready for review November 4, 2023 22:20
@VibhuJawa VibhuJawa requested a review from a team as a code owner November 4, 2023 22:20
Comment on lines +479 to +482
# Note: We transfer tensors to CPU here to avoid the overhead of
# transferring them in each iteration of the for loop below.
major_offsets_cpu = major_offsets.to("cpu").numpy()
label_hop_offsets_cpu = label_hop_offsets.to("cpu").numpy()
Member Author

@tingyu66, this is the main optimization: transferring tensors between GPU and CPU one at a time was slow.
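As a standalone illustration of the pattern (a hedged sketch, not the PR's code: the tensor here is a small CPU stand-in for the sampler's GPU offset tensors):

```python
# Sketch of the optimization discussed above: replace per-element .item()
# calls (each one a device-to-host synchronization when the tensor lives
# on GPU) with a single bulk .to("cpu").numpy() transfer, then cheap
# NumPy indexing inside the Python loop.
import torch

major_offsets = torch.arange(0, 1000)  # stand-in for a GPU offsets tensor

# Slow pattern: one .item() call (and, on GPU, one sync) per element.
slow = [int(major_offsets[i].item()) for i in range(len(major_offsets))]

# Fast pattern: one bulk transfer, then index the NumPy array.
major_offsets_cpu = major_offsets.to("cpu").numpy()
fast = [int(major_offsets_cpu[i]) for i in range(len(major_offsets_cpu))]

assert slow == fast
```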

Member

Thanks for resolving the .item() overhead. 👍

@@ -22,7 +22,6 @@
get_allocation_counts_dask_lazy,
Member Author

@oorliu ,

The only DGL-specific args for our benchmarking efforts are:

                --reverse_edges \
                --sampling_target_framework cugraph_dgl_csr

@rlratzel
Contributor

rlratzel commented Nov 6, 2023

/ok to test

Member

@alexbarghi-nv alexbarghi-nv left a comment

👍

Comment on lines 578 to 588
else:
    # FIXME: Update these arguments when CSC mode is fixed in cuGraph-PyG (release 24.02)
    sampling_kwargs = {
        "deduplicate_sources": True,
        "prior_sources_behavior": "exclude",
        "renumber": True,
        "compression": "COO",
        "compress_per_hop": False,
        "use_legacy_names": False,
        "include_hop_column": True,
    }
Member

Does this setting also work for cugraph-dgl COO code path?

Member Author

I will have to test. I just focused on the CSC code path to ensure we have success there (given it is the fastest one), but I don't see why it would not work.

Member

But prior_sources_behavior needs to be True for DGL, right? I think the COO path would be useful in the future for debugging purposes.

Member Author

@tingyu66, I have not spent time exploring the COO code path. Do you think I should focus on it? Do we expect it to have equivalent speed (maybe after optimizations)?

Member Author

I will have to add something like sampling_target_framework == 'cugraph_dgl_coo' with prior_sources_behavior=True. The above path is for cugraph-PyG.
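The dispatch described in this thread could look like the following (a hypothetical sketch: `select_sampling_kwargs` and the `"cugraph_pyg"` string are assumptions; the COO kwargs are the ones quoted from the diff above, and the `cugraph_dgl_coo` override follows this comment, not a confirmed setting):

```python
# Hypothetical sketch of per-framework sampling-kwarg selection. The base
# dict is the cugraph-PyG COO configuration quoted in the diff above; the
# cugraph_dgl_coo branch applies the prior_sources_behavior override
# suggested in this discussion (not a confirmed setting).
COO_SAMPLING_KWARGS = {
    "deduplicate_sources": True,
    "prior_sources_behavior": "exclude",
    "renumber": True,
    "compression": "COO",
    "compress_per_hop": False,
    "use_legacy_names": False,
    "include_hop_column": True,
}

def select_sampling_kwargs(sampling_target_framework: str) -> dict:
    kwargs = dict(COO_SAMPLING_KWARGS)
    if sampling_target_framework == "cugraph_dgl_coo":
        # Per the discussion above, DGL's COO path would need a
        # different prior_sources_behavior value.
        kwargs["prior_sources_behavior"] = True
    return kwargs
```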

Member

I don't think getting COO up to speed is a priority, but it would be useful to include/document the parameter combinations needed for the COO path, unless we no longer support it.

Member Author

I will add that as a follow-up:

Filed #3981 to track.

@alexbarghi-nv
Member

/merge

@BradReesWork
Member

/ok to test

@alexbarghi-nv
Member

/ok to test

@rapids-bot rapids-bot bot merged commit 663d95f into rapidsai:branch-23.12 Nov 8, 2023
70 checks passed