
[REVIEW]Optimize cugraph-DGL csc codepath #3977

Merged
merged 12 commits into from
Nov 8, 2023

Conversation

VibhuJawa
Member

@VibhuJawa VibhuJawa commented Nov 4, 2023

This PR optimizes the cugraph-DGL CSC codepath and adds an end-to-end benchmark using cugraph-dgl.

sampled_dir = "/raid/vjawa/nov_1_bulksampling_benchmarks/ogbn_papers100M[2]_b512_f[10, 10, 10]"
dataset = HomogenousBulkSamplerDataset(
    meta_json_d["total_num_nodes"],
    edge_dir=edge_dir,
    sparse_format="csc",
    return_type="cugraph_dgl.nn.SparseGraph",
)
dataset.set_input_files(input_directory=sampled_dir + "/samples")
dataloader = torch.utils.data.DataLoader(
    dataset, collate_fn=lambda x: x, shuffle=False, num_workers=0, batch_size=None
)

def run(dataloader):
    for input_nodes, output_nodes, blocks in dataloader:
        pass

%%timeit
run(dataloader)
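Outside a notebook, the same loop can be timed with the standard library. A minimal sketch (the list below is a self-contained stand-in for the real cugraph-dgl DataLoader, which is not constructed here):

```python
# Minimal non-notebook stand-in for the %%timeit cell above. The real
# benchmark iterates the cugraph-dgl DataLoader; a plain list of dummy
# (input_nodes, output_nodes, blocks) tuples keeps this sketch runnable.
import time

dataloader = [(None, None, None)] * 1000  # stand-in for the real DataLoader

def run(dataloader):
    for input_nodes, output_nodes, blocks in dataloader:
        pass

start = time.perf_counter()
run(dataloader)
elapsed = time.perf_counter() - start
print(f"run(dataloader) took {elapsed:.4f} s")
```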

With PR:

2.48 s ± 14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

MAIN:

%%timeit
9.52 s ± 151 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

E2E Benchmarks:

 python3 cugraph_dgl_benchmark.py

PR:

...
Epoch time = 70.41 seconds
Time to create MFG = 3.37 seconds
Time analysis for fanout = [10, 10, 10], batch_size = 512
mfg_creation_time_per_epoch = 3.37 seconds
feature_time_per_epoch = 44.09 seconds
m_fwd_time_per_epoch = 7.31 seconds
m_bkwd_time_per_epoch = 15.60 seconds

MAIN:

....
Epoch time = 84.72 seconds
Time to create MFG = 10.79 seconds
Time analysis for fanout = [10, 10, 10], batch_size = 512
mfg_creation_time_per_epoch = 10.79 seconds
feature_time_per_epoch = 47.09 seconds
m_fwd_time_per_epoch = 8.24 seconds
m_bkwd_time_per_epoch = 18.58 seconds


copy-pr-bot bot commented Nov 4, 2023

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@VibhuJawa VibhuJawa added this to the 23.12 milestone Nov 4, 2023
@VibhuJawa VibhuJawa added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Nov 4, 2023
@VibhuJawa
Member Author

/ok to test

@VibhuJawa VibhuJawa requested a review from tingyu66 November 4, 2023 22:18
@VibhuJawa VibhuJawa changed the title [WIP]Optimize cugraph-DGL csc codepath [REVIEW]Optimize cugraph-DGL csc codepath Nov 4, 2023
@VibhuJawa VibhuJawa marked this pull request as ready for review November 4, 2023 22:20
@VibhuJawa VibhuJawa requested a review from a team as a code owner November 4, 2023 22:20
Comment on lines +479 to +482
# Note: We transfer tensors to CPU here to avoid the overhead of
# transferring them in each iteration of the for loop below.
major_offsets_cpu = major_offsets.to("cpu").numpy()
label_hop_offsets_cpu = label_hop_offsets.to("cpu").numpy()
Member Author

@tingyu66, this is the main optimization: transferring tensors between GPU and CPU one at a time was slow.
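As a standalone illustration of the pattern (a hedged sketch, not the PR's code: the tensor here is a small CPU stand-in for the sampler's GPU offset tensors):

```python
# Sketch of the optimization discussed above: replace per-element .item()
# calls (each one a device-to-host synchronization when the tensor lives
# on GPU) with a single bulk .to("cpu").numpy() transfer, then cheap
# NumPy indexing inside the Python loop.
import torch

major_offsets = torch.arange(0, 1000)  # stand-in for a GPU offsets tensor

# Slow pattern: one .item() call (and, on GPU, one sync) per element.
slow = [int(major_offsets[i].item()) for i in range(len(major_offsets))]

# Fast pattern: one bulk transfer, then index the NumPy array.
major_offsets_cpu = major_offsets.to("cpu").numpy()
fast = [int(major_offsets_cpu[i]) for i in range(len(major_offsets_cpu))]

assert slow == fast
```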

Member

Thanks for resolving the .item() overhead. 👍

@@ -22,7 +22,6 @@
get_allocation_counts_dask_lazy,
Member Author

@oorliu ,

The only DGL-specific args for our benchmarking efforts are:

                --reverse_edges \
                --sampling_target_framework cugraph_dgl_csr

@rlratzel
Contributor

rlratzel commented Nov 6, 2023

/ok to test

Member

@alexbarghi-nv alexbarghi-nv left a comment

👍

Comment on lines 578 to 588
else:
    # FIXME: Update these arguments when CSC mode is fixed in cuGraph-PyG (release 24.02)
    sampling_kwargs = {
        "deduplicate_sources": True,
        "prior_sources_behavior": "exclude",
        "renumber": True,
        "compression": "COO",
        "compress_per_hop": False,
        "use_legacy_names": False,
        "include_hop_column": True,
    }
Member

Does this setting also work for cugraph-dgl COO code path?

Member Author

I will have to test. I just focused on the CSC code path to ensure we have success there (given it is the fastest one), but I don't see why it would not work.

Member

But prior_sources_behavior needs to be True for DGL, right? I think the COO path would be useful in the future for debugging purposes.

Member Author

@tingyu66, I have not spent time exploring the COO code path. Do you think I should focus on it? Do we expect it to have equivalent speed (maybe after optimizations)?

Member Author

I will have to add something like sampling_target_framework == 'cugraph_dgl_coo' with prior_sources_behavior=True. The above path is for cugraph-PyG.
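The dispatch described in this thread could look like the following (a hypothetical sketch: `select_sampling_kwargs` and the `"cugraph_pyg"` string are assumptions; the COO kwargs are the ones quoted from the diff above, and the `cugraph_dgl_coo` override follows this comment, not a confirmed setting):

```python
# Hypothetical sketch of per-framework sampling-kwarg selection. The base
# dict is the cugraph-PyG COO configuration quoted in the diff above; the
# cugraph_dgl_coo branch applies the prior_sources_behavior override
# suggested in this discussion (not a confirmed setting).
COO_SAMPLING_KWARGS = {
    "deduplicate_sources": True,
    "prior_sources_behavior": "exclude",
    "renumber": True,
    "compression": "COO",
    "compress_per_hop": False,
    "use_legacy_names": False,
    "include_hop_column": True,
}

def select_sampling_kwargs(sampling_target_framework: str) -> dict:
    kwargs = dict(COO_SAMPLING_KWARGS)
    if sampling_target_framework == "cugraph_dgl_coo":
        # Per the discussion above, DGL's COO path would need a
        # different prior_sources_behavior value.
        kwargs["prior_sources_behavior"] = True
    return kwargs
```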

Member

I don't think getting COO up to speed is a priority, but it would be useful to include/document the parameter combinations needed for the COO path, unless we no longer support it.

Member Author

I will add that as a follow-up:

Filed #3981 to track.

@alexbarghi-nv
Member

/merge

@BradReesWork
Member

/ok to test

@alexbarghi-nv
Member

/ok to test

@rapids-bot rapids-bot bot merged commit 663d95f into rapidsai:branch-23.12 Nov 8, 2023
70 checks passed