There are three separate requests here. I can break these up into three separate issues if that's better:
Do not require num_edges
The python/dask code currently has to compute/persist the dask_cudf dataframe in order to compute the total number of edges that the PLC API requires for creating a graph. This should not be necessary, since the size of each array is already known internally. Not requiring num_edges would eliminate the somewhat expensive dask compute/persist call (sketched below).
NOTE: adding this feature is related to the fix done for this issue
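A minimal sketch of the current versus proposed calling pattern (`plc_create` is a hypothetical stand-in for the PLC graph-creation call, not the real signature):

```python
import dask_cudf

# Sketch only: `plc_create` stands in for the PLC graph-creation call.

def build_graph_current(ddf: dask_cudf.DataFrame, plc_create):
    ddf = ddf.persist()   # expensive: materializes every partition
    num_edges = len(ddf)  # triggers a compute just to obtain a count
    return plc_create(ddf["src"], ddf["dst"], num_edges=num_edges)

def build_graph_proposed(ddf: dask_cudf.DataFrame, plc_create):
    # No persist/len needed: each device array already knows its own
    # size, so the lower layers can derive the edge count themselves.
    return plc_create(ddf["src"], ddf["dst"])
```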
Allow multiple src/dst arrays
In order to ease the burden on the cugraph dask/python code for handling dask_cudf input with multiple partitions per worker, the PLC, C, and possibly C++ APIs could accept multiple src and dst vertex arrays. This would allow the dask/python layer to pass the src/dst arrays from each partition as-is, instead of combining each partition's arrays in python in order to pass them as a single src and single dst array to PLC/C.
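A hedged sketch of what this would save on each worker (`plc_create` is again hypothetical; `partitions` is assumed to be the list of cudf DataFrames held by that worker):

```python
import cudf

# Today (sketch): a worker's partitions must be concatenated into a
# single src and a single dst array before calling into PLC/C.
def edgelist_current(partitions, plc_create):
    src = cudf.concat([p["src"] for p in partitions])  # extra copy
    dst = cudf.concat([p["dst"] for p in partitions])
    return plc_create(src, dst)

# Proposed (sketch): pass each partition's arrays as-is and let the
# PLC/C layer accept lists of arrays, concatenating them internally.
def edgelist_proposed(partitions, plc_create):
    srcs = [p["src"] for p in partitions]
    dsts = [p["dst"] for p in partitions]
    return plc_create(srcs, dsts)
```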
Implement a "move" option to transfer array ownership
Another improvement could be to allow the PLC/C/C++ layers to take ownership of the src/dst arrays currently maintained in python. This would allow PLC/C/C++ to modify and delete the incoming arrays as needed, instead of requiring a copy step to preserve the arrays owned by the user/python layer. This could be a new option, possibly called "move", which would default to False (the current behavior).
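A toy Python model of the proposed semantics (all names hypothetical; the real flag would live in the PLC/C API):

```python
# Toy stand-in for the PLC call, modeling the proposed `move` flag.
def plc_create(srcs, dsts, move=False):
    if move:
        # Take ownership: the library may mutate or free these arrays,
        # skipping the defensive copy. The caller must not reuse them.
        owned_srcs, owned_dsts = srcs, dsts
    else:
        # Current behavior: copy so the caller's arrays stay intact.
        owned_srcs = [list(a) for a in srcs]
        owned_dsts = [list(a) for a in dsts]
    return owned_srcs, owned_dsts

graph = plc_create([[0, 1]], [[1, 2]])             # default: safe copy
graph = plc_create([[0, 1]], [[1, 2]], move=True)  # zero-copy handoff
```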
Updating the C API graph creation functions to support the following:
* Add support for isolated vertices
* Add MG optimization to support multiple device arrays per rank as input and concatenate them internally
* Add MG optimization to internally compute the number of edges via allreduce rather than requiring it as an input parameter (this can be expensive to compute in python; see the sketch after this list)
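A conceptual illustration of the allreduce (using mpi4py only for readability; the actual optimization lives in the C/C++ layer with its own communicator, and `local_src_arrays` is a stand-in for a rank's device arrays):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Stand-in for this rank's local device edge arrays.
local_src_arrays = [[0, 1, 2], [3, 4]]

# Each rank sums its local array lengths; a SUM allreduce then yields
# the global edge count, so callers never have to supply num_edges.
local_num_edges = sum(len(a) for a in local_src_arrays)
num_edges = comm.allreduce(local_num_edges, op=MPI.SUM)
print(f"rank {comm.Get_rank()}: global num_edges = {num_edges}")
```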
This PR implements these features. Some simple tests check for isolated vertices (by running pagerank, which produces a different result if the graph has isolated vertices). A simple test for multiple input arrays exists for the MG case.
Closes #3947, closes #3974
Authors:
- Chuck Hastings (https://github.com/ChuckHastings)
- Naim (https://github.com/naimnv)
Approvers:
- Naim (https://github.com/naimnv)
- Joseph Nke (https://github.com/jnke2016)
- Seunghwa Kang (https://github.com/seunghwak)
URL: #3982