There are three separate requests here. I can break these up into three separate issues if that's better:
Do not require num_edges
The python/dask code currently has to compute/persist the dask_cudf dataframe in order to compute the total number of edges that the PLC API requires for creating a graph. This should not be necessary, since the size of each array is already known internally. Not requiring num_edges would eliminate the somewhat expensive dask compute/persist call (sketched below).
NOTE: adding this feature is related to the fix done for this issue
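A minimal sketch of the current versus proposed calling pattern (`plc_create` is a hypothetical stand-in for the PLC graph-creation call, not the real signature):

```python
import dask_cudf

# Sketch only: `plc_create` stands in for the PLC graph-creation call.

def build_graph_current(ddf: dask_cudf.DataFrame, plc_create):
    ddf = ddf.persist()   # expensive: materializes every partition
    num_edges = len(ddf)  # triggers a compute just to obtain a count
    return plc_create(ddf["src"], ddf["dst"], num_edges=num_edges)

def build_graph_proposed(ddf: dask_cudf.DataFrame, plc_create):
    # No persist/len needed: each device array already knows its own
    # size, so the lower layers can derive the edge count themselves.
    return plc_create(ddf["src"], ddf["dst"])
```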
Allow multiple src/dst arrays
In order to ease the burden on the cugraph dask/python code for handling dask_cudf input with multiple partitions per worker, the PLC, C, and possibly C++ APIs could accept multiple src and dst vertex arrays. This would allow the dask/python layer to pass the src/dst arrays from each partition as-is, instead of combining each partition's arrays in python in order to pass them as a single src and single dst array to PLC/C.
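A hedged sketch of what this would save on each worker (`plc_create` is again hypothetical; `partitions` is assumed to be the list of cudf DataFrames held by that worker):

```python
import cudf

# Today (sketch): a worker's partitions must be concatenated into a
# single src and a single dst array before calling into PLC/C.
def edgelist_current(partitions, plc_create):
    src = cudf.concat([p["src"] for p in partitions])  # extra copy
    dst = cudf.concat([p["dst"] for p in partitions])
    return plc_create(src, dst)

# Proposed (sketch): pass each partition's arrays as-is and let the
# PLC/C layer accept lists of arrays, concatenating them internally.
def edgelist_proposed(partitions, plc_create):
    srcs = [p["src"] for p in partitions]
    dsts = [p["dst"] for p in partitions]
    return plc_create(srcs, dsts)
```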
Implement a "move" option to transfer array ownership
Another improvement could be to allow the PLC/C/C++ layers to take ownership of the src/dst arrays currently maintained in python. This would allow PLC/C/C++ to modify and delete the incoming arrays as needed, instead of requiring a copy step to preserve the arrays owned by the user/python layer. This could be a new option, possibly called "move", which would default to False (the current behavior).
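A toy Python model of the proposed semantics (all names hypothetical; the real flag would live in the PLC/C API):

```python
# Toy stand-in for the PLC call, modeling the proposed `move` flag.
def plc_create(srcs, dsts, move=False):
    if move:
        # Take ownership: the library may mutate or free these arrays,
        # skipping the defensive copy. The caller must not reuse them.
        owned_srcs, owned_dsts = srcs, dsts
    else:
        # Current behavior: copy so the caller's arrays stay intact.
        owned_srcs = [list(a) for a in srcs]
        owned_dsts = [list(a) for a in dsts]
    return owned_srcs, owned_dsts

graph = plc_create([[0, 1]], [[1, 2]])             # default: safe copy
graph = plc_create([[0, 1]], [[1, 2]], move=True)  # zero-copy handoff
```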
Updating the C API graph creation functions to support the following:
* Add support for isolated vertices
* Add MG optimization to support multiple device arrays per rank as input and concatenate them internally
* Add MG optimization to internally compute the number of edges via allreduce rather than requiring it as an input parameter (this can be expensive to compute in python; see the sketch after this list)
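A conceptual illustration of the allreduce (using mpi4py only for readability; the actual optimization lives in the C/C++ layer with its own communicator, and `local_src_arrays` is a stand-in for a rank's device arrays):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

# Stand-in for this rank's local device edge arrays.
local_src_arrays = [[0, 1, 2], [3, 4]]

# Each rank sums its local array lengths; a SUM allreduce then yields
# the global edge count, so callers never have to supply num_edges.
local_num_edges = sum(len(a) for a in local_src_arrays)
num_edges = comm.allreduce(local_num_edges, op=MPI.SUM)
print(f"rank {comm.Get_rank()}: global num_edges = {num_edges}")
```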
This PR implements these features. Some simple tests check for isolated vertices (by running pagerank, which produces a different result if the graph has isolated vertices). A simple test for multiple input arrays exists for the MG case.
Closes #3947, closes #3974
Authors:
- Chuck Hastings (https://github.com/ChuckHastings)
- Naim (https://github.com/naimnv)
Approvers:
- Naim (https://github.com/naimnv)
- Joseph Nke (https://github.com/jnke2016)
- Seunghwa Kang (https://github.com/seunghwak)
URL: #3982