[BUG] Check if Dask has quit to avoid throwing an exception and triggering a segfault on ddp exit (#3961)

Currently, when training with ddp, if dask exits before the `CuGraphStore` is cleaned up, an exception is thrown. That exception causes ddp to quit with an error, which in turn triggers a segfault, so users think the workflow has failed when it has actually succeeded. This fix catches the exception and displays a warning if the dask dataset can't be unpublished, which resolves the issue.

Authors:
  - Alex Barghi (https://github.com/alexbarghi-nv)

Approvers:
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Tingyu Wang (https://github.com/tingyu66)
  - Rick Ratzel (https://github.com/rlratzel)

URL: #3961
alexbarghi-nv authored Nov 1, 2023
1 parent 0a90563 commit 5c0bc8a
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion python/cugraph-pyg/cugraph_pyg/data/cugraph_store.py
@@ -320,7 +320,13 @@ def __init__(
     def __del__(self):
         if self.__is_graph_owner:
             if isinstance(self.__graph._plc_graph, dict):
-                distributed.get_client().unpublish_dataset("cugraph_graph")
+                try:
+                    distributed.get_client().unpublish_dataset("cugraph_graph")
+                except TypeError:
+                    warnings.warn(
+                        "Could not unpublish graph dataset, most likely because"
+                        " dask has already shut down."
+                    )
             del self.__graph
 
     def __make_offsets(self, input_dict):
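
The guarded-cleanup pattern in this change can be tried without a Dask cluster. The following is a minimal sketch, not the cugraph-pyg implementation: `FakeClient`, `Store`, and `DATASET_NAME` are hypothetical names, and the `TypeError` is raised by hand on the assumption that it mirrors what the real client raises once dask has shut down.

```python
import warnings

DATASET_NAME = "cugraph_graph"  # same dataset name the diff unpublishes


class FakeClient:
    """Hypothetical stand-in for a Dask client (not the real distributed API)."""

    def __init__(self, alive=True):
        self.alive = alive
        self.published = {DATASET_NAME}

    def unpublish_dataset(self, name):
        if not self.alive:
            # Assumption: simulates the TypeError that the commit catches.
            raise TypeError("client has already shut down")
        self.published.discard(name)


class Store:
    """Hypothetical owner of a published dataset, cleaned up on deletion."""

    def __init__(self, client):
        self._client = client
        self._result = None  # None until cleanup has been attempted

    def close(self):
        """Unpublish once; return True on success, False if the client was gone."""
        if self._result is None:
            try:
                self._client.unpublish_dataset(DATASET_NAME)
                self._result = True
            except TypeError:
                # Warn instead of raising so interpreter exit stays clean.
                warnings.warn(
                    "Could not unpublish graph dataset, most likely because"
                    " dask has already shut down."
                )
                self._result = False
        return self._result

    def __del__(self):
        self.close()
```

Putting the logic in an idempotent `close()` and calling it from `__del__` keeps the warning from being emitted twice and makes the behaviour testable without relying on when the garbage collector runs.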