All the MG cuGraph tests are failing with the error mentioned in this issue. In fact, the worker cannot be retrieved within a `dask` `client_run` call with the method `get_worker()`.
Minimum reproducible example
Run any MG tests
Relevant log output
Function: _subcomm_init
22: args: (b'6\x198\xe0\xa3uO\xaa\xadHW%\xc7\xed\xc3\x04', 1)
23: kwargs: {}
24: Traceback (most recent call last):
25: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/distributed/worker.py", line 3288, in run
26: result = function(*args, **kwargs)
27: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/cugraph/dask/comms/comms.py", line 85, in _subcomm_init
28: handle = get_handle(sID)
29: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/cugraph/dask/comms/comms.py", line 237, in get_handle
30: sessionstate = get_raft_comm_state(sID, get_worker())
31: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/distributed/worker.py", line 2714, in get_worker
32: raise ValueError("No worker found") from None
33: ValueError: No worker found
34: 2023-03-27 03:01:57,560 - distributed.worker - WARNING - Run Failed
35: Function: _subcomm_init
36: args: (b'6\x198\xe0\xa3uO\xaa\xadHW%\xc7\xed\xc3\x04', 1)
37: kwargs: {}
38: Traceback (most recent call last):
39: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/distributed/worker.py", line 3288, in run
40: result = function(*args, **kwargs)
41: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/cugraph/dask/comms/comms.py", line 85, in _subcomm_init
42: handle = get_handle(sID)
43: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/cugraph/dask/comms/comms.py", line 237, in get_handle
44: sessionstate = get_raft_comm_state(sID, get_worker())
45: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/distributed/worker.py", line 2714, in get_worker
46: raise ValueError("No worker found") from None
47: ValueError: No worker found
48: 2023-03-27 03:01:57,803 - distributed.worker - WARNING - Run Failed
49: Function: _subcomm_init
50: args: (b'6\x198\xe0\xa3uO\xaa\xadHW%\xc7\xed\xc3\x04', 1)
51: kwargs: {}
52: Traceback (most recent call last):
53: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/distributed/worker.py", line 3288, in run
54: result = function(*args, **kwargs)
55: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/cugraph/dask/comms/comms.py", line 85, in _subcomm_init
56: handle = get_handle(sID)
57: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/cugraph/dask/comms/comms.py", line 237, in get_handle
58: sessionstate = get_raft_comm_state(sID, get_worker())
59: File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/distributed/worker.py", line 2714, in get_worker
60: raise ValueError("No worker found") from None
61: ValueError: No worker found
62: ERROR
Environment details
No response
Other/Misc.
No response
Code of Conduct
I agree to follow cuGraph's Code of Conduct
I have searched the open bugs and have found no duplicates for this bug report
A Dask PR merged last week changed the way workers are retrieved from client calls. This [PR](#3361) assumed that the worker could be retrieved through `get_client()`, and since the Dask version was pinned in RAFT to not break CI, it temporarily worked. But after the `dask` version was unpinned, this resulted in the issue this PR is closing.
This PR leverages the `dask_worker` input argument which is populated with the worker when calling `client.run`.
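To illustrate the mechanism described above, here is a minimal, stdlib-only sketch of how a runner can pass the worker to a function that declares a `dask_worker` parameter. It is a toy model, not Dask's or cuGraph's actual code: `FakeWorker`, `run_on_worker`, and `subcomm_init_sketch` are hypothetical names invented for this example, and the real `_subcomm_init` in `cugraph/dask/comms/comms.py` may look different.

```python
import inspect


class FakeWorker:
    """Hypothetical stand-in for a Dask worker object (assumption)."""

    def __init__(self, address):
        self.address = address


def run_on_worker(worker, function, *args, **kwargs):
    """Toy model of the behavior described for client.run: if the function
    declares a parameter named `dask_worker`, fill it with the worker."""
    if "dask_worker" in inspect.signature(function).parameters:
        kwargs["dask_worker"] = worker
    return function(*args, **kwargs)


def subcomm_init_sketch(session_id, partition_row_size, dask_worker=None):
    """Hypothetical function shaped like the call in the traceback.
    It reads the worker from the injected argument instead of get_worker()."""
    if dask_worker is None:
        # Mirrors the failure mode in the log when no worker is available.
        raise ValueError("No worker found")
    return (dask_worker.address, session_id, partition_row_size)


worker = FakeWorker("tcp://127.0.0.1:0")
result = run_on_worker(worker, subcomm_init_sketch, b"session", 1)
```

With real Dask, the equivalent call would be `client.run(function, *args)` on a function that accepts `dask_worker`, so the function no longer depends on `get_worker()` succeeding inside the run call.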
closes #3378
Authors:
- Joseph Nke (https://github.com/jnke2016)
Approvers:
- Rick Ratzel (https://github.com/rlratzel)
- Brad Rees (https://github.com/BradReesWork)
URL: #3379
Version
23.04
Which installation method(s) does this occur on?
Conda, Pip, Source
Describe the bug.
All the MG cuGraph tests are failing with the error mentioned in this issue. In fact, the worker cannot be retrieved within a `dask` `client_run` call with the method `get_worker()`.