

[BUG]: No worker found when trying to get the worker from a client.run call #3378

Closed
2 tasks done
jnke2016 opened this issue Mar 28, 2023 · 0 comments · Fixed by #3379
Labels
? - Needs Triage (Need team to review and classify); bug (Something isn't working)

Comments


jnke2016 commented Mar 28, 2023

Version

23.04

Which installation method(s) does this occur on?

Conda, Pip, Source

Describe the bug.

All of the MG cuGraph tests are failing with the error shown in this issue: the worker cannot be retrieved via `get_worker()` from inside a Dask `client.run` call.

Minimum reproducible example

Run any MG tests

Relevant log output

Function: _subcomm_init
args:     (b'6\x198\xe0\xa3uO\xaa\xadHW%\xc7\xed\xc3\x04', 1)
kwargs:   {}
Traceback (most recent call last):
  File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/distributed/worker.py", line 3288, in run
    result = function(*args, **kwargs)
  File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/cugraph/dask/comms/comms.py", line 85, in _subcomm_init
    handle = get_handle(sID)
  File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/cugraph/dask/comms/comms.py", line 237, in get_handle
    sessionstate = get_raft_comm_state(sID, get_worker())
  File "/gpfs/fs1/projects/sw_rapids/users/jnke/miniconda3/envs/cugraph_test/lib/python3.8/site-packages/distributed/worker.py", line 2714, in get_worker
    raise ValueError("No worker found") from None
ValueError: No worker found
2023-03-27 03:01:57,560 - distributed.worker - WARNING - Run Failed
(the same `_subcomm_init` traceback repeats twice more, once per remaining worker)
ERROR

Environment details

No response

Other/Misc.

No response

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report
@jnke2016 added the "? - Needs Triage" and "bug" labels on Mar 28, 2023
rapids-bot bot pushed a commit that referenced this issue Mar 28, 2023
A Dask PR merged last week changed the way workers are retrieved from client calls. This [PR](#3361) assumed that the worker could be retrieved through `get_client()`, and since the Dask version was pinned in RAFT to not break CI, it temporarily worked. But after the `dask` version was unpinned, this resulted in the issue this PR is closing.

This PR leverages the `dask_worker` input argument which is populated with the worker when calling `client.run`.

closes #3378

Authors:
  - Joseph Nke (https://github.com/jnke2016)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)
  - Brad Rees (https://github.com/BradReesWork)

URL: #3379