-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trouble using get_dask_client
with a Coiled cluster
#12971
Comments
the get_dask_client function seems not to work when using an external dask cluster declared with DaskTaskRunner(address=data_cluster_address). I get an error "No global client found and no address provided". Below is one of the examples in README.md slightly adapted to run on an external dask cluster. I investigated a bit and neither get_worker nor get_client work. It seems that the task 'process_data' is unable to know that it's running in a dask worker. I suspect this is related to other problems with external dask cluster, such as PrefectHQ/prefect-dask#47 I hacked prefect_dask.utils._generate_client_kwargs to use client_kwargs['address'] instead of calling get_client().scheduler.address, and it works now. I can make a PR. I think that scheduling task with prefect_dask on a cluster, and running calculations with the same cluster is a useful pattern to deal with memory problems, and to monitor what happens on the cluster with the dask dashboard. Though, I'm open to other options, because I've been stuck on this for months...
|
I briefly peeked into the implementation and noticed that the get_dask_client method is initializing a new client every time. Dask offers an API that tries to use already existing clients and ensure the task cannot deadlock, see https://distributed.dask.org/en/stable/api.html#distributed.worker_client and https://distributed.dask.org/en/stable/task-launch.html It's strongly recommend to use the |
Hey @fjetter thanks for taking a look! I think we weren't using the |
FWIW I think there's an issue here for async worker_client: dask/distributed#5513. |
I tried running this example again using import dask
from prefect import flow, task
from prefect_dask import DaskTaskRunner, get_dask_client
from distributed import worker_client
coiled_runner = DaskTaskRunner(
cluster_class="coiled.Cluster",
cluster_kwargs={
"name": "get-dask-client",
},
)
@task
def compute_task():
# with get_dask_client() as client:
with worker_client() as client:
df = dask.datasets.timeseries("2000", "2001", partition_freq="4w")
summary_df = df.describe().compute()
return summary_df
@flow(task_runner=coiled_runner)
def dask_flow():
prefect_future = compute_task.submit()
return prefect_future.result()
if __name__ == "__main__":
dask_flow() @madkinsz do you see something similar? Is there a reason to still prefer |
@jrbourbeau I think we're just providing from prefect import flow, task
from prefect_dask import DaskTaskRunner, get_async_dask_client
import distributed
@task
async def atest_task():
async with get_async_dask_client() as client:
await client.submit(lambda x: x, 1)
with distributed.worker_client() as client:
await client.submit(lambda x: x, 1)
# AttributeError: 'int' object has no attribute '__await__'
@flow(task_runner=DaskTaskRunner())
def test_flow():
atest_task.submit()
if __name__ == "__main__":
test_flow() |
Hi, I just wanted to mention that I am hitting a similar problem while trying to instantiate a Here is my code: import dask
from prefect import flow, task
from prefect_dask import DaskTaskRunner
@task
def compute_task():
with get_dask_client() as client:
df = dask.datasets.timeseries("2000", "2001", partition_freq="4w")
summary_df = client.compute(df.describe())
return summary_df
@flow(task_runner=DaskTaskRunner(address="gateway://pccompute-dask.westeurope.cloudapp.azure.com:80/prod.530b7bd7e4164670bac990661b5fbdc4"))
def dask_flow():
prefect_future = compute_task.submit()
return prefect_future.result()
if __name__ == "__main__":
dask_flow()
Output: 16:55:35.269 | INFO | prefect.engine - Created flow run 'papaya-mouflon' for flow 'dask-flow'
16:55:35.270 | INFO | prefect.task_runner.dask - Connecting to an existing Dask cluster at gateway://pccompute-dask.westeurope.cloudapp.azure.com:80/prod.530b7bd7e4164670bac990661b5fbdc4
16:55:35.271 | ERROR | Flow run 'papaya-mouflon' - Crash detected! Execution was interrupted by an unexpected exception: TypeError: Gateway expects a `ssl_context` argument of type ssl.SSLContext, instead got None
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/test_dask_runner.py", line 43, in <module>
dask_flow()
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/prefect/flows.py", line 540, in __call__
return enter_flow_run_engine_from_flow_call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/prefect/engine.py", line 272, in enter_flow_run_engine_from_flow_call
retval = from_sync.wait_for_call_in_loop_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/api.py", line 243, in wait_for_call_in_loop_thread
return call.result()
^^^^^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 283, in result
return self.future.result(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 169, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.11.4/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/prefect/_internal/concurrency/calls.py", line 346, in _run_async
result = await coro
^^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/prefect/client/utilities.py", line 51, in with_injected_client
return await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/prefect/engine.py", line 364, in create_then_begin_flow_run
state = await begin_flow_run(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/prefect/engine.py", line 509, in begin_flow_run
flow_run_context.task_runner = await stack.enter_async_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.11.4/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 638, in enter_async_context
result = await _enter(cm)
^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.11.4/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 204, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/prefect/task_runners.py", line 185, in start
await self._start(exit_stack)
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/prefect_dask/task_runners.py", line 331, in _start
self._client = await exit_stack.enter_async_context(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/[email protected]/3.11.4/Frameworks/Python.framework/Versions/3.11/lib/python3.11/contextlib.py", line 638, in enter_async_context
result = await _enter(cm)
^^^^^^^^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/distributed/client.py", line 1457, in __aenter__
await self
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/distributed/client.py", line 1268, in _start
await self._ensure_connected(timeout=timeout)
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/distributed/client.py", line 1331, in _ensure_connected
comm = await connect(
^^^^^^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/distributed/comm/core.py", line 292, in connect
comm = await wait_for(
^^^^^^^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/distributed/utils.py", line 1807, in wait_for
return await fut
^^^^^^^^^
File "/Users/giorgio/Projects/personal/prefect-planetary-computer/.venv/lib/python3.11/site-packages/dask_gateway/comm.py", line 39, in connect
raise TypeError(
TypeError: Gateway expects a `ssl_context` argument of type ssl.SSLContext, instead got None EDIT: |
Hi! I noticed recently that
get_dask_client
(added in PrefectHQ/prefect-dask#33) is not working with a Coiled cluster. I'm using Prefect Cloud.Problem
The following snippet does not work using a Coiled cluster (but does work with
dask.distributed.LocalCluster
):When I run the above snippet I get a
TypeError: TLS expects a
ssl_contextargument of type ssl.SSLContext (perhaps check your TLS configuration?) Instead got None
.Full traceback
Looking at the full traceback, it seems
_expect_tls_context
in https://github.com/dask/distributed/blob/main/distributed/comm/tcp.py expects TLS arguments, but is not getting the information needed.Since
get_dask_client
is a utility aroundworker_client
, I checked to see if usingworker_client
works with a Coiled cluster, and it does:Potential solution?
If I get the
client
the waydask.distributed.worker_client
does, then the aforementioned failing snippet works. ie changing (lines 103-104 in get_dask_client) from:to:
It seems like there are other arguments
get_dask_client
may need, so not saying this exactly will work, but thought it'd be a helpful start.The text was updated successfully, but these errors were encountered: