Unpin dask and distributed for 23.12 development #5627

Conversation
/merge
@galipremsagar This is now in conflict with dask-cuda:
Logs indicate an older version of
@csadorf We have encountered an issue with
Closing and reopening for CI
I see Dask errors (e.g., here) as below:

==================================== ERRORS ====================================
________ ERROR at teardown of test_validate_dask_array[4-False-12-8-24] ________
fut = <Future cancelled>, timeout = 0.7971904277801514

    async def wait_for(fut, timeout):
        """Wait for the single Future or coroutine to complete, with timeout.

        Coroutine will be wrapped in Task.

        Returns result of the Future or coroutine. When a timeout occurs,
        it cancels the task and raises TimeoutError. To avoid the task
        cancellation, wrap it in shield().

        If the wait is cancelled, the task is also cancelled.

        This function is a coroutine.
        """
        loop = events.get_running_loop()

        if timeout is None:
            return await fut

        if timeout <= 0:
            fut = ensure_future(fut, loop=loop)

            if fut.done():
                return fut.result()

            await _cancel_and_wait(fut, loop=loop)
            try:
                return fut.result()
            except exceptions.CancelledError as exc:
                raise exceptions.TimeoutError() from exc

        waiter = loop.create_future()
        timeout_handle = loop.call_later(timeout, _release_waiter, waiter)
        cb = functools.partial(_release_waiter, waiter)

        fut = ensure_future(fut, loop=loop)
        fut.add_done_callback(cb)

        try:
            # wait until the future completes or the timeout
            try:
                await waiter
            except exceptions.CancelledError:
                if fut.done():
                    return fut.result()
                else:
                    fut.remove_done_callback(cb)
                    # We must ensure that the task is not running
                    # after wait_for() returns.
                    # See https://bugs.python.org/issue32751
                    await _cancel_and_wait(fut, loop=loop)
                    raise

            if fut.done():
                return fut.result()
            else:
                fut.remove_done_callback(cb)
                # We must ensure that the task is not running
                # after wait_for() returns.
                # See https://bugs.python.org/issue32751
                await _cancel_and_wait(fut, loop=loop)
                # In case task cancellation failed with some
                # exception, we should re-raise it
                # See https://bugs.python.org/issue40607
                try:
>                   return fut.result()
E               asyncio.exceptions.CancelledError

/opt/conda/envs/test/lib/python3.10/asyncio/tasks.py:456: CancelledError

The above exception was the direct cause of the following exception:

    @pytest.fixture(scope="module")
    def cluster():
        cluster = LocalCUDACluster(protocol="tcp", scheduler_port=0)
        yield cluster
>       cluster.close()

It took me a lot of time to figure out what was happening, but I did so recently (details in the description of rapidsai/dask-cuda#1260). However, after that PR I've changed that to be less intrusive to non-CI workloads, and to do so we pass

Those shouldn't be related to unpinning Dask, as they were already happening for some time in Dask-CUDA (perhaps since August or even before that). Was that the case in cuML as well, @dantegd @csadorf?
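For context, the failure mode in that traceback can be reproduced with the standard library alone: when a slow teardown step exceeds the timeout handed to asyncio.wait_for, the task is cancelled and TimeoutError is raised. The slow_close coroutine below is a hypothetical stand-in, not cuML or Dask-CUDA code.

```python
import asyncio

async def slow_close():
    # Stand-in for a teardown step (e.g. closing a cluster) that takes
    # longer than the allotted timeout.
    await asyncio.sleep(1.0)
    return "closed"

async def main():
    try:
        # wait_for cancels the task and raises TimeoutError once the
        # timeout elapses -- the same code path as in the traceback above.
        return await asyncio.wait_for(slow_close(), timeout=0.05)
    except asyncio.TimeoutError:
        return "timed out"

result = asyncio.run(main())
print(result)  # timed out
```

Under pytest, the cancellation surfaces during fixture teardown, which is why the error appears at teardown of an otherwise passing test.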
@pentschev you're correct, we've seen these errors for quite a while. They became much more prominent when we had errors like CUDA context creation failures (for example #5598), but even then they didn't fully go away. We can work on a fix similar to what you did in dask-cuda in a follow-up PR.
That seems appropriate for cuML then. I opened #5636 to increase the timeout; I think I covered all relevant places.
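The gist of increasing the timeout can be sketched with plain asyncio. The teardown coroutine here is hypothetical; the actual change in #5636 adjusts timeouts in cuML's test setup, not this code.

```python
import asyncio

async def teardown():
    # Hypothetical teardown needing ~0.1 s; real cluster shutdown can
    # take much longer under CI load.
    await asyncio.sleep(0.1)
    return "closed"

async def main():
    # A generous timeout lets teardown complete instead of being
    # cancelled mid-flight, avoiding the CancelledError at fixture exit.
    return await asyncio.wait_for(teardown(), timeout=5.0)

result = asyncio.run(main())
print(result)  # closed
```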
This PR relaxes dask and distributed version pinning for 23.12 development.

xref: rapidsai/cudf#14320
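For illustration, "unpinning" in RAPIDS repositories typically means switching exact version pins to lower-bounded ranges in the dependency files. The lines and versions below are hypothetical; the exact files and versions touched by this PR may differ.

```yaml
# Pinned for a release branch (hypothetical versions):
#   - dask==2023.9.2
#   - distributed==2023.9.2
# Relaxed for nightly/development:
- dask>=2023.9.2
- distributed>=2023.9.2
```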