Flaky test_AllProgress #6550

Open
gjoseph92 opened this issue Jun 9, 2022 · 0 comments
Labels
flaky test Intermittent failures on CI.

This feels related to #6361, and possibly could be fixed by #6504, and/or #6427?

https://github.com/dask/distributed/runs/6750812220?check_suite_focus=true#step:11:1754

_______________________________ test_AllProgress _______________________________
args = (), kwds = {}
    @wraps(func)
    def inner(*args, **kwds):
>       with self._recreate_cm():
../../../miniconda3/envs/dask-distributed/lib/python3.10/contextlib.py:78: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../miniconda3/envs/dask-distributed/lib/python3.10/contextlib.py:142: in __exit__
    next(self.gen)
distributed/utils_test.py:1906: in clean
    with check_process_leak(check=processes):
../../../miniconda3/envs/dask-distributed/lib/python3.10/contextlib.py:142: in __exit__
    next(self.gen)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
check = True, check_timeout = 40, term_timeout = 3
    @contextmanager
    def check_process_leak(
        check: bool = True, check_timeout: float = 40, term_timeout: float = 3
    ):
        """Terminate any currently-running subprocesses at both the beginning and end of this context

        Parameters
        ----------
        check : bool, optional
            If True, raise AssertionError if any processes survive at the exit
        check_timeout: float, optional
            Wait up to these many seconds for subprocesses to terminate before failing
        term_timeout: float, optional
            After sending SIGTERM to a subprocess, wait up to these many seconds before
            sending SIGKILL
        """
        term_or_kill_active_children(timeout=term_timeout)
        try:
            yield
            if check:
                children = wait_active_children(timeout=check_timeout)
>               assert not children, f"Test leaked subprocesses: {children}"
E               AssertionError: Test leaked subprocesses: [<SpawnProcess name='Dask Worker process (from Nanny)' pid=47619 parent=15412 started daemon>, <SpawnProcess name='Dask Worker process (from Nanny)' pid=47620 parent=15412 started daemon>]
E               assert not [<SpawnProcess name='Dask Worker process (from Nanny)' pid=47619 parent=15412 started daemon>, <SpawnProcess name='Dask Worker process (from Nanny)' pid=47620 parent=15412 started daemon>]
distributed/utils_test.py:1817: AssertionError
----------------------------- Captured stderr call -----------------------------
2022-06-06 06:23:26,122 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:55254
2022-06-06 06:23:26,122 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:55254
2022-06-06 06:23:26,123 - distributed.worker - INFO -          dashboard at:            127.0.0.1:55255
2022-06-06 06:23:26,123 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:55247
2022-06-06 06:23:26,123 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:26,123 - distributed.worker - INFO -               Threads:                          1
2022-06-06 06:23:26,123 - distributed.worker - INFO -                Memory:                  14.00 GiB
2022-06-06 06:23:26,123 - distributed.worker - INFO -       Local Directory: /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmprdmbzfmm/dask-worker-space/worker-u1jzba_9
2022-06-06 06:23:26,124 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:26,212 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:55257
2022-06-06 06:23:26,212 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:55257
2022-06-06 06:23:26,212 - distributed.worker - INFO -          dashboard at:            127.0.0.1:55258
2022-06-06 06:23:26,213 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:55247
2022-06-06 06:23:26,213 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:26,213 - distributed.worker - INFO -               Threads:                          2
2022-06-06 06:23:26,213 - distributed.worker - INFO -                Memory:                  14.00 GiB
2022-06-06 06:23:26,213 - distributed.worker - INFO -       Local Directory: /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmprdmbzfmm/dask-worker-space/worker-c8g6zm4s
2022-06-06 06:23:26,214 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:27,131 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:55247
2022-06-06 06:23:27,132 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:27,133 - distributed.core - INFO - Starting established connection
2022-06-06 06:23:27,157 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:55247
2022-06-06 06:23:27,157 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:27,158 - distributed.core - INFO - Starting established connection
2022-06-06 06:23:27,787 - distributed.worker - WARNING - Compute Failed
Key:       div-beaac0206246b34d3625d21194e03c13
Function:  div
args:      (1, 0)
kwargs:    {}
Exception: "ZeroDivisionError('division by zero')"
2022-06-06 06:23:30,162 - distributed.scheduler - WARNING - Received heartbeat from unregistered worker 'tcp://127.0.0.1:55257'.
2022-06-06 06:23:36,051 - distributed.worker - ERROR - Scheduler was unaware of this worker 'tcp://127.0.0.1:55257'. Shutting down.
2022-06-06 06:23:36,056 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:55257
2022-06-06 06:23:36,058 - distributed.scheduler - WARNING - Received heartbeat from unregistered worker 'tcp://127.0.0.1:55254'.
2022-06-06 06:23:36,060 - distributed.worker - ERROR - Scheduler was unaware of this worker 'tcp://127.0.0.1:55254'. Shutting down.
2022-06-06 06:23:36,060 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:55254
2022-06-06 06:23:36,072 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting.  Status: Status.closing
2022-06-06 06:23:36,073 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting.  Status: Status.closing
2022-06-06 06:23:36,077 - distributed.nanny - INFO - Worker closed
2022-06-06 06:23:36,078 - distributed.nanny - INFO - Worker closed
2022-06-06 06:23:37,525 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x142dc7430>>, <Task finished name='Task-50018' coro=<Scheduler.restart() done, defined at /Users/runner/work/distributed/distributed/distributed/utils.py:759> exception=CommClosedError("Exception while trying to call remote method 'restart' before comm was established.")>)
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 226, in read
    frames_nbytes = await stream.read_bytes(fmt_size)
tornado.iostream.StreamClosedError: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 897, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 742, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 242, in read
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 150, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) rpc.restart local=tcp://127.0.0.1:55271 remote=tcp://127.0.0.1:55248>: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 761, in wrapper
    return await func(*args, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/scheduler.py", line 5130, in restart
    resps = await asyncio.wait_for(resps, timeout)
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 218, in All
    result = await tasks.next()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 900, in send_recv_from_rpc
    raise type(e)(
distributed.comm.core.CommClosedError: Exception while trying to call remote method 'restart' before comm was established.
2022-06-06 06:23:39,390 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:55276
2022-06-06 06:23:39,390 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:55277
2022-06-06 06:23:39,390 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:55277
2022-06-06 06:23:39,390 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:55276
2022-06-06 06:23:39,390 - distributed.worker - INFO -          dashboard at:            127.0.0.1:55279
2022-06-06 06:23:39,390 - distributed.worker - INFO -          dashboard at:            127.0.0.1:55278
2022-06-06 06:23:39,390 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:55247
2022-06-06 06:23:39,390 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:55247
2022-06-06 06:23:39,390 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:39,391 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:39,391 - distributed.worker - INFO -               Threads:                          1
2022-06-06 06:23:39,391 - distributed.worker - INFO -               Threads:                          2
2022-06-06 06:23:39,391 - distributed.worker - INFO -                Memory:                  14.00 GiB
2022-06-06 06:23:39,391 - distributed.worker - INFO -                Memory:                  14.00 GiB
2022-06-06 06:23:39,391 - distributed.worker - INFO -       Local Directory: /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmprdmbzfmm/dask-worker-space/worker-x9y623jx
2022-06-06 06:23:39,391 - distributed.worker - INFO -       Local Directory: /var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/tmprdmbzfmm/dask-worker-space/worker-9hrdtqpi
2022-06-06 06:23:39,391 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:39,391 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:40,145 - distributed.client - ERROR - Restart timed out after 10.00 seconds
2022-06-06 06:23:40,753 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:55247
2022-06-06 06:23:40,753 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:40,754 - distributed.core - INFO - Starting established connection
2022-06-06 06:23:40,754 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:55247
2022-06-06 06:23:40,754 - distributed.worker - INFO - -------------------------------------------------
2022-06-06 06:23:40,755 - distributed.core - INFO - Starting established connection
2022-06-06 06:23:41,289 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:55276
2022-06-06 06:23:41,290 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting.  Status: Status.closing
2022-06-06 06:23:41,290 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:55277
2022-06-06 06:23:41,291 - distributed.worker - INFO - Connection to scheduler broken. Closing without reporting.  Status: Status.closing
2022-06-06 06:23:41,291 - distributed.batched - INFO - Batched Comm Closed <TCP (closed) Worker->Scheduler local=tcp://127.0.0.1:55280 remote=tcp://127.0.0.1:55247>
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/batched.py", line 94, in _background_send
    nbytes = yield self.comm.write(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 269, in write
    raise CommClosedError()
distributed.comm.core.CommClosedError
2022-06-06 06:23:41,292 - distributed.batched - INFO - Batched Comm Closed <TCP (closed) Worker->Scheduler local=tcp://127.0.0.1:55281 remote=tcp://127.0.0.1:55247>
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/batched.py", line 94, in _background_send
    nbytes = yield self.comm.write(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 269, in write
    raise CommClosedError()
distributed.comm.core.CommClosedError
Timed out trying to connect to tcp://127.0.0.1:55249 after 5 s
ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 289, in connect
    comm = await asyncio.wait_for(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 451, in connect
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <distributed.comm.tcp.TCPConnector object at 0x138321150>: ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 761, in wrapper
    return await func(*args, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/worker.py", line 1529, in close
    await r.close_gracefully()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 975, in send_recv_from_rpc
    comm = await self.pool.connect(self.addr)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1196, in connect
    return await connect_attempt
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1132, in _connect
    comm = await connect(
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 315, in connect
    raise OSError(
OSError: Timed out trying to connect to tcp://127.0.0.1:55249 after 5 s
2022-06-06 06:23:46,527 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x10e665180>>, <Task finished name='Task-12' coro=<Worker.close() done, defined at /Users/runner/work/distributed/distributed/distributed/utils.py:759> exception=OSError('Timed out trying to connect to tcp://127.0.0.1:55249 after 5 s')>)
ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 289, in connect
    comm = await asyncio.wait_for(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 451, in connect
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <distributed.comm.tcp.TCPConnector object at 0x138321150>: ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 761, in wrapper
    return await func(*args, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/worker.py", line 1529, in close
    await r.close_gracefully()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 975, in send_recv_from_rpc
    comm = await self.pool.connect(self.addr)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1196, in connect
    return await connect_attempt
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1132, in _connect
    comm = await connect(
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 315, in connect
    raise OSError(
OSError: Timed out trying to connect to tcp://127.0.0.1:55249 after 5 s
Timed out trying to connect to tcp://127.0.0.1:55248 after 5 s
ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 289, in connect
    comm = await asyncio.wait_for(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 451, in connect
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <distributed.comm.tcp.TCPConnector object at 0x12a0f0070>: ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 761, in wrapper
    return await func(*args, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/worker.py", line 1529, in close
    await r.close_gracefully()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 975, in send_recv_from_rpc
    comm = await self.pool.connect(self.addr)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1196, in connect
    return await connect_attempt
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1132, in _connect
    comm = await connect(
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 315, in connect
    raise OSError(
OSError: Timed out trying to connect to tcp://127.0.0.1:55248 after 5 s
2022-06-06 06:23:46,712 - tornado.application - ERROR - Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOLoop object at 0x12a049240>>, <Task finished name='Task-12' coro=<Worker.close() done, defined at /Users/runner/work/distributed/distributed/distributed/utils.py:759> exception=OSError('Timed out trying to connect to tcp://127.0.0.1:55248 after 5 s')>)
ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 289, in connect
    comm = await asyncio.wait_for(
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 451, in connect
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 148, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc.__class__.__name__}: {exc}") from exc
distributed.comm.core.CommClosedError: in <distributed.comm.tcp.TCPConnector object at 0x12a0f0070>: ConnectionRefusedError: [Errno 61] Connection refused
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 761, in wrapper
    return await func(*args, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/worker.py", line 1529, in close
    await r.close_gracefully()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 975, in send_recv_from_rpc
    comm = await self.pool.connect(self.addr)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1196, in connect
    return await connect_attempt
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 1132, in _connect
    comm = await connect(
  File "/Users/runner/work/distributed/distributed/distributed/comm/core.py", line 315, in connect
    raise OSError(
OSError: Timed out trying to connect to tcp://127.0.0.1:55248 after 5 s
2022-06-06 06:23:46,812 - distributed.worker - INFO - Timed out while trying to connect during heartbeat
2022-06-06 06:23:46,999 - distributed.worker - INFO - Timed out while trying to connect during heartbeat
------------------------------ Captured log call -------------------------------
ERROR    asyncio:base_events.py:1744 Future exception was never retrieved
future: <Future finished exception=CommClosedError("Exception while trying to call remote method 'restart' before comm was established.")>
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 226, in read
    frames_nbytes = await stream.read_bytes(fmt_size)
tornado.iostream.StreamClosedError: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 897, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 742, in send_recv
    response = await comm.read(deserializers=deserializers)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 242, in read
    convert_stream_closed_error(self, e)
  File "/Users/runner/work/distributed/distributed/distributed/comm/tcp.py", line 150, in convert_stream_closed_error
    raise CommClosedError(f"in {obj}: {exc}") from exc
distributed.comm.core.CommClosedError: in <TCP (closed) rpc.restart local=tcp://127.0.0.1:55270 remote=tcp://127.0.0.1:55249>: Stream is closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/gen.py", line 769, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/Users/runner/work/distributed/distributed/distributed/utils.py", line 231, in quiet
    yield task
  File "/Users/runner/miniconda3/envs/dask-distributed/lib/python3.10/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/Users/runner/work/distributed/distributed/distributed/core.py", line 900, in send_recv_from_rpc
    raise type(e)(
distributed.comm.core.CommClosedError: Exception while trying to call remote method 'restart' before comm was established.
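
For readers unfamiliar with the check that failed, the `check_process_leak` docstring in the traceback above describes a terminate-then-kill cleanup followed by a bounded wait. Below is a minimal sketch of that pattern using only the standard library. It is not the code in `distributed/utils_test.py`: the names `term_or_kill`, `wait_for_exit`, and `leak_check` are hypothetical, and tracking an explicit list of `subprocess.Popen` handles (rather than the test process's active children) is an assumption made to keep the example self-contained.

```python
import subprocess
import sys
import time
from contextlib import contextmanager


def term_or_kill(procs, term_timeout=3.0):
    """Terminate every live process; kill any still alive after term_timeout seconds."""
    for p in procs:
        if p.poll() is None:
            p.terminate()  # SIGTERM on POSIX
    deadline = time.monotonic() + term_timeout
    for p in procs:
        try:
            p.wait(timeout=max(0.0, deadline - time.monotonic()))
        except subprocess.TimeoutExpired:
            p.kill()  # SIGKILL on POSIX
            p.wait()


def wait_for_exit(procs, check_timeout=40.0):
    """Return the processes that are still running after up to check_timeout seconds."""
    deadline = time.monotonic() + check_timeout
    while time.monotonic() < deadline:
        if all(p.poll() is not None for p in procs):
            return []
        time.sleep(0.05)
    return [p for p in procs if p.poll() is None]


@contextmanager
def leak_check(procs, check=True, check_timeout=40.0, term_timeout=3.0):
    """Hypothetical analogue of check_process_leak over an explicit list of Popen handles."""
    term_or_kill(procs, term_timeout)  # start from a clean slate
    try:
        yield
        if check:
            leaked = wait_for_exit(procs, check_timeout)
            assert not leaked, f"Test leaked subprocesses: {leaked}"
    finally:
        term_or_kill(procs, term_timeout)  # always clean up, even on failure
```

Under these assumptions, a body that leaves a long-running child in `procs` when the `with` block exits raises the same kind of `AssertionError` as in the log, and the `finally` clause still reaps the child afterwards.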
gjoseph92 added the flaky test label on Jun 9, 2022