distributed/shuffle/tests/test_rechunk.py::test_rechunk_auto_1d[20-chunks4-5-expected4] failing on main #8869

Closed
jrbourbeau opened this issue Sep 10, 2024 · 4 comments


@jrbourbeau (Member)

I noticed main is failing with this error:

__________________________________________ test_rechunk_auto_1d[20-chunks4-5-expected4] __________________________________________

c = <Client: No scheduler connected>, s = <Scheduler 'tcp://127.0.0.1:61901', workers: 0, cores: 0, tasks: 0>, shape = 20
chunks = (1, 1, 1, 1, 6, 2, ...), bs = 5, expected = (5, 5, 5, 5)

    @pytest.mark.parametrize(
        "shape,chunks,bs,expected",
        [
            (100, 1, 10, (10,) * 10),
            (100, 50, 10, (10,) * 10),
            (100, 100, 10, (10,) * 10),
            (20, 7, 10, (7, 7, 6)),
            (20, (1, 1, 1, 1, 6, 2, 1, 7), 5, (5, 5, 5, 5)),
        ],
    )
    @gen_cluster(client=True)
    async def test_rechunk_auto_1d(c, s, *ws, shape, chunks, bs, expected):
        """
        See Also
        --------
        dask.array.tests.test_rechunk.test_rechunk_auto_1d
        """
        x = da.ones(shape, chunks=(chunks,))
        y = x.rechunk({0: "auto"}, block_size_limit=bs * x.dtype.itemsize, method="p2p")
>       assert y.chunks == (expected,)
E       assert ((4, 6, 3, 4, 3),) == ((5, 5, 5, 5),)
E         At index 0 diff: (4, 6, 3, 4, 3) != (5, 5, 5, 5)
E         Full diff:
E         - ((5, 5, 5, 5),)
E         + ((4, 6, 3, 4, 3),)

distributed/shuffle/tests/test_rechunk.py:905: AssertionError
------------------------------------------------------ Captured stderr call ------------------------------------------------------
2024-09-10 09:55:03,392 - distributed.scheduler - INFO - State start
2024-09-10 09:55:03,394 - distributed.scheduler - INFO -   Scheduler at:     tcp://127.0.0.1:61901
2024-09-10 09:55:03,394 - distributed.scheduler - INFO -   dashboard at:  http://127.0.0.1:61900/status
2024-09-10 09:55:03,394 - distributed.scheduler - INFO - Registering Worker plugin shuffle
2024-09-10 09:55:03,399 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:61902
2024-09-10 09:55:03,399 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:61902
2024-09-10 09:55:03,399 - distributed.worker - INFO -           Worker name:                          0
2024-09-10 09:55:03,399 - distributed.worker - INFO -          dashboard at:            127.0.0.1:61903
2024-09-10 09:55:03,399 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:61901
2024-09-10 09:55:03,399 - distributed.worker - INFO - -------------------------------------------------
2024-09-10 09:55:03,399 - distributed.worker - INFO -               Threads:                          1
2024-09-10 09:55:03,399 - distributed.worker - INFO -                Memory:                  16.00 GiB
2024-09-10 09:55:03,399 - distributed.worker - INFO -       Local Directory: /var/folders/h0/_w6tz8jd3b9bk6w7d_xpg9t40000gn/T/dask-scratch-space/worker-s_vcngzy
2024-09-10 09:55:03,399 - distributed.worker - INFO - -------------------------------------------------
2024-09-10 09:55:03,401 - distributed.worker - INFO -       Start worker at:      tcp://127.0.0.1:61905
2024-09-10 09:55:03,401 - distributed.worker - INFO -          Listening to:      tcp://127.0.0.1:61905
2024-09-10 09:55:03,401 - distributed.worker - INFO -           Worker name:                          1
2024-09-10 09:55:03,401 - distributed.worker - INFO -          dashboard at:            127.0.0.1:61906
2024-09-10 09:55:03,401 - distributed.worker - INFO - Waiting to connect to:      tcp://127.0.0.1:61901
2024-09-10 09:55:03,401 - distributed.worker - INFO - -------------------------------------------------
2024-09-10 09:55:03,401 - distributed.worker - INFO -               Threads:                          2
2024-09-10 09:55:03,401 - distributed.worker - INFO -                Memory:                  16.00 GiB
2024-09-10 09:55:03,401 - distributed.worker - INFO -       Local Directory: /var/folders/h0/_w6tz8jd3b9bk6w7d_xpg9t40000gn/T/dask-scratch-space/worker-fmmbyb21
2024-09-10 09:55:03,401 - distributed.worker - INFO - -------------------------------------------------
2024-09-10 09:55:03,402 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:61902', name: 0, status: init, memory: 0, processing: 0>
2024-09-10 09:55:03,402 - distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:61902
2024-09-10 09:55:03,402 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:61904
2024-09-10 09:55:03,402 - distributed.scheduler - INFO - Register worker <WorkerState 'tcp://127.0.0.1:61905', name: 1, status: init, memory: 0, processing: 0>
2024-09-10 09:55:03,402 - distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:61905
2024-09-10 09:55:03,402 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:61907
2024-09-10 09:55:03,403 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-09-10 09:55:03,403 - distributed.worker - INFO - Starting Worker plugin shuffle
2024-09-10 09:55:03,403 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:61901
2024-09-10 09:55:03,403 - distributed.worker - INFO - -------------------------------------------------
2024-09-10 09:55:03,403 - distributed.worker - INFO -         Registered to:      tcp://127.0.0.1:61901
2024-09-10 09:55:03,403 - distributed.worker - INFO - -------------------------------------------------
2024-09-10 09:55:03,403 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:61901
2024-09-10 09:55:03,403 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:61901
2024-09-10 09:55:03,405 - distributed.scheduler - INFO - Receive client connection: Client-a9085b54-6f84-11ef-929e-02471348e20c
2024-09-10 09:55:03,405 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:61908
2024-09-10 09:55:03,417 - distributed.scheduler - INFO - Remove client Client-a9085b54-6f84-11ef-929e-02471348e20c
2024-09-10 09:55:03,418 - distributed.core - INFO - Received 'close-stream' from tcp://127.0.0.1:61908; closing.
2024-09-10 09:55:03,418 - distributed.scheduler - INFO - Remove client Client-a9085b54-6f84-11ef-929e-02471348e20c
2024-09-10 09:55:03,418 - distributed.scheduler - INFO - Close client connection: Client-a9085b54-6f84-11ef-929e-02471348e20c
2024-09-10 09:55:03,418 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:61902. Reason: worker-close
2024-09-10 09:55:03,418 - distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:61905. Reason: worker-close
2024-09-10 09:55:03,418 - distributed.worker - INFO - Removing Worker plugin shuffle
2024-09-10 09:55:03,418 - distributed.worker - INFO - Removing Worker plugin shuffle
2024-09-10 09:55:03,419 - distributed.core - INFO - Connection to tcp://127.0.0.1:61901 has been closed.
2024-09-10 09:55:03,419 - distributed.core - INFO - Connection to tcp://127.0.0.1:61901 has been closed.
2024-09-10 09:55:03,420 - distributed.core - INFO - Received 'close-stream' from tcp://127.0.0.1:61904; closing.
2024-09-10 09:55:03,420 - distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:61902', name: 0, status: closing, memory: 0, processing: 0> (stimulus_id='handle-worker-cleanup-1725980103.420172')
2024-09-10 09:55:03,420 - distributed.core - INFO - Received 'close-stream' from tcp://127.0.0.1:61907; closing.
2024-09-10 09:55:03,422 - distributed.scheduler - INFO - Remove worker <WorkerState 'tcp://127.0.0.1:61905', name: 1, status: closing, memory: 0, processing: 0> (stimulus_id='handle-worker-cleanup-1725980103.421977')
2024-09-10 09:55:03,422 - distributed.scheduler - INFO - Lost all workers
2024-09-10 09:55:03,422 - distributed.batched - INFO - Batched Comm Closed <TCP (closed) Scheduler connection to worker local=tcp://127.0.0.1:61901 remote=tcp://127.0.0.1:61907>
Traceback (most recent call last):
  File "/Users/james/projects/dask/distributed/distributed/batched.py", line 115, in _background_send
    nbytes = yield coro
             ^^^^^^^^^^
  File "/Users/james/mambaforge/envs/distributed/lib/python3.11/site-packages/tornado/gen.py", line 767, in run
    value = future.result()
            ^^^^^^^^^^^^^^^
  File "/Users/james/projects/dask/distributed/distributed/comm/tcp.py", line 262, in write
    raise CommClosedError()
distributed.comm.core.CommClosedError
2024-09-10 09:55:03,423 - distributed.scheduler - INFO - Closing scheduler. Reason: unknown
2024-09-10 09:55:03,423 - distributed.scheduler - INFO - Scheduler closing all comms

cc @phofl @hendrikmakait for visibility

@jrbourbeau (Member, Author)

Looks like dask/dask#11354 is the relevant change
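For context, that PR changes how dask sizes the target chunks for "auto" rechunking, which is what the expected tuple in this test encodes. A minimal sketch for checking the new behaviour locally, assuming the "auto" target chunks are computed at graph-construction time and don't depend on the rechunk method (so no cluster is needed):

import dask.array as da

x = da.ones(20, chunks=((1, 1, 1, 1, 6, 2, 1, 7),))
y = x.rechunk({0: "auto"}, block_size_limit=5 * x.dtype.itemsize)
# Previously ((5, 5, 5, 5),); with the dask change this should match the
# ((4, 6, 3, 4, 3),) seen in the failure above.
print(y.chunks)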

@jrbourbeau (Member, Author)

@phofl any thoughts on whether we should just update the expected chunks in the distributed test? A sketch of that option is below.
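If we go that route, the change would just be to swap the expected tuple in the parametrization for the one the new heuristic produces (taken from the failure output above). A sketch of what the updated test could look like, identical to the current one apart from the last expected value:

import dask.array as da
import pytest

from distributed.utils_test import gen_cluster


@pytest.mark.parametrize(
    "shape,chunks,bs,expected",
    [
        (100, 1, 10, (10,) * 10),
        (100, 50, 10, (10,) * 10),
        (100, 100, 10, (10,) * 10),
        (20, 7, 10, (7, 7, 6)),
        # Updated from (5, 5, 5, 5) to the chunks reported in the failure;
        # worth double-checking this is what we consider correct output.
        (20, (1, 1, 1, 1, 6, 2, 1, 7), 5, (4, 6, 3, 4, 3)),
    ],
)
@gen_cluster(client=True)
async def test_rechunk_auto_1d(c, s, *ws, shape, chunks, bs, expected):
    x = da.ones(shape, chunks=(chunks,))
    y = x.rechunk({0: "auto"}, block_size_limit=bs * x.dtype.itemsize, method="p2p")
    assert y.chunks == (expected,)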

@jacobtomlinson (Member)

@jrbourbeau can we close this now that dask/dask#11385 is in?

@phofl (Collaborator) commented Oct 16, 2024

yeah we can close
