Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zarr-Python 3 compatibility #11388

Merged
merged 20 commits into from
Oct 11, 2024
Merged

Conversation

jhamman
Copy link
Member

@jhamman jhamman commented Sep 14, 2024

Copy link
Contributor

github-actions bot commented Sep 14, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     13 files  ±0       13 suites  ±0   3h 14m 49s ⏱️ + 8m 37s
 13 215 tests ±0   12 159 ✅ ±0   1 056 💤 ±0  0 ❌ ±0 
137 835 runs  ±0  118 821 ✅ ±0  19 014 💤 ±0  0 ❌ ±0 

Results for commit c1c7f5a. ± Comparison against base commit 295c8de.

This pull request removes 1 and adds 1 tests. Note that renamed tests count towards both.
dask.array.tests.test_array_core ‑ test_zarr_pass_mapper
dask.array.tests.test_array_core ‑ test_zarr_pass_store

♻️ This comment has been updated with latest results.

@jhamman
Copy link
Member Author

jhamman commented Sep 16, 2024

We have a few more things to fix in Zarr before merging this. I'll circle back once this is ready for a review.

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jhamman. Exciting stuff. Let me know if you need anything while working through the compatibility fixes

dask/array/core.py Outdated Show resolved Hide resolved
Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to bump CI here as it looks like there's been some relevant changes over in zarr-developers/zarr-python#2186 and I'm curious to see what tests looks like here now

@jhamman
Copy link
Member Author

jhamman commented Sep 29, 2024

@jrbourbeau -- this is just about ready to go. Any idea what is going on with the upstream test build?

@jrbourbeau
Copy link
Member

Hmm looking now...

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, looks like it might be a mamba=2-related issue. Let's see if #11409 fixes things.

@jrbourbeau
Copy link
Member

Alright, @jhamman merging main should fix CI

@jhamman
Copy link
Member Author

jhamman commented Oct 7, 2024

So close here. We're down to just one error:

ERROR dask/tests/test_distributed.py::test_zarr_distributed_roundtrip - Failed: 4 thread(s) were leaked from test

@jrbourbeau - do you have thoughts on how to debug this? This is a part of the Dask test suite I don't quite understand.

@jhamman
Copy link
Member Author

jhamman commented Oct 7, 2024

@martindurant - also pinging you here because I assume you have sorted something similar in the past w/ fsspec. Looking at the logs, I see 3-4 threads:

  • zarrIO - our global IO thread where we run our IO loop
  • asyncio_{1,2,...} - unclear where these are coming from.

@martindurant
Copy link
Member

zarr.core.common.to_thread creates a hidden ThreadPoolExecutor and stuff gets assigned to run there. If you wanted to maintain this pool, you would have to create and close it in order not to be caught by dask's thread checker.

It is used throughout LocalStore, by the way, which makes no sense.

It is also used for offloading en/decoding, which makes more sense, as these can involve considerable CPU. However, dask already has a pool running, so I would tentatively suggest that dask should take care of parallelism, and zarr should have a mode in which it never makes new threads.

There is even an argument that the ioloop should not be created in its thread since a (distributed) dask worker already has one in the main thread, but fsspec does this too. On a worker, you may well then have three asyncio loops running on different threads.

@mrocklin
Copy link
Member

mrocklin commented Oct 8, 2024

Heads-up, @jrbourbeau has been reassigned to baby-duty for the next month. He's unlikely to be responsive here. Your best bet may be @fjetter and colleagues.

@jhamman
Copy link
Member Author

jhamman commented Oct 8, 2024

zarr.core.common.to_thread creates a hidden ThreadPoolExecutor and stuff gets assigned to run there. If you wanted to maintain this pool, you would have to create and close it in order not to be caught by dask's thread checker.

Makes sense. I'm fairly confident that none of the threads here are from the ThreadPoolExecutor but I agree that we should probably be looking to manage this ourselves.

It is used throughout LocalStore, by the way, which makes no sense.

+1, I agree with this

There is even an argument that the ioloop should not be created in its thread since a (distributed) dask worker already has one in the main thread, but fsspec does this too. On a worker, you may well then have three asyncio loops running on different threads.

I'm curious there are ways to detect this and utilize an existing thread (or loop) in a general way.

Also, it just occurred to me that this test may also raise an error with fsspec but it is probably not exercised through the distributed test suite today. I think I can rig that up pretty easily...

Edit (5 minutes later): Indeed, using fsspec with zarr-python 2.18 also leaks threads:

def test_zarr_distributed_roundtrip_fsspec(c):
    pytest.importorskip("numpy")
    da = pytest.importorskip("dask.array")
    zarr  = pytest.importorskip("zarr")
    fsspec = pytest.importorskip("fsspec")

    store = fsspec.get_mapper("s3://zarr-test")

    a = da.zeros((3, 3), chunks=(1, 1))
    a.to_zarr(store)
    a2 = da.from_zarr(store)
    da.assert_eq(a, a2, scheduler=c)
    assert a2.chunks == a.chunks
dask/tests/test_distributed.py::test_zarr_distributed_roundtrip_fsspec PASSED                                                [100%]
dask/tests/test_distributed.py::test_zarr_distributed_roundtrip_fsspec ERROR                                                 [100%]

============================================================== ERRORS ==============================================================
___________________________________ ERROR at teardown of test_zarr_distributed_roundtrip_fsspec ____________________________________
2 thread(s) were leaked from test

------ Call stack of leaked thread 1/2: <Thread(fsspecIO, started daemon 6268825600)> ------
  File "/Users/jhamman/miniforge3/envs/dask-dev/lib/python3.12/threading.py", line 1032, in _bootstrap
        self._bootstrap_inner()
  File "/Users/jhamman/miniforge3/envs/dask-dev/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
        self.run()
  File "/Users/jhamman/miniforge3/envs/dask-dev/lib/python3.12/threading.py", line 1012, in run
        self._target(*self._args, **self._kwargs)
  File "/Users/jhamman/miniforge3/envs/dask-dev/lib/python3.12/asyncio/base_events.py", line 641, in run_forever
        self._run_once()
  File "/Users/jhamman/miniforge3/envs/dask-dev/lib/python3.12/asyncio/base_events.py", line 1948, in _run_once
        event_list = self._selector.select(timeout)
  File "/Users/jhamman/miniforge3/envs/dask-dev/lib/python3.12/selectors.py", line 566, in select
        kev_list = self._selector.control(None, max_ev, timeout)

------ Call stack of leaked thread 2/2: <Thread(asyncio_0, started daemon 6285651968)> ------
  File "/Users/jhamman/miniforge3/envs/dask-dev/lib/python3.12/threading.py", line 1032, in _bootstrap
        self._bootstrap_inner()
  File "/Users/jhamman/miniforge3/envs/dask-dev/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
        self.run()
  File "/Users/jhamman/miniforge3/envs/dask-dev/lib/python3.12/threading.py", line 1012, in run
        self._target(*self._args, **self._kwargs)
  File "/Users/jhamman/miniforge3/envs/dask-dev/lib/python3.12/concurrent/futures/thread.py", line 89, in _worker
        work_item = work_queue.get(block=True)

My question: is this a problem we need to be concerned about? Dask+Fsspec+Zarr have been working fine for years with this extra thread. Only now, because Zarr has brought this thread/IO-loop into its core code path, have we caught it in Dask's test suite but its been around forever.

@fjetter
Copy link
Member

fjetter commented Oct 8, 2024

A leaking thread is not a huge deal. For instance, we know that the offloading thread (I think Martin mentioned this earlier) is not cleaned up. We're initializing this on import time, see https://github.com/dask/distributed/blob/36020d6abe4506af08bae2807d78477b7916c8aa/distributed/utils_test.py#L112 such that the thread is already up and running before the test runs. So, if there is a way to initialize that hidden threadpool once before the tests run, that would be the cleanest solution

@jhamman
Copy link
Member Author

jhamman commented Oct 10, 2024

@fjetter and co -- this is ready for a real review. The leaked thread issue was mostly fixed upstream, now Zarr includes a cleanup utility for its internal threads.

@jhamman jhamman changed the title [WIP] Zarr-Python 3 compatibility Zarr-Python 3 compatibility Oct 10, 2024
Copy link
Collaborator

@phofl phofl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@fjetter fjetter merged commit 20eeeda into dask:main Oct 11, 2024
24 checks passed
@jhamman jhamman deleted the fix/zarr-array-construction-2 branch October 11, 2024 14:01
Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, glad to see this got over the finish line. Thanks all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants