Multiprocessing support #2815

Merged
d-v-b merged 6 commits into zarr-developers:main from chore/multiprocessing-tests on Feb 11, 2025

d-v-b
Contributor

@d-v-b d-v-b commented Feb 11, 2025

I wanted to use this PR to add tests to ensure that basic array operations work in a multiprocessing context. But right now I just have 1 test, and it's failing with an infinite stall. Since this breaks the entire test suite, we should also fix the underlying bug in this PR. Stay tuned!

closes #2812

@d-v-b d-v-b added bug Potential issues with the zarr-python library help wanted Issue could use help from someone with familiarity on the topic labels Feb 11, 2025
@github-actions github-actions bot added needs release notes Automatically applied to PRs which haven't added release notes and removed bug Potential issues with the zarr-python library help wanted Issue could use help from someone with familiarity on the topic labels Feb 11, 2025
@d-v-b
Contributor Author

d-v-b commented Feb 11, 2025

the underlying issue here is that store operations don't work in child processes.

@d-v-b
Contributor Author

d-v-b commented Feb 11, 2025

I added some cleanup logic tied to os.register_at_fork that sets the persistent loops to None in child processes. No clue if this is the recommended approach, but it made the test pass. I also parametrized the multiprocessing test over different subprocess creation methods -- spawn, fork, and forkserver, with platform restrictions since not every OS supports every method.

@d-v-b
Contributor Author

d-v-b commented Feb 11, 2025

@martindurant this seems like your wheelhouse -- would you mind checking this over?

@martindurant
Member

> sets the persistent loops to None in child processes

This is absolutely what you must do.
asyncio event loops, threads and file handles are the main categories of things that "don't work" in a child process, with the specific problem depending on which thing it is and how the process is made.
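To make the point about threads concrete, the sketch below forks a process while a background thread is running and reports how many threads the child can see. The function name `threads_seen_by_forked_child` is hypothetical and the example is illustrative only; it uses plain `os.fork`, so it returns `None` on platforms without it.

```python
import os
import threading
import warnings


def threads_seen_by_forked_child():
    """Return threading.active_count() as observed in a forked child (None if no os.fork)."""
    if not hasattr(os, "fork"):
        return None

    stop = threading.Event()
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()  # the parent now has at least two threads

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Newer Python versions warn when forking a multi-threaded process.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the thread that called fork() was copied.
        try:
            os.close(read_fd)
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)  # never fall through into the parent's code path

    # Parent: read the child's report, then clean up.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        count = int(reader.read())
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return count
```

The child sees only the forking thread, so any object that refers to the parent's worker thread (such as an event loop running on it) is stale there and has to be recreated.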

@d-v-b d-v-b marked this pull request as ready for review February 11, 2025 15:41
@d-v-b d-v-b requested a review from martindurant February 11, 2025 15:41
@martindurant
Member

(I should have added locks to the list of things to watch out for)

@d-v-b
Contributor Author

d-v-b commented Feb 11, 2025

I think this PR is ready for review. The test I added checks that array indexing works in a multiprocessing context, which is a fine test, but the actual substance is whether our asynchronous store methods work with multiprocessing, and that can be tested without arrays. But I'd rather treat that as a test refactor for the future.

Member

@martindurant martindurant left a comment

I'm OK keeping these, effectively, as integration tests rather than store-specific tests. The top-level sync() we call internally is important here.

Ensure that global resources are reset after a fork. Without this function,
forked processes will retain invalid references to the parent process's resources.
"""
global loop, iothread, _executor
Member

strictly speaking, loop and iothread don't need to be global, since they are mutated in-place.

src/zarr/core/sync.py (outdated, resolved)
pytest.param(
"fork",
marks=pytest.mark.skipif(
sys.platform in ("win32", "darwin"), reason="fork not supported on Windows or OSX"
Member

Because Windows should only perform spawn? This decorator is a bit verbose; it would be OK to put if ... : pytest.skip() in the body of the function.

Contributor Author

I think I prefer (verbose) parametrization over checking the OS in the test itself. The former gives better feedback in the test summary.
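For context, the parametrized style under discussion looks roughly like the sketch below. It is not the test from this PR: the `fork` condition and reason string are copied from the snippet above, while the `forkserver` condition, the helper `_report_pid`, and the test name `test_child_process_runs` are assumptions made for illustration.

```python
import multiprocessing as mp
import os
import sys

import pytest

START_METHODS = [
    pytest.param("spawn"),
    pytest.param(
        "fork",
        marks=pytest.mark.skipif(
            sys.platform in ("win32", "darwin"),
            reason="fork not supported on Windows or OSX",
        ),
    ),
    pytest.param(
        "forkserver",
        marks=pytest.mark.skipif(
            sys.platform == "win32",  # assumption: forkserver is POSIX-only
            reason="forkserver not supported on Windows",
        ),
    ),
]


def _report_pid(conn):
    # Runs in the child: send our PID back to the parent.
    conn.send(os.getpid())
    conn.close()


@pytest.mark.parametrize("method", START_METHODS)
def test_child_process_runs(method):
    ctx = mp.get_context(method)
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=_report_pid, args=(child_conn,))
    proc.start()
    # Fail instead of hanging forever if the child stalls.
    assert parent_conn.poll(30), "child never reported back"
    child_pid = parent_conn.recv()
    proc.join()
    assert proc.exitcode == 0
    assert child_pid != os.getpid()
```

With this layout each start method shows up as its own entry in the pytest summary, including the skip reason, which is the feedback benefit mentioned above. The in-body alternative would call `pytest.skip()` behind an `if` on `sys.platform` inside the test function.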

@d-v-b
Contributor Author

d-v-b commented Feb 11, 2025

maybe someone can explain what codecov is going on about? I find this hard to interpret:

image

I'm going to merge this later today unless there are objections.

@martindurant
Member

The "indirect changes" tab is picking up many lines in testing/stateful.py. I don't know why they show up here - maybe this file wasn't being tracked at all before?

@martindurant
Member

OK, it looks like not all of the coverage reports were in yet :)
Good now.

@d-v-b d-v-b merged commit 2f8b88a into zarr-developers:main Feb 11, 2025
30 checks passed
@d-v-b d-v-b deleted the chore/multiprocessing-tests branch February 11, 2025 16:17
@d-v-b
Contributor Author

d-v-b commented Feb 11, 2025

thanks for your help martin!
