
Allow P2P to store data in-memory #8279

Merged: 8 commits into dask:main on Oct 19, 2023
Conversation

hendrikmakait (Member) commented Oct 18, 2023

  • Tests added / passed
  • Passes pre-commit run --all-files

github-actions bot (Contributor) commented Oct 18, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

  • 27 files (+1), 27 suites (+1), runtime 15h 49m 14s (+1h 8m 8s)
  • 3,932 tests (+56): 3,811 passed (+58), 115 skipped (±0), 6 failed (−2)
  • 49,387 runs (+2,020): 47,017 passed (+1,956), 2,340 skipped (+62), 30 failed (+2)

For more details on these failures, see this check.

Results for commit 255e477, compared against base commit a8e5dab.

This pull request removes 51 and adds 107 tests. Note that renamed tests count towards both.
distributed.shuffle.tests.test_merge ‑ test_merge[all-inner]
distributed.shuffle.tests.test_merge ‑ test_merge[all-left]
distributed.shuffle.tests.test_merge ‑ test_merge[all-outer]
distributed.shuffle.tests.test_merge ‑ test_merge[all-right]
distributed.shuffle.tests.test_merge ‑ test_merge[none-inner]
distributed.shuffle.tests.test_merge ‑ test_merge[none-left]
distributed.shuffle.tests.test_merge ‑ test_merge[none-outer]
distributed.shuffle.tests.test_merge ‑ test_merge[none-right]
distributed.shuffle.tests.test_merge ‑ test_merge[some-inner]
distributed.shuffle.tests.test_merge ‑ test_merge[some-left]
…
distributed.shuffle.tests.test_memory_buffer ‑ test_basic
distributed.shuffle.tests.test_memory_buffer ‑ test_many[1000]
distributed.shuffle.tests.test_memory_buffer ‑ test_many[100]
distributed.shuffle.tests.test_memory_buffer ‑ test_many[2]
distributed.shuffle.tests.test_memory_buffer ‑ test_read_before_flush
distributed.shuffle.tests.test_merge ‑ test_merge[all-False-inner]
distributed.shuffle.tests.test_merge ‑ test_merge[all-False-left]
distributed.shuffle.tests.test_merge ‑ test_merge[all-False-outer]
distributed.shuffle.tests.test_merge ‑ test_merge[all-False-right]
distributed.shuffle.tests.test_merge ‑ test_merge[all-True-inner]
…

♻️ This comment has been updated with latest results.

Inline review comment on the diff:

from distributed.utils import log_errors


class MemoryShardsBuffer(ShardsBuffer):

hendrikmakait (Member, Author) commented:

This can be refactored to avoid most of the baggage it inherits from ShardsBuffer. I'd do this in a follow-up since I expect us to rework buffers anyhow and don't want to spend time on API compatibility with the DiskShardsBuffer until then.
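
For readers skimming the diff, here is a deliberately simplified, self-contained sketch of the idea behind an in-memory shards buffer: shards are appended to per-partition deques held in process memory and drained on read, with no disk I/O. The class and method names below are illustrative only and do not reflect the PR's actual ShardsBuffer API.

from collections import defaultdict, deque


class SimpleMemoryShardsBuffer:
    """Illustrative stand-in for an in-memory shards buffer (not the PR's code)."""

    def __init__(self) -> None:
        # One deque of raw shards per output partition, kept in process memory.
        self._shards: defaultdict[str, deque[bytes]] = defaultdict(deque)

    def write(self, data: dict[str, list[bytes]]) -> None:
        # Append each incoming shard to the deque for its output partition.
        for partition_id, shards in data.items():
            self._shards[partition_id].extend(shards)

    def read(self, partition_id: str) -> list[bytes]:
        # Drain everything buffered for this partition; nothing touches disk.
        out: list[bytes] = []
        while self._shards[partition_id]:
            out.append(self._shards[partition_id].popleft())
        return out


buffer = SimpleMemoryShardsBuffer()
buffer.write({"p0": [b"shard-1", b"shard-2"]})
assert buffer.read("p0") == [b"shard-1", b"shard-2"]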

hendrikmakait requested review from crusaderky and removed the request for crusaderky on October 18, 2023 10:49
hendrikmakait marked this pull request as ready for review on October 18, 2023 13:30
hendrikmakait (Member, Author) commented:

This is ready for review; I'm running some A/B tests to see the impact.

hendrikmakait (Member, Author) commented:

A/B test results on modified tests:

[Screenshot: A/B test runtime comparison, 2023-10-19]
  • Some are up to 40% faster.
  • Some are unaffected.
  • Some failed to run because the workers ran OOM.

hendrikmakait merged commit cbc3a33 into dask:main on Oct 19, 2023
mrocklin (Member) commented:

> Some failed to run because the workers ran OOM

This seems bad to me.

I've been playing with TPC-H recently and have been really appreciating how Dask is able to keep running even when other systems break down because we didn't have as much memory as we thought we did.

I'd encourage you to generate scale-100 on your personal computer and then run local TPC-H and see what happens with this PR. My guess is that it isn't immediately obvious how to do this; I should probably write up instructions.

hendrikmakait (Member, Author) commented:

> This seems bad to me

This is the first iteration of diskless shuffling. It is explicitly "opt-in" by requiring you to set a config option. Falling back to disk in case a worker unexpectedly runs out of memory has always been considered out of scope for this iteration.
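
For illustration, here is a minimal sketch of what the opt-in could look like from user code. The config key used below, distributed.p2p.disk, is an assumption for this sketch and may not match the exact option name introduced by this PR; check the distributed configuration reference for the real key.

import dask
from dask.distributed import Client

# Assumed config key for this sketch -- the real option name may differ.
# Setting it to False is meant to keep P2P shuffle shards in worker memory
# instead of writing them to disk.
dask.config.set({"distributed.p2p.disk": False})

if __name__ == "__main__":
    client = Client()  # workers on a local cluster inherit the config
    df = dask.datasets.timeseries()  # small demo dataset
    # An operation that triggers a shuffle (here, set_index) would now buffer
    # its shards in memory, provided the P2P shuffle method is in use.
    result = df.set_index("name").persist()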

mrocklin (Member) commented Oct 19, 2023 via email

hendrikmakait deleted the diskless-p2p branch on October 19, 2023 14:59