
Allow P2P to store data in-memory #8279

Merged: 8 commits into dask:main on Oct 19, 2023
Conversation

hendrikmakait (Member) commented Oct 18, 2023

  • Tests added / passed
  • Passes pre-commit run --all-files

github-actions bot (Contributor) commented Oct 18, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

  • 27 files (+1), 27 suites (+1), runtime 15h 49m 14s (+1h 8m 8s)
  • 3,932 tests (+56): 3,811 passed (+58), 115 skipped (±0), 6 failed (−2)
  • 49,387 runs (+2,020): 47,017 passed (+1,956), 2,340 skipped (+62), 30 failed (+2)

For more details on these failures, see this check.

Results for commit 255e477, compared against base commit a8e5dab.

This pull request removes 51 and adds 107 tests. Note that renamed tests count towards both.
distributed.shuffle.tests.test_merge ‑ test_merge[all-inner]
distributed.shuffle.tests.test_merge ‑ test_merge[all-left]
distributed.shuffle.tests.test_merge ‑ test_merge[all-outer]
distributed.shuffle.tests.test_merge ‑ test_merge[all-right]
distributed.shuffle.tests.test_merge ‑ test_merge[none-inner]
distributed.shuffle.tests.test_merge ‑ test_merge[none-left]
distributed.shuffle.tests.test_merge ‑ test_merge[none-outer]
distributed.shuffle.tests.test_merge ‑ test_merge[none-right]
distributed.shuffle.tests.test_merge ‑ test_merge[some-inner]
distributed.shuffle.tests.test_merge ‑ test_merge[some-left]
…
distributed.shuffle.tests.test_memory_buffer ‑ test_basic
distributed.shuffle.tests.test_memory_buffer ‑ test_many[1000]
distributed.shuffle.tests.test_memory_buffer ‑ test_many[100]
distributed.shuffle.tests.test_memory_buffer ‑ test_many[2]
distributed.shuffle.tests.test_memory_buffer ‑ test_read_before_flush
distributed.shuffle.tests.test_merge ‑ test_merge[all-False-inner]
distributed.shuffle.tests.test_merge ‑ test_merge[all-False-left]
distributed.shuffle.tests.test_merge ‑ test_merge[all-False-outer]
distributed.shuffle.tests.test_merge ‑ test_merge[all-False-right]
distributed.shuffle.tests.test_merge ‑ test_merge[all-True-inner]
…

♻️ This comment has been updated with latest results.

Inline review comment on the diff:

from distributed.utils import log_errors


class MemoryShardsBuffer(ShardsBuffer):

hendrikmakait (Member, Author) commented:

This can be refactored to avoid most of the baggage it inherits from ShardsBuffer. I'd do this in a follow-up since I expect us to rework buffers anyhow and don't want to spend time on API compatibility with the DiskShardsBuffer until then.
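
For readers skimming the diff, here is a deliberately simplified, self-contained sketch of the idea behind an in-memory shards buffer: shards are appended to per-partition deques held in process memory and drained on read, with no disk I/O. The class and method names below are illustrative only and do not reflect the PR's actual ShardsBuffer API.

from collections import defaultdict, deque


class SimpleMemoryShardsBuffer:
    """Illustrative stand-in for an in-memory shards buffer (not the PR's code)."""

    def __init__(self) -> None:
        # One deque of raw shards per output partition, kept in process memory.
        self._shards: defaultdict[str, deque[bytes]] = defaultdict(deque)

    def write(self, data: dict[str, list[bytes]]) -> None:
        # Append each incoming shard to the deque for its output partition.
        for partition_id, shards in data.items():
            self._shards[partition_id].extend(shards)

    def read(self, partition_id: str) -> list[bytes]:
        # Drain everything buffered for this partition; nothing touches disk.
        out: list[bytes] = []
        while self._shards[partition_id]:
            out.append(self._shards[partition_id].popleft())
        return out


buffer = SimpleMemoryShardsBuffer()
buffer.write({"p0": [b"shard-1", b"shard-2"]})
assert buffer.read("p0") == [b"shard-1", b"shard-2"]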

hendrikmakait requested review from crusaderky and removed the request for crusaderky on October 18, 2023 10:49
hendrikmakait marked this pull request as ready for review on October 18, 2023 13:30
hendrikmakait (Member, Author) commented:

This is ready for review; I'm running some A/B tests to see the impact.

hendrikmakait (Member, Author) commented:

A/B test results on modified tests:

[Screenshot: A/B test runtime comparison, 2023-10-19]
  • Some are up to 40% faster.
  • Some are unaffected.
  • Some failed to run because the workers ran OOM.

hendrikmakait merged commit cbc3a33 into dask:main on Oct 19, 2023
mrocklin (Member) commented:

> Some failed to run because the workers ran OOM

This seems bad to me.

I've been playing with TPC-H recently and have been really appreciating how Dask is able to keep running even when other systems break down because we didn't have as much memory as we thought we did.

I'd encourage you to generate scale-100 on your personal computer and then run local TPC-H and see what happens with this PR. My guess is that it isn't immediately obvious how to do this; I should probably write up instructions.

hendrikmakait (Member, Author) commented:

> This seems bad to me

This is the first iteration of diskless shuffling. It is explicitly "opt-in" by requiring you to set a config option. Falling back to disk in case a worker unexpectedly runs out of memory has always been considered out of scope for this iteration.
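
For illustration, here is a minimal sketch of what the opt-in could look like from user code. The config key used below, distributed.p2p.disk, is an assumption for this sketch and may not match the exact option name introduced by this PR; check the distributed configuration reference for the real key.

import dask
from dask.distributed import Client

# Assumed config key for this sketch -- the real option name may differ.
# Setting it to False is meant to keep P2P shuffle shards in worker memory
# instead of writing them to disk.
dask.config.set({"distributed.p2p.disk": False})

if __name__ == "__main__":
    client = Client()  # workers on a local cluster inherit the config
    df = dask.datasets.timeseries()  # small demo dataset
    # An operation that triggers a shuffle (here, set_index) would now buffer
    # its shards in memory, provided the P2P shuffle method is in use.
    result = df.set_index("name").persist()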

mrocklin (Member) commented Oct 19, 2023 via email

hendrikmakait deleted the diskless-p2p branch on October 19, 2023 14:59