
Partial rechunks within P2P #8330

Merged · 40 commits merged into dask:main from local-rechunks on Dec 20, 2023

Conversation

@hendrikmakait (Member) commented Nov 7, 2023

Closes #8326
Blocked by #8207

This PR currently splits off entirely independent subsets of the rechunking. A more intricate (yet better) implementation would create independent shuffles even for cases where an input chunk feeds into more than one shuffle.

This PR splits outputs wherever that allows us to cull more inputs, i.e., two outputs along an axis are placed in separate partials if they do not end in the same input chunk.
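A minimal sketch of the intended culling behaviour, assuming a distributed cluster is running and that the dask version in use supports the `method="p2p"` keyword for rechunking (shapes and chunk sizes are made up for illustration):

```python
import dask.array as da

# 250-row input chunks rechunked into 500-row output chunks: each output
# chunk only depends on two adjacent input chunks, so the rechunk can be
# decomposed into independent partial shuffles.
x = da.random.random((1000, 1000), chunks=(250, 1000))
y = x.rechunk((500, 1000), method="p2p")

# Selecting only the first output chunk should now cull the input chunks
# and partial shuffles that feed the remaining output chunks.
subset = y[:500]
```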

  • Tests added / passed
  • Passes pre-commit run --all-files

github-actions bot (Contributor) commented Nov 7, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

27 files ±0   27 suites ±0   11h 57m 44s ⏱️ +21m 4s
3 950 tests +2   3 838 ✔️ +4   109 💤 −1   3 ❌ −1
49 683 runs +42   47 392 ✔️ +64   2 288 💤 −19   3 ❌ −3

For more details on these failures, see this check.

Results for commit 1f554ef. ± Comparison against base commit 9273186.

This pull request removes 5 and adds 7 tests. Note that renamed tests count towards both.

Removed:
distributed.shuffle.tests.test_rechunk ‑ test_rechunk_with_single_output_chunk_raises
distributed.shuffle.tests.test_rechunk ‑ test_worker_for_homogeneous_distribution[1]
distributed.shuffle.tests.test_rechunk ‑ test_worker_for_homogeneous_distribution[2]
distributed.shuffle.tests.test_rechunk ‑ test_worker_for_homogeneous_distribution[41]
distributed.shuffle.tests.test_rechunk ‑ test_worker_for_homogeneous_distribution[50]

Added:
distributed.shuffle.tests.test_rechunk ‑ test_cull_p2p_rechunk_independent_partitions
distributed.shuffle.tests.test_rechunk ‑ test_cull_p2p_rechunk_overlapping_partitions
distributed.shuffle.tests.test_rechunk ‑ test_partial_rechunk_homogeneous_distribution
distributed.shuffle.tests.test_rechunk ‑ test_pick_worker_homogeneous_distribution[1]
distributed.shuffle.tests.test_rechunk ‑ test_pick_worker_homogeneous_distribution[2]
distributed.shuffle.tests.test_rechunk ‑ test_pick_worker_homogeneous_distribution[41]
distributed.shuffle.tests.test_rechunk ‑ test_pick_worker_homogeneous_distribution[50]

♻️ This comment has been updated with latest results.

@hendrikmakait hendrikmakait marked this pull request as ready for review November 24, 2023 12:21
@hendrikmakait hendrikmakait marked this pull request as draft November 24, 2023 12:50
@hendrikmakait hendrikmakait marked this pull request as ready for review November 28, 2023 09:55
@hendrikmakait hendrikmakait changed the title [WIP] Partial rechunks within P2P Partial rechunks within P2P Nov 28, 2023
@hendrikmakait (Member Author)

There's some work we still compute for the individual n-dimensional partials (scaling as a product across axes) that could instead be computed per axis (scaling as a sum). I'll take another look at some quick refactoring here, but this might very well be premature optimization.

crusaderky added a commit to crusaderky/distributed that referenced this pull request Dec 18, 2023
@hendrikmakait (Member Author)

@crusaderky: Looking at my old benchmark results, you are in fact reproducing them. However, test_tiles_to_rows has significantly improved in performance since then:

[Screenshots: benchmark comparison, 2023-12-18]

@hendrikmakait (Member Author)

Searching through the commit history, this is likely because #8207 has been merged, which this branch also included at the time.

@crusaderky (Collaborator) commented Dec 18, 2023

The side-by-side dashboard makes it pretty obvious why there's such a dramatic slowdown.

test_adjacent_groups[0.3-128 MiB-p2p-disk], 180 input/output chunks.
This PR picks only even-numbered workers for the output chunks! (worker 0 is always at the bottom)

[Video: peek.mp4 — side-by-side dashboard recording]

@hendrikmakait (Member Author)

Quick summary of an offline discussion I had with @crusaderky: In the example above, each partial rechunk has five output tasks (180 inputs / 36 barriers). As a result, the deterministic worker assignment logic will always pick the same five workers to store those outputs, which underutilizes the cluster. We should be able to assess the impact of better worker assignment by randomizing it.
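A hypothetical illustration of the effect described above (this is not the actual assignment code in distributed; the worker count and helper are made up): with static range partitioning of five outputs over, say, ten workers, every partial lands on the same five workers.

```python
def pick_worker(output_index: int, n_outputs: int, workers: list[str]) -> str:
    # Static range partitioning: map output i onto the sorted worker list.
    return workers[len(workers) * output_index // n_outputs]

workers = [f"worker-{i}" for i in range(10)]
# Every partial rechunk with 5 outputs picks the same 5 (every other) workers,
# leaving half of the cluster idle when storing outputs.
print([pick_worker(i, 5, workers) for i in range(5)])
# ['worker-0', 'worker-2', 'worker-4', 'worker-6', 'worker-8']
```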

@crusaderky (Collaborator)

Adding a naive randomization of the workers (d9116fb) fixes most problems:

[Screenshots: benchmark results after worker randomization]

@hendrikmakait (Member Author)

Regarding the regression of test_swap_axes: I've seen that happen before when running benchmarks for this PR. In the past, it disappeared when I ran that test in isolation. I'm not sure what's going on exactly.

@crusaderky (Collaborator)

Regarding the regression of test_swap_axes: I've seen that happen before when running benchmarks for this PR. In the past, it disappeared when I ran that test in isolation. I'm not sure what's going on exactly.

Reran it in isolation. You're right; these results don't make sense to me.

[Screenshot: test_swap_axes benchmark rerun in isolation]

@crusaderky (Collaborator)

Waiting for an alternative suggestion to the worker randomization from @fjetter, as discussed offline.
This PR is otherwise green to go for me.

@hendrikmakait hendrikmakait marked this pull request as ready for review December 19, 2023 16:05
@hendrikmakait (Member Author)

I think what @fjetter and I both have in mind would look like this: hendrikmakait#14

FWIW, I wouldn't block on that; a follow-up with another benchmark run is trivial.

@crusaderky (Collaborator)

Running a final A/B test. Results out tomorrow morning.

@crusaderky (Collaborator)

Looks good. Comment is the same as in the previous posts.

[Screenshots: final A/B test results]

@fjetter (Member) commented Dec 20, 2023

I think what @fjetter and I both have in mind would look like this: hendrikmakait#14

Essentially, yes. I wouldn't have shifted by one but by len(workers), but the gist is the same. I think shifting by len(workers) generates more homogeneous distributions for small numbers of shuffles.

@hendrikmakait (Member Author) commented Dec 20, 2023

Essentially, yes. I wouldn't have shifted by one but by len(workers), but the gist is the same. I think shifting by len(workers) generates more homogeneous distributions for small numbers of shuffles.

I'm confused; wouldn't shifting by len(workers) have no effect whatsoever in a cluster without fluctuating workers?

The problem the shift solves is that static range partitioning skips some workers if len(partitions) < len(workers) (e.g., it only picks every other worker if len(partitions) / len(workers) = 0.5). By shifting by one, the assignment should be as homogeneous as possible for any range-partitioned pattern as long as len(partitions) remains constant. For non-constant len(partitions), I can't come up with anything better unless we start counting assignments per worker.
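A hedged sketch of the shift-by-one idea, reusing the hypothetical range-partitioning helper from above; the explicit partial_index argument is an assumption for illustration, not how the actual code identifies partials:

```python
def pick_worker_shifted(
    output_index: int, n_outputs: int, partial_index: int, workers: list[str]
) -> str:
    # Same static range partitioning as before, but rotate the worker list by
    # one position per partial rechunk so that successive small partials spread
    # over the whole cluster instead of reusing the same workers.
    i = len(workers) * output_index // n_outputs
    return workers[(i + partial_index) % len(workers)]

workers = [f"worker-{i}" for i in range(10)]
for partial in range(3):
    print(partial, [pick_worker_shifted(o, 5, partial, workers) for o in range(5)])
# partial 0 -> workers 0, 2, 4, 6, 8
# partial 1 -> workers 1, 3, 5, 7, 9
# partial 2 -> workers 2, 4, 6, 8, 0
```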

@hendrikmakait (Member Author) commented Dec 20, 2023

Note that we could potentially investigate switching the assignment to round-robin and then shifting by len(partitions) % len(workers). IIRC, the lack of buffering in our disk buffers currently prohibits that, though.
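For comparison, a hypothetical sketch of the round-robin variant described here (again illustrative only, with made-up names and worker counts; as noted above, the current disk-buffering behaviour would prohibit it):

```python
def pick_worker_round_robin(output_index: int, shift: int, workers: list[str]) -> str:
    # Assign outputs to consecutive workers, starting at a per-partial shift.
    return workers[(output_index + shift) % len(workers)]

workers = [f"worker-{i}" for i in range(10)]
n_partitions, shift = 5, 0
for partial in range(3):
    print(partial, [pick_worker_round_robin(o, shift, workers) for o in range(n_partitions)])
    # Advance the shift by len(partitions) % len(workers) for the next partial.
    shift = (shift + n_partitions) % len(workers)
# partial 0 -> workers 0..4, partial 1 -> workers 5..9, partial 2 -> 0..4 again
```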

@hendrikmakait hendrikmakait merged commit e6c7d66 into dask:main Dec 20, 2023
28 of 34 checks passed
@hendrikmakait hendrikmakait deleted the local-rechunks branch December 20, 2023 14:38
@crusaderky (Collaborator)

🥳

@hendrikmakait hendrikmakait mentioned this pull request Jan 12, 2024
Closes issue: P2P Rechunking graph prohibits culling and partial release