Partial rechunks within P2P #8330
Conversation
Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

27 files ±0  27 suites ±0  11h 57m 44s ⏱️ +21m 4s

For more details on these failures, see this check. Results for commit 1f554ef. ± Comparison against base commit 9273186.

This pull request removes 5 and adds 7 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results.
There's some stuff we still calculate for the individual n-dimensional partials (scaling as a product) that could instead be computed for the partials per axis (scaling as a sum). I'll take another look at some quick refactoring here, but this might very well be premature optimization.
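To illustrate the scaling difference described above, here is a minimal sketch. The per-axis partial counts are made up for illustration; they are not taken from the PR.

```python
from itertools import product

# Illustrative per-axis partial counts (hypothetical, not from the PR):
# suppose the rechunk decomposes into 4, 3, and 5 partials along three axes.
partials_per_axis = [range(4), range(3), range(5)]

# Computing a property once per n-dimensional partial scales as the product
# of the per-axis counts...
ndim_partials = list(product(*partials_per_axis))
print(len(ndim_partials))  # 4 * 3 * 5 = 60

# ...while computing it once per axis partial scales as the sum.
per_axis_work = sum(len(p) for p in partials_per_axis)
print(per_axis_work)  # 4 + 3 + 5 = 12
```

For rechunks with many axes or many partials per axis, the gap between the product and the sum grows quickly, which is what motivates the refactoring.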
@crusaderky: Looking at my old benchmark results, you are in fact reproducing them. However,
Searching through the commit history, this is likely because #8207 has been merged, which this branch also included at the time. |
The side-by-side dashboard makes it pretty obvious why there's such a dramatic slowdown.
peek.mp4
Quick summary of an offline discussion I had with @crusaderky: In the example above, each partial rechunk has five output tasks (180 inputs / 36 barriers). As a result, the deterministic worker assignment logic will always pick the same five workers to store those outputs, which underutilizes the cluster. We should be able to assess the impact of better worker assignment by randomizing it.
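A minimal sketch of the underutilization problem described above. The worker names, helper functions, and cluster size are hypothetical, not distributed's actual assignment API; the sketch only contrasts a deterministic pick with a naive randomized one.

```python
import random

# Hypothetical 20-worker cluster; names are illustrative.
workers = [f"worker-{i}" for i in range(20)]
N_OUTPUTS = 5    # output tasks per partial rechunk, as in the example above
N_PARTIALS = 36  # number of partial rechunks (barriers)

def deterministic_pick(workers, n):
    # Deterministic assignment: every partial resolves to the same n
    # workers, so only n machines ever store outputs.
    return workers[:n]

def randomized_pick(workers, n, rng):
    # Naive randomization: each partial samples n distinct workers,
    # spreading outputs across the cluster over many partials.
    return rng.sample(workers, n)

rng = random.Random(0)  # seeded for reproducibility
det_used, rand_used = set(), set()
for _ in range(N_PARTIALS):
    det_used.update(deterministic_pick(workers, N_OUTPUTS))
    rand_used.update(randomized_pick(workers, N_OUTPUTS, rng))

print(len(det_used))   # 5: the same five workers every time
print(len(rand_used))  # close to 20: most of the cluster participates
```

This is why randomizing the assignment is a cheap way to assess how much of the slowdown is attributable to worker placement rather than to the partial-rechunk logic itself.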
Adding a naive randomization of the workers (d9116fb) fixes most problems:
Regarding the regression of
Waiting for an alternative suggestion to the worker randomization from @fjetter, as discussed offline. |
I think what @fjetter and I both have in mind would look like this: hendrikmakait#14. FWIW, I'd not block on that; a follow-up with another benchmark run is trivial.
Running a final A/B test. Results out tomorrow morning. |
Essentially, yes. I wouldn't have shifted by one but by
> I'm confused, wouldn't shifting by

The problem the shift solves is that static range partitioning if
Note that we could potentially investigate switching assignment to round-robin and then shift by the
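A hedged sketch of the round-robin idea above. The original comment is truncated, so the exact shift amount is unknown; advancing the shift by the number of outputs just placed is an assumption chosen purely for illustration.

```python
# Hypothetical cluster; names and helper are illustrative, not distributed's API.
workers = [f"worker-{i}" for i in range(20)]

def round_robin_pick(workers, n, shift):
    # Pick n workers starting at `shift`, wrapping around the pool.
    return [workers[(shift + i) % len(workers)] for i in range(n)]

shift = 0
seen = set()
for _ in range(8):  # one pick per partial rechunk
    picked = round_robin_pick(workers, 5, shift)
    seen.update(picked)
    # Assumed shift rule: advance past the outputs just placed, so
    # consecutive partials land on fresh workers.
    shift += len(picked)

print(len(seen))  # 20: 8 partials x 5 outputs wrap the 20-worker pool evenly
```

Unlike pure randomization, this keeps assignment deterministic while still cycling every worker into the rotation.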
🥳
Closes #8326
Blocked by #8207
This PR currently splits on entirely independent subsets of the rechunking. A more intricate (yet better) implementation would create independent shuffles even for cases where an input would go into more than one shuffle.

This PR splits any outputs that allow us to cull more inputs, i.e., two outputs along an axis are separated if they do not end in the same input chunk.
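The splitting rule can be sketched as follows. This is a simplified, hypothetical reimplementation over 1-D chunk boundaries, not the actual code in this PR; the function name and data layout are invented for illustration.

```python
def partition_outputs(input_bounds, output_bounds):
    """Split output slices along one axis into independent partials.

    Consecutive outputs stay in the same partial only if they overlap a
    common input chunk; otherwise the earlier partial's inputs can be
    culled entirely when computing the later outputs.
    """
    partials = []
    current_outputs = []
    current_inputs = set()
    for out_start, out_stop in output_bounds:
        # Input chunks overlapped by this output slice (half-open intervals).
        overlapped = {
            i
            for i, (in_start, in_stop) in enumerate(input_bounds)
            if in_start < out_stop and out_start < in_stop
        }
        if current_outputs and overlapped.isdisjoint(current_inputs):
            # No shared input chunk: start a new independent partial.
            partials.append(current_outputs)
            current_outputs = []
            current_inputs = set()
        current_outputs.append((out_start, out_stop))
        current_inputs |= overlapped
    if current_outputs:
        partials.append(current_outputs)
    return partials

# Inputs of size 2 over [0, 8); outputs of size 4.
inputs = [(0, 2), (2, 4), (4, 6), (6, 8)]
outputs = [(0, 4), (4, 8)]
print(partition_outputs(inputs, outputs))  # [[(0, 4)], [(4, 8)]]
```

In this example the two outputs share no input chunk, so they form two independent partials and each can cull the other's inputs; if an input chunk straddled the boundary at 4, both outputs would fall into a single partial.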
`pre-commit run --all-files`