-
-
Notifications
You must be signed in to change notification settings - Fork 719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise and avoid data loss of meta provided to P2P shuffle is wrong #8520
Conversation
ddf = dd.from_delayed( | ||
[data_gen()] * 2, meta=[("a", int), ("b", int)], verify_meta=False | ||
) | ||
|
||
with raises_with_cause( | ||
RuntimeError, | ||
r"shuffling \w* failed", | ||
ValueError, | ||
"meta", | ||
): | ||
await c.gather(c.compute(ddf.shuffle(on="a"))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On main this indeed causes data loss and is returning an empty frame. While the from_delayed
path above might be slightly artificial I don't think it's that difficult to produce a slight meta drift in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read_csv is an example where this could happen in real world use cases
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 27 files ± 0 27 suites ±0 9h 54m 19s ⏱️ - 3m 46s For more details on these failures, see this check. Results for commit 67eb545. ± Comparison against base commit a5a6e99. |
Closes #8519
The
KeyError
was used for control flow but I believe one should use dedicated exception types for this use case and keep the std exceptions clear for error reporting. This logic swallowed a very obvious error.I'll see if I can reproduce #8519 even with artificial input somehow. Even if I don't have a reproducer I believe this change is sensible.