-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-14484] Improve behavior surrounding primary roots in self-checkpointing #17716
Conversation
Can one of the admins verify this patch? |
3 similar comments
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
Codecov Report
@@ Coverage Diff @@
## master #17716 +/- ##
==========================================
- Coverage 73.99% 73.99% -0.01%
==========================================
Files 695 696 +1
Lines 91798 91851 +53
==========================================
+ Hits 67926 67962 +36
- Misses 22624 22640 +16
- Partials 1248 1249 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Run Go Flink ValidatesRunner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Just had a few comments, but they're mostly cosmetic or helping me understand how things work (mostly that tbh)
Assigning reviewers. If you would like to opt out of this review, comment R: @lostluck for label go. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
The Flink breakage is surprisingly tied to PR #17681 as the TestStream tests are passing an unwindowed, unbounded side input into our passert functions that make them side inputs. Will be looking into fixing that next |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but otherwise LGTM. I'll do another pass after lunch.
Run Go Flink ValidatesRunner |
The self-checkpointing test passes, we're good on that front |
size, ok := root.Elm2.(float64) | ||
if !ok { | ||
log.Warnf(context.Background(), "expected size to be type float64, got type %T", root.Elm2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Warnf(context.Background(), "expected size to be type float64, got type %T", root.Elm2) | |
log.Warnf(context.Background(), "expected restriction size to be type float64, got type %T", root.Elm2) |
@@ -385,11 +396,16 @@ func (n *DataSource) Checkpoint() (SplitResult, time.Duration, bool, error) { | |||
if err != nil { | |||
return SplitResult{}, -1 * time.Minute, false, err | |||
} | |||
if len(rs) == 0 { | |||
return SplitResult{}, -1 * time.Minute, false, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to wrap my head around when we would ever expect this case - I have 2 related questions:
- If the user has checkpointed but then returns an empty residual, they shouldn't have checkpointed, right? I'd expect us to at least warn in that case probably.
- Even if there are no residuals, don't we still want to validate that they haven't set any primaries? That's still an error waiting to happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A no-residual return is indicative of a no-op split, which can happen. In a checkpointing context we wouldn't necessarily expect it but it adds some protection if a user schedules a bundle to resume that didn't have any work left.
Co-authored-by: Danny McCormick <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more nits and cleanups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks Jack!
Improves the error message thrown if primary roots are returned to direct users to have nil returns for primary splits in the self-checkpointing case. Also adds a check for bounded, size 0 restrictions as an alternative acceptable primary return as they should represent no work being done. These behaviors are now also outlined in the doc-string for the RTracker interface.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.