[BEAM-14484] Improve behavior surrounding primary roots in self-checkpointing #17716

Merged: 16 commits, May 19, 2022

jrmccluskey
Contributor

Improves the error message thrown when primary roots are returned, so that it directs users to return nil for primary splits in the self-checkpointing case. Also adds a check for bounded, size-0 restrictions as an alternative acceptable primary return, since they should represent no work being done. Both behaviors are now also described in the doc string for the RTracker interface.
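
For readers unfamiliar with the rule, here is a minimal sketch of what such a check could look like. It is not the code from this PR: the `root` type, `validatePrimaries`, and the error text are hypothetical names chosen for illustration, and the size is modelled as a `float64`.

```go
package main

import (
	"errors"
	"fmt"
)

// root is a hypothetical stand-in for a primary root returned by a split:
// a restriction, whether it is bounded, and its reported size.
type root struct {
	Restriction interface{}
	Bounded     bool
	Size        float64
}

// errNonEmptyPrimary is an illustrative error; the wording is an assumption.
var errNonEmptyPrimary = errors.New(
	"self-checkpointing splits must return a nil primary or a bounded restriction of size 0")

// validatePrimaries accepts a nil or empty primary list, or primaries that are
// bounded and have size 0 (no work). Anything else is reported as an error.
func validatePrimaries(primaries []root) error {
	for i, p := range primaries {
		if !p.Bounded || p.Size != 0 {
			return fmt.Errorf("primary root %d (bounded=%t, size=%v): %w",
				i, p.Bounded, p.Size, errNonEmptyPrimary)
		}
	}
	return nil
}

func main() {
	fmt.Println(validatePrimaries(nil))
	fmt.Println(validatePrimaries([]root{{Restriction: "[5,5)", Bounded: true, Size: 0}}))
	fmt.Println(validatePrimaries([]root{{Restriction: "[0,5)", Bounded: true, Size: 5}}))
}
```

The first two calls are accepted; the third is rejected because the primary still carries work.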



@asf-ci

asf-ci commented May 19, 2022

Can one of the admins verify this patch?

@github-actions github-actions bot added the go label May 19, 2022
@codecov

codecov bot commented May 19, 2022

Codecov Report

Merging #17716 (9ecb375) into master (212d63d) will decrease coverage by 0.00%.
The diff coverage is 35.89%.

@@            Coverage Diff             @@
##           master   #17716      +/-   ##
==========================================
- Coverage   73.99%   73.99%   -0.01%     
==========================================
  Files         695      696       +1     
  Lines       91798    91851      +53     
==========================================
+ Hits        67926    67962      +36     
- Misses      22624    22640      +16     
- Partials     1248     1249       +1     
| Flag | Coverage Δ |
|---|---|
| go | 50.45% <35.89%> (+0.03%) ⬆️ |

| Impacted Files | Coverage Δ |
|---|---|
| sdks/go/pkg/beam/core/runtime/graphx/translate.go | 43.01% <ø> (ø) |
| sdks/go/pkg/beam/core/sdf/wrappedbounded.go | 0.00% <0.00%> (ø) |
| sdks/go/pkg/beam/core/runtime/exec/datasource.go | 64.01% <40.00%> (-1.55%) ⬇️ |
| sdks/go/pkg/beam/runners/dataflow/dataflow.go | 53.64% <0.00%> (+0.62%) ⬆️ |
| ...ks/go/pkg/beam/runners/dataflow/dataflowlib/job.go | 22.84% <0.00%> (+6.57%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jrmccluskey
Contributor Author

Run Go Flink ValidatesRunner

Contributor

@damccorm damccorm left a comment

This looks great! Just had a few comments, but they're mostly cosmetic or helping me understand how things work (mostly that tbh)

@github-actions
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @lostluck for label go.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@jrmccluskey
Contributor Author

The Flink breakage is, surprisingly, tied to PR #17681: the TestStream tests pass unwindowed, unbounded inputs into our passert functions, which turn them into side inputs. I'll look into fixing that next.

Contributor

@lostluck lostluck left a comment

A few comments, but otherwise LGTM. I'll do another pass after lunch.

@github-actions github-actions bot added io and removed io labels May 19, 2022
@jrmccluskey
Contributor Author

Run Go Flink ValidatesRunner

@jrmccluskey
Contributor Author

The self-checkpointing test passes, so we're good on that front.

size, ok := root.Elm2.(float64)
if !ok {
	log.Warnf(context.Background(), "expected size to be type float64, got type %T", root.Elm2)
Contributor

Suggested change
- log.Warnf(context.Background(), "expected size to be type float64, got type %T", root.Elm2)
+ log.Warnf(context.Background(), "expected restriction size to be type float64, got type %T", root.Elm2)
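
The pattern under review is Go's comma-ok type assertion with a logged warning on mismatch. A minimal standalone sketch follows, assuming a hypothetical helper `restrictionSize` and using the standard library `log` package in place of Beam's logger:

```go
package main

import (
	"fmt"
	"log"
)

// restrictionSize is a hypothetical helper (not part of Beam). It extracts a
// float64 size from an untyped element and warns instead of panicking when the
// dynamic type is something else.
func restrictionSize(elm interface{}) (float64, bool) {
	size, ok := elm.(float64) // comma-ok form: never panics on mismatch
	if !ok {
		log.Printf("expected restriction size to be type float64, got type %T", elm)
		return 0, false
	}
	return size, true
}

func main() {
	fmt.Println(restrictionSize(2.5))
	fmt.Println(restrictionSize("2.5"))
}
```

Naming the quantity in the message ("restriction size") makes the warning easier to trace back when it shows up in worker logs.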

@@ -385,11 +396,16 @@ func (n *DataSource) Checkpoint() (SplitResult, time.Duration, bool, error) {
	if err != nil {
		return SplitResult{}, -1 * time.Minute, false, err
	}
	if len(rs) == 0 {
		return SplitResult{}, -1 * time.Minute, false, nil
Contributor

I'm trying to wrap my head around when we would ever expect this case - I have 2 related questions:

  1. If the user has checkpointed but then returns an empty residual, they shouldn't have checkpointed, right? I'd expect us to at least warn in that case probably.
  2. Even if there are no residuals, don't we still want to validate that they haven't set any primaries? That's still an error waiting to happen

Contributor Author

A no-residual return indicates a no-op split, which can happen. In a checkpointing context we wouldn't necessarily expect it, but the check adds some protection if a user schedules a bundle to resume when it has no work left.
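
To make the no-op case concrete, here is a small sketch using a simplified offset-range tracker. It is an illustration only: `offsetRange`, `tracker`, and `checkpoint` are hypothetical and much simpler than the SDK's RTracker interface.

```go
package main

import "fmt"

// offsetRange is a hypothetical half-open restriction [Start, End).
type offsetRange struct{ Start, End int64 }

// tracker is a simplified stand-in for a restriction tracker.
type tracker struct {
	rest offsetRange
	next int64 // first position that has not been claimed yet
}

func newTracker(r offsetRange) *tracker { return &tracker{rest: r, next: r.Start} }

// TryClaim claims pos if it lies in the unclaimed part of the restriction.
func (t *tracker) TryClaim(pos int64) bool {
	if pos < t.next || pos >= t.rest.End {
		return false
	}
	t.next = pos + 1
	return true
}

// checkpoint splits at the current position: claimed work stays with this
// tracker and the remainder is returned as the residual. When nothing is
// left, it returns nil, which is the no-op split discussed above.
func (t *tracker) checkpoint() *offsetRange {
	if t.next >= t.rest.End {
		return nil
	}
	residual := &offsetRange{Start: t.next, End: t.rest.End}
	t.rest.End = t.next
	return residual
}

func main() {
	t := newTracker(offsetRange{0, 3})
	t.TryClaim(0)
	fmt.Println(t.checkpoint()) // residual covering the unclaimed work
	fmt.Println(t.checkpoint()) // nothing left, so no residual
}
```

A caller that sees a nil residual can skip scheduling a resume, which mirrors the early return on `len(rs) == 0` in the diff.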

@github-actions github-actions bot added the io label May 19, 2022
Contributor

@lostluck lostluck left a comment

Some more nits and cleanups.

sdks/go/pkg/beam/core/runtime/graphx/translate.go (outdated, resolved)
sdks/go/pkg/beam/core/runtime/exec/datasource.go (outdated, resolved)
@github-actions github-actions bot removed the io label May 19, 2022
Contributor

@lostluck lostluck left a comment

LGTM. Thanks Jack!

@lostluck lostluck merged commit f1980dc into apache:master May 19, 2022
@jrmccluskey jrmccluskey deleted the errDoc branch May 24, 2022 15:55
4 participants