-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add edge segment size to filter out change points that are observed on the data edge #28780
Conversation
Codecov Report
@@ Coverage Diff @@
## master #28780 +/- ##
==========================================
- Coverage 72.23% 72.21% -0.03%
==========================================
Files 684 685 +1
Lines 101198 101518 +320
==========================================
+ Hits 73102 73307 +205
- Misses 26518 26633 +115
Partials 1578 1578
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 16 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Assigning reviewers. If you would like to opt out of this review, comment R: @damccorm for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
# https://github.com/apache/beam/issues/28757 | ||
# Remove this workaround once we have a good solution to deal | ||
# with the edge change points. | ||
if is_edge_change_point(change_point_index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this logic into find_latest_change_point_index since there are other considerations in that function to filter out noise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could also skip adding extra param to find_latest_change_point_index until we have a usecase to customize it with a non-default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
'awaiting additional data. Should the change point persist after ' | ||
'gathering more data, an alert will be raised.' % | ||
(change_point_index, constants._EDGE_SEGMENT_SIZE)) | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we be considering prior change_points_indices? That is, instead of returning change_points_indices[-1]
, we return the latest change point that is not in the edge segment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should return the latest change point itself. Even if we ignore it for example 3 days, it gets filed.
We also ignore change points that are occurred 14 days before. Most often change_points_indices[-2] lies outside of that window or doesn't exist. So we could just follow the current approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. yes, i think it should get file eventually.
Workaround for #28757.
When a change point is observed on the edge sometimes, we can discard it because it will still be alerted but after the
edge_segment_size
runs.Fixes: #28757
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.