[#22737] Re-write Go SDK data plane to support timers. #25982
Codecov Report
@@ Coverage Diff @@
## master #25982 +/- ##
==========================================
- Coverage 71.41% 71.33% -0.09%
==========================================
Files 779 779
Lines 102530 102662 +132
==========================================
+ Hits 73220 73230 +10
- Misses 27849 27960 +111
- Partials 1461 1472 +11
... and 2 files with indirect coverage changes
This looks great. Just a few questions for my understanding and a minor nit.
Run GoPortable PreCommit
Run Go PostCommit
Run Go Portable PreCommit
Run Go PostCommit
Run Go PostCommit
Run GoPortable PreCommit
Thanks for the review! Jenkins is currently down, so I'll wait until tomorrow at the latest before merging, to get those runs in.
Run GoPortable PreCommit
Run Go PostCommit
Support receiving and directing timers to the *exec.DataSource from the DataManager.
This doesn't fully implement timers for #22737; it comprises only the required data plane changes. It does unblock their development, though it includes an untested TimerWriter path and unused callback invocations in the DataSource to support this later.
The FnAPI Data Stream multiplexes all data for all instructions onto a single channel from the Runner harness. Each message may contain Data or Timers bytes for one or more instructions executing on the SDK harness, to be decoded by the execution plane's DataSource. The Data or Timers are directed to a specific PTransform.
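As a rough illustration of the multiplexing described above, here is a minimal sketch of splitting one incoming message into per-destination byte streams. The `streamKey` and `chunk` types and the `demux` function are hypothetical names made up for this example; they are not the FnAPI proto types or the SDK's actual DataManager code.

```go
package main

import "fmt"

// streamKey is a hypothetical routing key: the instruction and the
// PTransform that a chunk of bytes is destined for.
type streamKey struct {
	InstructionID string
	TransformID   string
}

// chunk is a simplified, hypothetical stand-in for one entry of a
// multiplexed message. It is not the real FnAPI message type.
type chunk struct {
	Key     streamKey
	IsTimer bool   // true for Timers bytes, false for Data bytes
	Payload []byte // still encoded; decoding is left to the consumer
}

// demux groups the chunks of a single multiplexed message by destination,
// preserving arrival order within each destination.
func demux(msg []chunk) map[streamKey][]chunk {
	out := make(map[streamKey][]chunk)
	for _, c := range msg {
		out[c.Key] = append(out[c.Key], c)
	}
	return out
}

func main() {
	msg := []chunk{
		{Key: streamKey{"inst1", "source"}, Payload: []byte("a")},
		{Key: streamKey{"inst2", "source"}, Payload: []byte("b")},
		{Key: streamKey{"inst1", "source"}, IsTimer: true, Payload: []byte("t")},
	}
	for key, chunks := range demux(msg) {
		fmt.Printf("%s/%s: %d chunk(s)\n", key.InstructionID, key.TransformID, len(chunks))
	}
}
```

Keying on both the instruction and the PTransform keeps Data and Timers for the same destination in arrival order while letting unrelated instructions share the single channel.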
The data and timers may arrive before or after the associated instruction begins execution. This means that the Data and Timers must be buffered in the SDK until they are consumed or their associated instruction is marked as ended. Since a single instruction may have more than one consuming PTransform, it's important to track the wanted number of "islast" signals, compare it against the number received, and only signal that all byte streams for this instruction are complete when the two match.
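The buffering and "islast" accounting described above can be sketched roughly as follows. The `instructionBuffer` type and its methods are hypothetical and exist only for this example; the real data plane also deals with concurrency and cleanup of ended instructions, which are left out here.

```go
package main

import "fmt"

// instructionBuffer is a hypothetical per-instruction buffer. It holds
// bytes that arrive before they are consumed and counts is-last signals.
type instructionBuffer struct {
	pending  [][]byte // received but not yet consumed
	started  bool     // whether the instruction has begun execution
	wantLast int      // is-last signals expected, known once started
	gotLast  int      // is-last signals received so far
}

// receive buffers a payload and records an is-last signal. It may be
// called before or after start.
func (b *instructionBuffer) receive(payload []byte, isLast bool) {
	if len(payload) > 0 {
		b.pending = append(b.pending, payload)
	}
	if isLast {
		b.gotLast++
	}
}

// start marks the instruction as executing and records how many is-last
// signals to expect across its consuming PTransforms.
func (b *instructionBuffer) start(wantLast int) {
	b.started = true
	b.wantLast = wantLast
}

// drain hands over everything buffered so far and empties the buffer.
func (b *instructionBuffer) drain() [][]byte {
	out := b.pending
	b.pending = nil
	return out
}

// complete reports whether every expected byte stream has ended.
func (b *instructionBuffer) complete() bool {
	return b.started && b.gotLast == b.wantLast
}

func main() {
	b := &instructionBuffer{}
	b.receive([]byte("early"), false) // arrives before the instruction starts
	b.receive(nil, true)              // first is-last signal
	b.start(2)                        // assume two streams are expected
	fmt.Println(b.complete())         // false: 1 of 2 signals seen
	b.receive(nil, true)              // second is-last signal
	fmt.Println(b.complete(), len(b.drain())) // true 1
}
```

Counting received signals independently of `start` is what lets early arrivals be accepted safely: the comparison against the wanted count only becomes meaningful once the instruction has declared how many streams it consumes.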
Specific things to note:
Unit tests were updated to match the current possible cases and to validate the above-mentioned semantics. A "read" benchmark was added as well. Additional larger-scale testing was also done to ensure continued correctness of the data plane's behavior.
Also closes #21164, ensuring that the last element is always consumed.