[#27839] Write PipelineOptions to a file instead of an environment variable. #27842
Codecov Report
@@ Coverage Diff @@
## master #27842 +/- ##
==========================================
- Coverage 72.32% 72.30% -0.03%
==========================================
Files 678 678
Lines 99726 99740 +14
==========================================
- Hits 72130 72116 -14
- Misses 26032 26063 +31
+ Partials 1564 1561 -3
... and 3 files with indirect coverage changes
Assigning reviewers. If you would like to opt out of this review, comment R: @robertwb for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments).
TBH I expected more jobs to fail, since this basically removes pipeline options from the harness until the SDK-side change is fixed.
Run Java_Examples_Dataflow_Java17 PreCommit
Run Flink ValidatesRunner
Run Java Dataflow V2 ValidatesRunner
Run Java Dataflow V2 ValidatesRunner
Run Spark ValidatesRunner Java 11
Run PortableJar_Flink PostCommit
Run PortableJar_Spark PostCommit
Run Samza ValidatesRunner
Java_PVR_Flink_Docker
beam_PostCommit_Java_PVR_Spark3_Streaming
beam_PostCommit_Java_PVR_Spark_Batch
Run Samza ValidatesRunner
Run Java Dataflow V2 ValidatesRunner
The change looks reasonable, but I'm curious why it's needed. Do environment variables have line length limits similar to command lines?
Per the issue (#27839), as far as the Linux kernel is concerned, environment variables are command-line arguments: they consume the same "resource". See this Stack Overflow about it: As a result, the only way to avoid Argument Too Long errors from the OS when starting a worker is to not serialize the whole pipeline options into the environment either. This has affected Dataflow customers on RunnerV2, which uses the portable SDK containers, when the ... The Legacy Java Dataflow Worker (runnerv1) has done this since 2018, and the fix was never backported to the containers, since we mistakenly believed that environment variables would get around the problem. They don't.
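To make the limit concrete, here is a minimal Python sketch (not code from this PR) that estimates whether a value could be passed as a single environment entry. It assumes Linux's per-string cap of 32 pages and a 4 KiB page size, and it uses `PIPELINE_OPTIONS` purely as an illustrative variable name; the helper names are hypothetical.

```python
import os

# Assumption: 4 KiB pages. Linux caps each argv/envp string at 32 pages
# (MAX_ARG_STRLEN), so 128 KiB here. Other systems use different limits.
ASSUMED_PAGE_SIZE = 4096
MAX_ARG_STRLEN = 32 * ASSUMED_PAGE_SIZE


def env_entry_size(name, value):
    """Bytes one entry occupies in envp: 'NAME=value' plus a trailing NUL."""
    return len(os.fsencode(name)) + 1 + len(os.fsencode(value)) + 1


def fits_in_environment(name, value, limit=MAX_ARG_STRLEN):
    """Rough check that a single entry stays within the per-string cap."""
    return env_entry_size(name, value) <= limit


def environment_size(env):
    """Total bytes of all entries; this counts against the same budget as argv."""
    return sum(env_entry_size(k, v) for k, v in env.items())


def arg_max():
    """Combined argv + envp budget reported by the OS, or None if unavailable."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return None
```

A serialized options blob of a few hundred kilobytes fails `fits_in_environment` no matter how short the actual command line is, which matches the behaviour described above.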
I haven't yet been able to find a sufficiently indicative test that would validate whether Pipeline Options are being successfully passed around and available. Basically no tests were failing due to this change even before Byron's PR #27841 was put in, so I'm not confident yet that this is working as intended.
Interesting. Thanks for digging into this.
I've asked @damondouglas to help me validate, since I'm not confident I'll be able to get a functional Java/Maven pipeline authored without spending all day at it.
Specifically, the smoke test would need to
If we can manage that in a timely fashion, I'd be up to cherry-pick this to get it into the release.
cae489f to c4c5bda
I have validated the change, with a little guidance from @damondouglas, by testing live changes on the Java SDK. I did have a bug in my write error handling that caused the container to hard fail 100% of the time. It's concerning that I couldn't find any Gradle tests that quickly smoke check the Java Portable Containers. I'm a little surprised that there weren't any against the Python Portable runner that rely on containers. (Or if there are, it's not obvious, and I couldn't find them among our Actions or Jenkins tasks, or Gradle commands.) I'll eventually be getting such a suite up for Prism, however, when I get to seeing other SDKs run on it. Hopefully I can get back to that soon.
Run Java Dataflow V2 ValidatesRunner
Co-authored-by: lostluck <[email protected]>
Writes the JSON pipeline options to a local in-container file and sets a new environment variable, PIPELINE_OPTIONS_FILE.
Sibling to #27841 and handles the Java variant of #27839.
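The hand-off described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR or in #27841. It assumes that `PIPELINE_OPTIONS_FILE` holds the path of the JSON file, that a reader may fall back to an inline `PIPELINE_OPTIONS` variable, and that the file is named `pipeline_options.json`; the function names are hypothetical.

```python
import json
import os


def write_options_file(options, directory):
    """Serialize the options to a JSON file in `directory` and return its path."""
    # Hypothetical file name, chosen only for this sketch.
    path = os.path.join(directory, "pipeline_options.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(options, f)
    return path


def child_environment(base_env, options_path):
    """Copy the environment and add only the short file path, not the JSON blob."""
    env = dict(base_env)
    env["PIPELINE_OPTIONS_FILE"] = options_path
    return env


def load_options(env):
    """Reader side: prefer the file, then fall back to an inline variable."""
    path = env.get("PIPELINE_OPTIONS_FILE")
    if path:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    # Assumed fallback for launchers that still pass the options inline.
    return json.loads(env.get("PIPELINE_OPTIONS", "{}"))
```

The environment then carries only a path, so its size no longer grows with the size of the options.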