-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add explicit schema support to JdbcIO read and xlang transform. #34128
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #34128 +/- ##
============================================
+ Coverage 59.25% 59.30% +0.04%
Complexity 3272 3272
============================================
Files 1164 1166 +2
Lines 178325 178633 +308
Branches 3413 3393 -20
============================================
+ Hits 105675 105936 +261
- Misses 69250 69302 +52
+ Partials 3400 3395 -5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
9d8ccdc
to
5afc467
Compare
This reverts commit daeacfc.
addresses #23029 |
Assigning reviewers. If you would like to opt out of this review, comment R: @damccorm for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Just to confirm, have you been able to verify it on Dataflow runner? e.g. submit a jdbc read pipeline locally where it does not have access to database; and Dataflow workers have access and the pipeline runs successfully
@@ -84,7 +84,7 @@ public Schema configurationSchema() { | |||
*/ | |||
@Override | |||
public JdbcSchemaIO from(String location, Row configuration, @Nullable Schema dataSchema) { | |||
return new JdbcSchemaIO(location, configuration); | |||
return new JdbcSchemaIO(location, configuration, dataSchema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, previously the dataSchema parameter wasn't passed down stream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah and in the JdbcIO#readRows and readWithPartitions there schema case wasnt handled.
I also need to add the ability to explicitly pass the partition lower and upper bounds through xlang, otherwise that part will also happen at pipeline construction time. Will do in a followup later this week.
I confirmed that infer schema doesn't happen during pipeline construction when the schema parameter is passed :D . I didn't try to run on DataflowRunner because I am not sure how I am supposed to use the modified Java SDK expansion service. For example, I usually pass something like sdk_harness_container_image_overrides=.java.,gcr.io/cloud-dataflow/v1beta3/beam_java11_sdk:2.62.0, does that mean I need to build the Java SDK container etc? If you look at the custom query/write statement tests, those schema inferences would file if we don't pass a custom schema. So it indirectly verifies the solution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
…m. (apache#34128)" This reverts commit a994291.
Postcommit xlang test run https://github.com/apache/beam/actions/runs/13635975481/job/38114752052?pr=34128
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.