[Bug]: BigQuery BatchLoad incompatible table schema error #25355
Comments
Has the same root cause as #22372; confirmed that the issue did not occur in Beam 2.39.0. While most of the use cases have been fixed, this bug remains as of 2.45.0. |
I think I have reproduced the error: https://ci-beam.apache.org/job/beam_PostCommit_Java_DataflowV2_PR/151/, run on branch 2504882. Example jobId: |
UpdateSchemaDestination, created by #17365, has neither comments nor a docstring. This task should also add the necessary comments to that class. |
.take-issue |
As @ahmedabu98 pointed out, the original working example has a typo. I initiated another job, 2023-02-08_11_47_55-389031392081500435, on branch f446e5c. The problem is that the condition at Line 263 in 69ddf44
is never true. The schema returned by the DynamicDestinations object is:
schema by
Though the two are effectively equivalent, and the generated temp table shows the same schema in the BigQuery UI, their Gson representations are not the same. |
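To illustrate how two equivalent schemas can fail an equality check on their serialized form, here is a minimal sketch in plain Java. It does not use Beam or Gson. The `Field` class, its `toJson` method, and the choice of an unset versus explicit `NULLABLE` mode as the point of difference are all assumptions made for illustration; the actual difference between the two schemas in this issue is not shown in the thread.

```java
public class SchemaCompareSketch {
    // Hypothetical minimal field description; not a Beam or BigQuery class.
    static final class Field {
        final String name;
        final String type;
        final String mode; // null when the mode is left unset

        Field(String name, String type, String mode) {
            this.name = name;
            this.type = type;
            this.mode = mode;
        }

        // Naive serialization: unset properties are omitted from the output.
        String toJson() {
            StringBuilder sb = new StringBuilder();
            sb.append("{\"name\":\"").append(name).append("\"");
            sb.append(",\"type\":\"").append(type).append("\"");
            if (mode != null) {
                sb.append(",\"mode\":\"").append(mode).append("\"");
            }
            return sb.append("}").toString();
        }

        // BigQuery treats a missing mode as NULLABLE.
        String effectiveMode() {
            return mode == null ? "NULLABLE" : mode;
        }
    }

    // Compares the serialized strings, so an omitted default counts as a difference.
    static boolean sameBySerializedForm(Field a, Field b) {
        return a.toJson().equals(b.toJson());
    }

    // Compares the fields after applying the default mode.
    static boolean sameByMeaning(Field a, Field b) {
        return a.name.equals(b.name)
            && a.type.equalsIgnoreCase(b.type)
            && a.effectiveMode().equals(b.effectiveMode());
    }

    public static void main(String[] args) {
        Field fromDestination = new Field("id", "STRING", null);
        Field fromTempTable = new Field("id", "STRING", "NULLABLE");
        System.out.println(sameBySerializedForm(fromDestination, fromTempTable)); // false
        System.out.println(sameByMeaning(fromDestination, fromTempTable));        // true
    }
}
```

If the real mismatch is of this kind, a condition that compares serialized schemas would never be true even when the tables match in the UI.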
I think I have found the cause of the original issue (the one in the issue description): Line 125 in 9fcb3a5
The processElement here does not consider the case of dynamic destinations. It simply takes the first destination in the incoming list of elements to set up zeroJob, so the outputs all have the same destination. |
The implementations of Java UpdateSchemaDestination and Python UpdateDestinationSchema are not the same. Python does not have this issue. In the Python implementation, both the zero load job and the copy job take the same PCollection as main input. We should either change the Java implementation to match Python, or make the input of UpdateSchemaDestination KV<DestinationT, Iterable<WriteTables.Result>> so that each processElement call deals with one destination. |
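The second option can be sketched in plain Java, without Beam. This is only an illustration of the grouping idea: the class and method names are hypothetical, destinations and temp tables are represented as strings, and a `Map<String, List<String>>` stands in for `KV<DestinationT, Iterable<WriteTables.Result>>`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerDestinationSketch {
    // Hypothetical stand-in for a write result: (destination, temp table).
    static Map.Entry<String, String> result(String destination, String tempTable) {
        return new SimpleEntry<String, String>(destination, tempTable);
    }

    // Problematic shape: only the first element's destination is looked at,
    // so every other destination in the batch is ignored.
    static List<String> destinationsFromFirstOnly(List<Map.Entry<String, String>> results) {
        List<String> destinations = new ArrayList<>();
        if (!results.isEmpty()) {
            destinations.add(results.get(0).getKey());
        }
        return destinations;
    }

    // Proposed shape: group results by destination so that each group
    // can be processed on its own, one destination at a time.
    static Map<String, List<String>> groupByDestination(List<Map.Entry<String, String>> results) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> r : results) {
            grouped.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> results = Arrays.asList(
            result("table_a", "temp_1"),
            result("table_b", "temp_2"),
            result("table_a", "temp_3"));

        System.out.println(destinationsFromFirstOnly(results));     // [table_a]
        System.out.println(groupByDestination(results).keySet());   // [table_a, table_b]
    }
}
```

In a real pipeline the grouping would presumably come from a keyed PCollection rather than an in-memory map; the sketch only shows why keying by destination keeps `table_b` from being skipped.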
Ah, I see, thanks for the clarification. So passing either a wrapped or an unwrapped dynamic destination to UpdateSchemaDestination is fine. |
We would still want to wrap with the match table DynamicDestinations, because that's what we do when creating temp tables. For a given temp table, we want to pull the same schema consistently for both operations. |
What happened?
This bug is triggered when all of these conditions are met:
Then the temp table and the final table may end up with incompatible schemas, regardless of whether the schema is explicitly set.
Error message:
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components