[Feature Request]: Enable withFormatRecordOnFailureFunction() equivalent for BigQuery STORAGE_WRITE_API #31354
Comments
.take-issue |
There are some difficulties for making such
@sarinasij Could you please share an example how do you use |
Here is how we currently use it: to process failed records, we need the original AvroGenericRecordMessage for further dead-letter handling. The AvroGenericRecordMessage contains more information than the TableRow written to BigQuery. Specifically, it includes the eventType, which is not one of the table columns but is crucial for dead-letter metrics. To carry that information, we use withFormatRecordOnFailureFunction() to construct a dummy TableRow that can be decoded back into the original AvroGenericRecordMessage.
Given this, I think adding a TableRow-to-TableRow withFormatRecordOnFailureFunction() might not work, since we need additional information for the failed rows that is only available in the original AvroGenericRecordMessage.
What would you like to happen?
We have a Dataflow pipeline that reads data from Pub/Sub and writes to BigQuery.
Current status:
BigQuery write method: STREAMING_INSERTS
Function used to get BigQuery deadletter: getFailedInsertsWithErr()
Deadletter format function: withFormatRecordOnFailureFunction()
The pipeline writes multiple event types dynamically to different BigQuery destination tables. We currently use withFormatRecordOnFailureFunction() to transform failed inserts into the format we need for further dead-letter processing. Rather than returning the original TableRow, we provide a custom function that encodes the returned TableRow with an added eventType field, so that we can tell which table the row was being written to.
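The encoding described above can be sketched roughly as follows. This is a minimal, library-free illustration, not the pipeline's actual code: `EventMessage` is a hypothetical stand-in for AvroGenericRecordMessage, a plain `Map` stands in for TableRow, and the `eventType` key name is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FailureRowFormatSketch {

    // Hypothetical stand-in for the original message; the real
    // AvroGenericRecordMessage type is not shown in this issue.
    static final class EventMessage {
        final String eventType;
        final Map<String, Object> columns;

        EventMessage(String eventType, Map<String, Object> columns) {
            this.eventType = eventType;
            this.columns = columns;
        }
    }

    // Assumed key under which the event type is carried in the failed row.
    static final String EVENT_TYPE_KEY = "eventType";

    // Plays the role of the function passed to withFormatRecordOnFailureFunction():
    // copy the table columns and add the event type, which is not a table column.
    static Map<String, Object> formatRecordOnFailure(EventMessage message) {
        Map<String, Object> row = new LinkedHashMap<>(message.columns);
        row.put(EVENT_TYPE_KEY, message.eventType);
        return row;
    }

    // Dead-letter side: split the failed row back into event type and columns.
    static EventMessage decodeFailedRow(Map<String, Object> failedRow) {
        Map<String, Object> columns = new LinkedHashMap<>(failedRow);
        String eventType = (String) columns.remove(EVENT_TYPE_KEY);
        return new EventMessage(eventType, columns);
    }

    public static void main(String[] args) {
        Map<String, Object> columns = new LinkedHashMap<>();
        columns.put("user_id", "u-1");
        EventMessage original = new EventMessage("click", columns);

        Map<String, Object> failedRow = formatRecordOnFailure(original);
        EventMessage decoded = decodeFailedRow(failedRow);
        System.out.println(decoded.eventType + " " + decoded.columns);
    }
}
```

The point of the round trip is that the dead-letter handler receives the event type even though the destination table has no such column.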
As we move the pipeline to STORAGE_WRITE_API, we are running into the following issue.
BigQuery write method: STORAGE_WRITE_API
Function used to get BigQuery deadletter: getFailedStorageApiInserts() (as getFailedInsertsWithErr() cannot be used for STORAGE_WRITE_API)
Deadletter format function: N/A
Without an equivalent of withFormatRecordOnFailureFunction(), we cannot format the TableRows of failed inserts, which blocks our upgrade.
How this might work:
Add an equivalent of withFormatRecordOnFailureFunction() for BigQuery STORAGE_WRITE_API.
Because we write to BigQuery tables dynamically, the PCollection may contain multiple eventTypes. We lose the eventType information if a failure transformation function cannot be applied to the failed inserts.
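To illustrate what is lost, here is a small library-free simulation. It is not Beam code and does not describe any existing or planned API: `failedRows`, `formatOnFailureFn`, `toTableColumns`, and the `user_id` column are hypothetical names chosen for this sketch, and a `Map` again stands in for TableRow.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Predicate;

public class StorageWriteFailureSketch {

    // Simulated write: format each element, and for rows that fail emit either
    // the formatted row (no failure function) or the output of the failure
    // function applied to the ORIGINAL element.
    static <T> List<Map<String, Object>> failedRows(
            List<T> elements,
            Function<T, Map<String, Object>> formatFn,
            Function<T, Map<String, Object>> formatOnFailureFn, // may be null
            Predicate<Map<String, Object>> insertSucceeds) {
        List<Map<String, Object>> failed = new ArrayList<>();
        for (T element : elements) {
            Map<String, Object> row = formatFn.apply(element);
            if (!insertSucceeds.test(row)) {
                failed.add(formatOnFailureFn == null ? row : formatOnFailureFn.apply(element));
            }
        }
        return failed;
    }

    // Dead-letter metric: number of failed rows per event type.
    static Map<String, Integer> countByEventType(List<Map<String, Object>> failed) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Object> row : failed) {
            String type = String.valueOf(row.getOrDefault("eventType", "UNKNOWN"));
            counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    // Builds a toy event carrying an event type and one table column.
    static Map<String, Object> event(String eventType, Object userId) {
        Map<String, Object> e = new LinkedHashMap<>();
        e.put("eventType", eventType);
        e.put("user_id", userId);
        return e;
    }

    // The table row keeps only real table columns, so eventType is dropped.
    static Map<String, Object> toTableColumns(Map<String, Object> event) {
        Map<String, Object> row = new LinkedHashMap<>(event);
        row.remove("eventType");
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> events = new ArrayList<>();
        events.add(event("click", "u-1"));
        events.add(event("purchase", null)); // fails: missing user_id
        events.add(event("click", null));    // fails: missing user_id

        Predicate<Map<String, Object>> ok = row -> row.get("user_id") != null;
        Function<Map<String, Object>, Map<String, Object>> format =
                StorageWriteFailureSketch::toTableColumns;
        Function<Map<String, Object>, Map<String, Object>> keepOriginal = LinkedHashMap::new;

        System.out.println(countByEventType(failedRows(events, format, null, ok)));
        System.out.println(countByEventType(failedRows(events, format, keepOriginal, ok)));
    }
}
```

Without the failure function, every failed row falls into a single unknown bucket; with it, failures can be attributed to their event types, which is what the dead-letter metrics need.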
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components