
[Feature Request]: Add a DIRECT_READ method to bigqueryio go sdk using BigQuery Storage API #33268

Open
2 of 17 tasks
win845 opened this issue Dec 3, 2024 · 1 comment

Comments

@win845

win845 commented Dec 3, 2024

What would you like to happen?

The current implementation of bigqueryio in Go is rudimentary. It always falls back to running a query and emitting records sequentially, which has the downside that subsequent ParDo steps are not autoscaled.

Add a bigqueryio.UseDirectRead option, or the like, that consumes the table through multiple parallel read streams via the BigQuery Storage API, as the Java and Python SDKs already do.
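A rough sketch of how such an option could look from the pipeline author's side (the option name is hypothetical and does not exist in the Go SDK today; it simply mirrors how DIRECT_READ is exposed elsewhere):

rows := bigqueryio.Query(scope, config.Project, itemOptionsQuery,
    reflect.TypeOf(ItemOptionRow{}),
    bigqueryio.UseStandardSQL(),
    bigqueryio.UseDirectRead(), // hypothetical: read results via BigQuery Storage API read streams in parallel
)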

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@win845
Author

win845 commented Dec 3, 2024

While the implementation may take a while: what is the current strategy for dealing with a sequentially emitting source like the one linked above? bigqueryio reads records in a loop and emits them sequentially, without implementing a progress method.
This leads to pipelines on Dataflow never being autoscaled.

Can something be done in a subsequent step so that the downstream processing is redistributed across multiple workers?

Example code (the element type passed to Query and the ParDo input type have to match, so both use ItemOptionRow here):

bigqueryRows := bigqueryio.Query(scope, config.Project, itemOptionsQuery,
    reflect.TypeOf(ItemOptionRow{}), bigqueryio.UseStandardSQL())

mutations := beam.ParDo(scope, func(bigqueryRow ItemOptionRow, emit func(bigtableio.Mutation)) {
    rowKey := bigqueryRow.Id
    mutation := bigtableio.NewMutation(rowKey)
    // ...
    emit(*mutation)
}, bigqueryRows)

Can bigqueryRows here somehow be consumed in chunks that are distributed across multiple workers for the further transform?
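One possible workaround, sketched below: inserting beam.Reshuffle between the BigQuery read and the ParDo breaks fusion, so the runner can redistribute the emitted rows across workers for the downstream processing. The read itself still runs sequentially on a single worker, and whether Dataflow actually upscales depends on the runner's signals, so this is only a partial mitigation, not a substitute for a Storage API based source.

// Sketch of a possible workaround using beam.Reshuffle to break fusion
// between the sequential BigQuery read and the downstream ParDo.
bigqueryRows := bigqueryio.Query(scope, config.Project, itemOptionsQuery,
    reflect.TypeOf(ItemOptionRow{}), bigqueryio.UseStandardSQL())

// Reshuffle materializes and redistributes the rows, so the following
// ParDo can run on multiple workers instead of staying fused to the read.
redistributed := beam.Reshuffle(scope, bigqueryRows)

mutations := beam.ParDo(scope, func(bigqueryRow ItemOptionRow, emit func(bigtableio.Mutation)) {
    mutation := bigtableio.NewMutation(bigqueryRow.Id)
    // ... populate mutation columns ...
    emit(*mutation)
}, redistributed)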
