Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read API Source v2 #25392

Merged
merged 34 commits into from
Feb 22, 2023
Merged

Read API Source v2 #25392

merged 34 commits into from
Feb 22, 2023

Conversation

vachan-shetty
Copy link
Contributor

Adding the new Read API Source which supports bundling of Streams within Read Session. Customers can enable/disable this feature using the setEnableBundling BigQuery pipeline option (the option flag currently defaults to false).

The design doc for the changes can be found here. This PR addresses #24260.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI.

@vachan-shetty
Copy link
Contributor Author

R: @johnjcasey, @lukecwik, @kmjung

@github-actions
Copy link
Contributor

github-actions bot commented Feb 9, 2023

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@vachan-shetty
Copy link
Contributor Author

R: @ahmedabu98

@vachan-shetty
Copy link
Contributor Author

Retest this please.

1 similar comment
@vachan-shetty
Copy link
Contributor Author

Retest this please.

Copy link
Contributor

@ahmedabu98 ahmedabu98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments. You can ignore my logging suggestions if you like another alternative, but I think we should improve some of those.
Still haven't looked at test class, but submitting this review now to give you some time before next release cut.

Comment on lines +165 to +166
// of BigQueryStorageSourceBase, all splits here will be handled by `splitAtFraction()`. As a
// result, this is a no-op.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI because it was mentioned previously as TODO: Implement dynamic work rebalancing, splitAtFraction() is dynamic work rebalancing

Copy link
Contributor

@ahmedabu98 ahmedabu98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Hope these changes unblock users with large read operations

@ahmedabu98
Copy link
Contributor

Failing test org.apache.beam.sdk.io.gcp.bigquery.BigQueryIOWriteTest.testTriggeredFileLoadsWithTempTablesToExistingNullSchemaTable[1] in Java_GCP_IO_Direct is a known issue #25207. Can be ignored

@johnjcasey
Copy link
Contributor

retest this please

@damccorm
Copy link
Contributor

Test suites aren't statusing, but they've all completed successfully except for https://ci-beam.apache.org/job/beam_PreCommit_Java_GCP_IO_Direct_Commit/1778/ (Java_GCP_IO_Direct) which is still running and seems stuck at the start

@damccorm
Copy link
Contributor

Run Java_GCP_IO_Direct PreCommit

@damccorm
Copy link
Contributor

@johnjcasey johnjcasey merged commit 4ce8eed into apache:master Feb 22, 2023
Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ruslan-ikhsan pushed a commit to akvelon/beam that referenced this pull request Mar 10, 2023
* ReadAPI Source v2

* Renamed the Source V2. Also added tests for the same.

* v2 using OffsetBasedSource and OffsetBasedReader

* Updating tests to have more sensible mock values.

* Updated BqOption flag.

* Simplifying `fractionConsumed` calculation.

* Better variable names.

* Minor refactoring.

* Added a synchronized block in readNextRecord(). Also added comments to explain how OffsetBasedSource + RangeTracker is used.

* Removed unnecessary synchronized block, added Javadoc and improved unit test coverage.

* Consolidated code paths in `BigQueryIO` for Bundled and regular ReadAPI sources.

* Lint fixes.

* Minor Javadoc fix.

* Fix StreamBundle creation logic and some minor code comment updates.

* Updated logging.

* Lint fixes.

* ReadAPI Source v2

* Renamed the Source V2. Also added tests for the same.

* v2 using OffsetBasedSource and OffsetBasedReader

* Updating tests to have more sensible mock values.

* Updated BqOption flag.

* Simplifying `fractionConsumed` calculation.

* Better variable names.

* Minor refactoring.

* Added a synchronized block in readNextRecord(). Also added comments to explain how OffsetBasedSource + RangeTracker is used.

* Removed unnecessary synchronized block, added Javadoc and improved unit test coverage.

* Consolidated code paths in `BigQueryIO` for Bundled and regular ReadAPI sources.

* Lint fixes.

* Minor Javadoc fix.

* Fix StreamBundle creation logic and some minor code comment updates.

* Updated logging.

* Lint fixes.
Abacn added a commit to Abacn/beam that referenced this pull request Feb 26, 2024
Abacn added a commit that referenced this pull request Feb 27, 2024
* Revert "Read API Source v2 (#25392)"

This reverts commit 4ce8eed.

* Preserve the ablility to use read API backend feature

* Add log about num of stream

* Add to CHANGES.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants