Add integration tests for streaming Storage Write API (includes schema update feature) #27740

Merged
merged 14 commits into from
Aug 8, 2023

Conversation

ahmedabu98

@ahmedabu98 ahmedabu98 commented Jul 28, 2023

This continues @shunping-google's work in #27213.

These are integration tests that write to real BigQuery tables. The test pipeline writes records with a deliberate short delay between them so that the Storage API stream has a chance to recognize the schema update. This PR also adds some warnings when invalid configurations are used. They are warnings rather than exceptions so as not to break existing workflows; if we ever refactor this IO, we should turn them into exceptions.
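To illustrate the warn-instead-of-throw approach, here is a minimal sketch in plain Java. It is not the code in this PR: the class `WriteConfigCheck`, its `Result` type, and the `validate` method are hypothetical names made up for illustration, and the convention that a stream count of 0 means "unset" is an assumption. It models only the case mentioned in the commit notes, where both numStorageWriteApiStreams and autoSharding are set and autoSharding takes priority.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; this is NOT Beam's actual BigQueryIO validation code. */
public class WriteConfigCheck {

  /** Effective settings after validation, plus any warnings that were raised. */
  public static final class Result {
    public final boolean autoSharding;
    public final int numStreams; // assumption: 0 means "no fixed stream count"
    public final List<String> warnings;

    Result(boolean autoSharding, int numStreams, List<String> warnings) {
      this.autoSharding = autoSharding;
      this.numStreams = numStreams;
      this.warnings = warnings;
    }
  }

  /**
   * Resolves a conflicting configuration instead of rejecting it.
   * If both a fixed stream count and auto-sharding are requested,
   * auto-sharding wins and a warning is recorded rather than an exception thrown.
   */
  public static Result validate(int numStorageWriteApiStreams, boolean autoSharding) {
    List<String> warnings = new ArrayList<>();
    int effectiveStreams = numStorageWriteApiStreams;
    if (autoSharding && numStorageWriteApiStreams > 0) {
      warnings.add(
          "Both numStorageWriteApiStreams and autoSharding are set; "
              + "autoSharding takes priority and the stream count is ignored.");
      effectiveStreams = 0;
    }
    return new Result(autoSharding, effectiveStreams, warnings);
  }

  public static void main(String[] args) {
    Result r = validate(3, true);
    r.warnings.forEach(w -> System.err.println("WARN: " + w));
    System.out.println("autoSharding=" + r.autoSharding + ", numStreams=" + r.numStreams);
  }
}
```

Returning the warnings alongside the resolved settings keeps existing pipelines running while making the conflict visible. Switching to exceptions later would mean replacing the `warnings.add(...)` call with a `throw`.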

I've opted not to write tests for Batch writes that use the auto schema update feature because that use case doesn't make much sense. These tests include both STORAGE_WRITE_API and STORAGE_API_AT_LEAST_ONCE, which use StorageApiWritesShardedRecords and StorageApiWriteUnshardedRecord, respectively. Batch writes also use the StorageApiWriteUnshardedRecord transform, which manages stream appends and schema updates. In other words, even though we don't have explicit Batch mode tests, that code path is broadly covered by these tests.

UPDATE
Had to replace TestStream with PeriodicImpulse in a follow-up, #27998, so that the tests can run on TestDataflowRunner.

shunping and others added 7 commits June 21, 2023 15:57
Internally, we will decide whether to call withSchema() with a schema
of shuffled fields based on this option.
* Fix a few typos in the method name STORAGE_WRITE_API
* Change the warning message when both numStorageWriteApiStreams and autoSharding are set. In this case, autoSharding takes priority.
* Add an argument check for using both numFileShards and autoSharding via FILE_LOADS.
@github-actions

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @Abacn for label java.
R: @johnjcasey for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@ahmedabu98

Run Java_GCP_IO_Direct PreCommit


@johnjcasey johnjcasey left a comment


LGTM.

@ahmedabu98

R: @reuvenlax

@github-actions

github-actions bot commented Aug 2, 2023

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@Abacn Abacn merged commit 7824f2c into apache:master Aug 8, 2023
@Abacn

Abacn commented Aug 9, 2023

It appears the added tests do not work on Dataflow runner v1: https://ci-beam.apache.org/view/PostCommit/job/beam_PostCommit_Java_DataflowV1/lastCompletedBuild/testReport/

Should all of these tests run on Dataflow anyway?

@reuvenlax

reuvenlax commented Aug 9, 2023 via email

@ahmedabu98 ahmedabu98 changed the title Add integration tests for Storage Write API schema update feature Add integration tests for streaming Storage Write API (includes schema update feature) Oct 3, 2023

5 participants