Use Periodic Impulse for BQ SchemaUpdate tests #27998
Conversation
Run PostCommit_Java_DataflowV2
Run PostCommit_Java_Dataflow
Run Java_GCP_IO_Direct PreCommit
R: @Abacn
Failing tests are irrelevant:
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
Just note that there is a GenerateSequence transform that can emit integers at a given interval, starting from pipeline startup (it does not flush backlogs at the beginning)
sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PeriodicSequence.java
That's true, but GenerateSequence as a streaming source doesn't have a stop. We would need to rely on manually canceling the pipeline (or draining for Dataflow).
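For reference, a minimal sketch of the GenerateSequence alternative discussed above (illustrative only, not code from this PR): an unbounded sequence emitting one element per second from pipeline startup, which has no built-in stop and would require a manual cancel or drain.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class GenerateSequenceSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Unbounded source: emits 0, 1, 2, ... at one element per second,
    // starting from pipeline startup (no backlog flush at the beginning).
    PCollection<Long> ticks =
        pipeline.apply(GenerateSequence.from(0).withRate(1, Duration.standardSeconds(1)));

    // There is no stop condition here; as a streaming source the pipeline keeps
    // running until it is cancelled (or drained on Dataflow).
    pipeline.run();
  }
}
```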
Wrong PR comment, never mind
Run Java_GCP_IO_Direct PreCommit
Run PostCommit_Java_DataflowV2
Run PostCommit_Java_Dataflow
Run Java_GCP_IO_Direct PreCommit
Run PostCommit_Java_Dataflow
Interestingly, org.apache.beam.sdk.io.gcp.bigquery.StorageApiSinkSchemaUpdateIT.testExactlyOnceWithIgnoreUnknownValues[1] is a flaky test
Looking into it now... it seems like the test is flaky when using ignoreUnknownValues. The previous run failed on:
Ahh, this is happening because the ignoreUnknownValues tests are not included in the "wait longer" tests. In the flakes, we don't use an input schema, and the destination table schema is updated very quickly, before any streams are created. From my understanding, when the connector fetches the destination schema (because we don't use an input schema), it fetches the updated one, so the extra field is actually expected rather than ignored. Trying locally, ignoreUnknownValues tests that don't use an input schema don't flake anymore when we wait longer.
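Roughly, the "wait longer" fix described here amounts to something like the sketch below (the names are hypothetical, not the actual test code): let more rows flow before triggering the schema update when the test does not supply an input schema, so the write streams exist before the destination table schema changes.

```java
public class WaitLongerSketch {
  /**
   * Hypothetical helper: how many rows to send before triggering the schema
   * update. Without an input schema the connector fetches the destination
   * schema itself, so the test waits longer to make sure streams are created
   * before the table schema changes underneath them.
   */
  static long rowsBeforeSchemaUpdate(boolean useInputSchema, long baseRows) {
    return useInputSchema ? baseRows : baseRows * 2;
  }
}
```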
Run PostCommit_Java_Dataflow
Run PostCommit_Java_DataflowV2
Run Java_IOs_Direct PreCommit
* use periodic impulse for schema update tests; manually enable streaming engine; only run most relevant tests on dataflow runner
* enable test for dataflow runner
* spotless
* increase num rows
* limit parallelism on directrunner, make tests run faster when possible
* use project
* spotless
* limit stream parallelism
* wait longer when not using input schema
Fixes #27911
Follow-up for #27740
Switch to using PeriodicImpulse instead of TestStream, since TestStream is only available on the Direct and Flink runners.
Had to make an addition to PeriodicImpulse to make it more streaming-friendly. The default behavior is that PeriodicImpulse emits all instants from the start time up to Instant.now(), and only after that starts firing at the specified interval. The changes here add an option to make it fire at the specified interval for all elements. This was needed for the schema update tests because in this case we care very much about maintaining a consistent interval between stream appends.
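As a rough sketch of driving the test source with PeriodicImpulse (illustrative only; `startAt`, `stopAt`, and `withInterval` are existing PeriodicImpulse methods, while the option name on the last line is an assumed name for the new behavior described above, not confirmed from the diff):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.PeriodicImpulse;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;
import org.joda.time.Instant;

public class PeriodicImpulseSketch {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    Instant start = Instant.now();

    // One impulse every five seconds for roughly a minute. By default, PeriodicImpulse
    // emits every instant from startAt up to Instant.now() immediately before settling
    // into the configured interval; the option added in this PR makes every element
    // respect the interval instead.
    PCollection<Instant> impulses =
        pipeline.apply(
            PeriodicImpulse.create()
                .startAt(start)
                .stopAt(start.plus(Duration.standardMinutes(1)))
                .withInterval(Duration.standardSeconds(5))
                .catchUpToNow(false)); // assumed name for the new "fire at interval" option

    pipeline.run();
  }
}
```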
Other changes include manually enabling Streaming Engine. The Storage API sink uses GroupIntoBatches, which requires Streaming Engine. It is enabled automatically on Runner V2 but not on V1.
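For context, a minimal sketch of opting in to streaming mode and Streaming Engine on the Dataflow runner; one common way is the `enable_streaming_engine` experiment, though the exact mechanism used in the test may differ:

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.options.ExperimentalOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class StreamingEngineOptionsSketch {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);

    // Run in streaming mode and opt in to Streaming Engine. Runner V2 enables this
    // automatically, but on V1 it has to be turned on explicitly so that
    // GroupIntoBatches (used by the Storage API sink) is supported.
    options.setStreaming(true);
    ExperimentalOptions.addExperiment(options, "enable_streaming_engine");
  }
}
```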
Also followed @Abacn's suggestion of only running the important tests on TestDataflowRunner so that we don't eat too many resources unnecessarily.