-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Cloud Bigtable Change Stream integration tests #29127
Conversation
Change-Id: I68a877d5686f1898686b18491c6b4aff5e699862
Change-Id: I490fca7ba2f24b15faa288d6f2c3f209db59f948
… set instanceId Change-Id: I4d5e5347dbba09a0e1ee170b8aa911d2f0a772ef
…set it explicitly prior to this Change-Id: I5ee3c663d3120ee85970ae5f24a962b9535323b3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the test. Had a few comment about the overall structure/scope of the change. Not yet reviewed the test (preferably bigtable team could review)
@@ -28,10 +28,4 @@ public interface BigtableTestOptions extends TestPipelineOptions { | |||
String getInstanceId(); | |||
|
|||
void setInstanceId(String value); | |||
|
|||
@Description("Project for Bigtable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting that this file has been moved back and forth: 4833b3f
what is the consideration that moving this back to main?
Also, what is the context that removing "setBigtableProject" option? In theory the project that Bigtable instance locates can be different from the project used to run Dataflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few reasons.
- I'm moving it so I can register it in
GcpIoPipelineOptionsRegistrar.java
for reasons explained in the comment below. - I'm removing
setBigtableProject
because of the reason below, we can't actually setbigtableProject
to pass to the tests anyways. So I don't think anything is depending on this so it should be safely removable. - By registering
bigtableProject
, it conflicts with some of the tests we have because they also try to registerbigtableProject
. So it was either remove here or change it in the tests. - I took a brief look at firestore and spanner, neither has an option to set a different project. So it doesn't seem like it's a common use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this goes to main it will affect all users. For example, the name "InstanceId" sounds generic. In the past we received similar reports concerning this: #27256 (comment)
in fact, other gcp component (e.g. Spanner) also has "instanceId" settings so could introduce confusion.
If it stay within Test scope one can still register it using another Registrar class. GcpIoPipelineOptionsRegistrar is just an auto service class. It loads the options at initialization. Besides it, one can register custom pipeline option classes within Bigtable test classes. This reduces the scope to necessary level.
ack for removal of setBigtableProject
@@ -38,6 +39,7 @@ public Iterable<Class<? extends PipelineOptions>> getPipelineOptions() { | |||
.add(PubsubOptions.class) | |||
.add(FirestoreOptions.class) | |||
.add(TestBigQueryOptions.class) | |||
.add(BigtableTestOptions.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the only place that refers to BigtableTestOption class, it's not necessary to move the Option class. We can register the option in the test class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there's an alternate solution. I want to be able to pass in "instanceId" as a parameter to the IT. But gcp project defines integrationTest
to explicitly take a few parameters and pass them to beam beamTestPipelineOptions
. So I added instanceId
. But, unless instanceId
is always defined (by registering it for all ITs), other tests fail because they don't have instanceId
defined but we're passing instanceId
.
So this ended up being the solution. Happy to try something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In each IT class, one can get a TestPipelineOptions instance like this
Line 69 in 8a28100
private static PipelineOptions testOptions = TestPipeline.testingPipelineOptions(); |
and then one can use testOptions.as(BigtableTestOptions.class) to get BigtableTestOptions, e.g.
Line 372 in 8a28100
options = TestPipeline.testingPipelineOptions().as(BigQueryIOJsonOptions.class); |
No Registrar involved. Current Bigtable ITs is doing similar things and it should suffice.
unless instanceId is always defined (by registering it for all ITs), other tests fail because they don't have instanceId defined but we're passing instanceId.
That is because "registering it for all ITs". If we do not register it for all ITs then irrelevant tests won't fail.
@@ -144,4 +146,24 @@ private static Cell createCell(ByteString value, long timestamp, String... label | |||
} | |||
return builder.build(); | |||
} | |||
|
|||
public static BigtableIO.ReadChangeStream buildTestPipelineInput( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is the only method that org.apache.beam.sdk.io.gcp.bigtable.changestreams.it refers to, it is not necessary to change the Util to public class. Just spin up the pipeline in place or create BigtableChangeStreamTestUtils in the changestream package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem, which we (bigtable team) dug ourselves into, is that endTime
is a package private field. We don't want to expose endTime
as an option because it's generally not well supported. But endTime
is useful to terminate tests.
So we have to create the pipeline in the same package as BigtableIO
so we actually can't create the pipeline with endTime
in the changestreams
package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack. If there are other options needed then good to promote the scope of this Util class
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
@jackdingilian to review the integration tests. |
…package Change-Id: I228dc61ca0b27131cd38a3ab24f136d0f924d9f7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests LGTM
dataSettingsBuilder.setInstanceId(instanceId); | ||
tableAdminSettingsBuilder.setInstanceId(instanceId); | ||
dataSettingsBuilder.setAppProfileId(appProfileId); | ||
// TODO: Remove this later. But for now, disable direct path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still need this? We should get this from the default config now I think, but I don't remember 100% why it was necessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still disable direct path in BigtableChangeStreamAccessor
. So it would make sense to keep it disabled when we write/read from bigtable.
Assigning reviewers. If you would like to opt out of this review, comment R: @Abacn for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
@Abacn Could you please take a look at the structure again. And if my responses make sense. Happy to make changes. Jack from Cloud Bigtable approved the test content. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, commented about the context of why we wanted test options stay in test and possible solutions to keep the TestOption in test scope. Actually FIreStoreOptions is an antipattern (see above) and deemed change
@@ -28,10 +28,4 @@ public interface BigtableTestOptions extends TestPipelineOptions { | |||
String getInstanceId(); | |||
|
|||
void setInstanceId(String value); | |||
|
|||
@Description("Project for Bigtable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this goes to main it will affect all users. For example, the name "InstanceId" sounds generic. In the past we received similar reports concerning this: #27256 (comment)
in fact, other gcp component (e.g. Spanner) also has "instanceId" settings so could introduce confusion.
If it stay within Test scope one can still register it using another Registrar class. GcpIoPipelineOptionsRegistrar is just an auto service class. It loads the options at initialization. Besides it, one can register custom pipeline option classes within Bigtable test classes. This reduces the scope to necessary level.
ack for removal of setBigtableProject
@@ -38,6 +39,7 @@ public Iterable<Class<? extends PipelineOptions>> getPipelineOptions() { | |||
.add(PubsubOptions.class) | |||
.add(FirestoreOptions.class) | |||
.add(TestBigQueryOptions.class) | |||
.add(BigtableTestOptions.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In each IT class, one can get a TestPipelineOptions instance like this
Line 69 in 8a28100
private static PipelineOptions testOptions = TestPipeline.testingPipelineOptions(); |
and then one can use testOptions.as(BigtableTestOptions.class) to get BigtableTestOptions, e.g.
Line 372 in 8a28100
options = TestPipeline.testingPipelineOptions().as(BigQueryIOJsonOptions.class); |
No Registrar involved. Current Bigtable ITs is doing similar things and it should suffice.
unless instanceId is always defined (by registering it for all ITs), other tests fail because they don't have instanceId defined but we're passing instanceId.
That is because "registering it for all ITs". If we do not register it for all ITs then irrelevant tests won't fail.
@@ -144,4 +146,24 @@ private static Cell createCell(ByteString value, long timestamp, String... label | |||
} | |||
return builder.build(); | |||
} | |||
|
|||
public static BigtableIO.ReadChangeStream buildTestPipelineInput( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack. If there are other options needed then good to promote the scope of this Util class
systemProperty "beamTestPipelineOptions", JsonOutput.toJson([ | ||
"--runner=DirectRunner", | ||
"--project=${gcpProject}", | ||
"--tempRoot=${gcpTempRoot}", | ||
"--firestoreDb=${firestoreDb}", | ||
"--host=${host}", | ||
"--instanceId=${instanceId}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Abacn Want to bring your attention here. I want to be able to run something like this
./gradlew :sdks:java:io:google-cloud-platform:integrationTest --tests=org.apache.beam.sdk.io.gcp.bigtable.changestreams.it.BigtableChangeStreamIT -PgcpProject=my-test-project -PinstanceId=my-test-instance
BigtableTestOptions
has a field instanceId
but in order to pass a specific string that's not the default string, the only that I found to do it is by adding it here.
This works fine for ChangeStreamIT because it registers BigtableTestOptions
which defines instanceId
, so we can pass --instanceId
to beamTestPipelineOptions
. However, all other integration tests that do not register BigtableTestOptions
will fail because we are passing --instanceId
to beamTestPipelineOptions
but they don't have an option that reigsters instanceId
.
That's why I moved the BigtableTestOptions to the main directory and registered it by default.
Is there an alternative way to pass instanceId
to my test that does not require explicitly passing instanceId
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works fine for ChangeStreamIT because it registers BigtableTestOptions which defines instanceId, so we can pass --instanceId to beamTestPipelineOptions. However, all other integration tests that do not register BigtableTestOptions will fail because we are passing --instanceId to beamTestPipelineOptions but they don't have an option that reigsters instanceId.
Thanks, understand. For irrelevant test this was due to --instanceId
pipeline option not recognized, as you stated. This happens because currently all gcp integration tests are triggered by a single gradle command. Restructure test is a more substantial change. For now, we can keep it in main if this is the only approach to make it work, but consider to make the option specialized, like
- use "--bigtableInstanceId" instanceId is too generic. The options in main scope will affect all users including customers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Done.
…ecifying Change-Id: Ic66c4c061ed2f5979f6a530905e3cbbddd238f18
…es to avoid conflicts Change-Id: I489850e07812058e8c7ebb3c9878eae9d4bc9f06
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Add a basic set of integration tests for Cloud Bigtable Change Stream.
Add functionality
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.