Make defaults for optional SchemaTransformProvider methods #30560

Merged
merged 6 commits into from
Mar 11, 2024

Conversation

ahmedabu98
Contributor

@ahmedabu98 ahmedabu98 commented Mar 7, 2024

Making some changes to reduce the necessary boilerplate when creating schema-aware transforms.

Some methods (inputCollectionNames, outputCollectionNames) do not need to be implemented for a transform to be usable as a cross-language transform. They only provide information that may be helpful (e.g. to a remote SDK) and don't have any real functional use.

Another method for TypedSchemaTransformProvider (configurationClass) can have a default implementation via reflection.

Of course, implementations can continue implementing these methods as they see fit, but it probably shouldn't be required to do so.
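
A minimal sketch of the idea, using a hypothetical stand-in interface rather than Beam's actual SchemaTransformProvider. The informational methods get default bodies, so a provider only has to implement what is functionally required. Returning an empty list from the defaults is an assumption made here for illustration, as are all class names and identifiers.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DefaultMethodsSketch {

  /** Hypothetical stand-in for a provider interface; not Beam's actual API. */
  interface Provider {
    // The only method an implementation is forced to write in this sketch.
    String identifier();

    // Informational only, so a default body is enough (empty list is an assumption).
    default List<String> inputCollectionNames() {
      return Collections.emptyList();
    }

    default List<String> outputCollectionNames() {
      return Collections.emptyList();
    }
  }

  /** Implements only the required method and inherits both defaults. */
  static class MinimalProvider implements Provider {
    @Override
    public String identifier() {
      return "example:minimal:v1";
    }
  }

  /** Implementations remain free to override the optional methods. */
  static class DescriptiveProvider implements Provider {
    @Override
    public String identifier() {
      return "example:descriptive:v1";
    }

    @Override
    public List<String> outputCollectionNames() {
      return Arrays.asList("output", "errors");
    }
  }

  public static void main(String[] args) {
    Provider minimal = new MinimalProvider();
    Provider descriptive = new DescriptiveProvider();
    System.out.println(minimal.identifier() + " inputs: " + minimal.inputCollectionNames());
    System.out.println(descriptive.identifier() + " outputs: " + descriptive.outputCollectionNames());
  }
}
```

Existing providers that already override these methods compile and behave exactly as before, which is why adding defaults is a backwards-compatible change.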

@github-actions github-actions bot added the build label Mar 7, 2024
@damondouglas damondouglas self-requested a review March 7, 2024 17:33
@damondouglas
Contributor

@ahmedabu98 I assigned myself as reviewer. I'll review after checks run.

Contributor

github-actions bot commented Mar 7, 2024

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@ahmedabu98
Contributor Author

Failing test is a flake in SpannerChangeStreamErrorTest. @damondouglas it's ready for a review

Contributor

@damondouglas damondouglas left a comment

LGTM <3

@ahmedabu98
Contributor Author

R: @robertwb

Contributor

github-actions bot commented Mar 8, 2024

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@@ -58,10 +59,14 @@ default String description() {
   SchemaTransform from(Row configuration);
 
   /** Returns the input collection names of this transform. */
-  List<String> inputCollectionNames();
+  default List<String> inputCollectionNames() {
Member

IntelliJ thinks these methods are unused. Is IntelliJ wrong, or could they be removed?

Contributor Author

In reality, developers make their providers by extending TypedSchemaTransformProvider, which implements SchemaTransformProvider. I guess IntelliJ might grey these methods out because they're not called directly on SchemaTransformProvider.

Member

So if they override these methods but we never call them, does it matter?

Contributor Author

Ahh these methods are eventually called by the expansion service to create the discover response. Reference here:

schemaTransformConfigBuilder.addAllInputPcollectionNames(provider.inputCollectionNames());
schemaTransformConfigBuilder.addAllOutputPcollectionNames(provider.outputCollectionNames());
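
To illustrate why the methods still matter even when an IDE shows no direct usages, here is a rough sketch of a caller that only ever touches the interface type. Everything in it (Provider, ConfigEntry, discover, the identifiers) is a hypothetical stand-in and not the expansion service's real code; it only mirrors the shape of the two builder calls above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DiscoverSketch {

  /** Hypothetical stand-in for the provider interface; not Beam's actual API. */
  interface Provider {
    String identifier();

    default List<String> inputCollectionNames() {
      return Collections.emptyList();
    }

    default List<String> outputCollectionNames() {
      return Collections.emptyList();
    }
  }

  /** Hypothetical, simplified stand-in for one entry of a discover response. */
  static final class ConfigEntry {
    final String identifier;
    final List<String> inputs;
    final List<String> outputs;

    ConfigEntry(String identifier, List<String> inputs, List<String> outputs) {
      this.identifier = identifier;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  /** Overrides one optional method. */
  static class ReadProvider implements Provider {
    @Override
    public String identifier() {
      return "example:read:v1";
    }

    @Override
    public List<String> outputCollectionNames() {
      return Arrays.asList("output");
    }
  }

  /** Relies entirely on the defaults. */
  static class BareProvider implements Provider {
    @Override
    public String identifier() {
      return "example:bare:v1";
    }
  }

  /** Calls go through the interface, so overrides are picked up by dynamic dispatch. */
  static List<ConfigEntry> discover(List<Provider> providers) {
    List<ConfigEntry> entries = new ArrayList<>();
    for (Provider provider : providers) {
      entries.add(
          new ConfigEntry(
              provider.identifier(),
              provider.inputCollectionNames(),
              provider.outputCollectionNames()));
    }
    return entries;
  }

  public static void main(String[] args) {
    for (ConfigEntry entry : discover(Arrays.asList(new ReadProvider(), new BareProvider()))) {
      System.out.println(entry.identifier + " in=" + entry.inputs + " out=" + entry.outputs);
    }
  }
}
```
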

-  protected abstract Class<ConfigT> configurationClass();
+  @SuppressWarnings("unchecked")
+  protected Class<ConfigT> configurationClass() {
+    Optional<ParameterizedType> parameterizedType =
Member

Since you just put it in an Optional and then take it out again, might be simpler to go ahead like

@Nullable ParameterizedType parameterizedType = (ParameterizedType) getClass().getGenericSuperclass();
checkStateNotNull(parameterizedType, "Could not ...");
return (Class<ConfigT>) parameterizedType.getActualTypeArguments()[0];

FWIW I am not sure if getActualTypeArguments()[0] could still be a type variable in some cases. You might want to check in a few more ways that it is a usefully defined type... but this is probably just always going to work, because of how these things are authored. Very nice!

Contributor Author

Thanks, applied the suggestion.
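
For readers following the thread, a self-contained sketch of the reflection approach discussed above, including the extra check for the type-variable case the reviewer raised. TypedProvider, MyConfig, MyProvider and GenericProvider are hypothetical names, and this is not the code that was merged; it only assumes the standard java.lang.reflect API.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class ConfigClassReflectionSketch {

  /** Hypothetical stand-in for a typed provider base class; not Beam's actual API. */
  abstract static class TypedProvider<ConfigT> {

    /** Infers ConfigT from the subclass declaration; subclasses may still override. */
    @SuppressWarnings("unchecked")
    protected Class<ConfigT> configurationClass() {
      Type superType = getClass().getGenericSuperclass();
      if (!(superType instanceof ParameterizedType)) {
        // Raw subclass, or a subclass of a subclass: no type arguments to read here.
        throw new IllegalStateException(
            "Could not infer configuration type for " + getClass().getName());
      }
      Type argument = ((ParameterizedType) superType).getActualTypeArguments()[0];
      if (!(argument instanceof Class)) {
        // Still a type variable (or another non-class type), so nothing concrete to return.
        throw new IllegalStateException(
            "Configuration type of " + getClass().getName() + " is not a concrete class");
      }
      return (Class<ConfigT>) argument;
    }
  }

  /** Hypothetical configuration type. */
  static class MyConfig {}

  /** The common authoring pattern: the type argument is a concrete class. */
  static class MyProvider extends TypedProvider<MyConfig> {}

  /** The edge case: the type argument is still a type variable. */
  static class GenericProvider<T> extends TypedProvider<T> {}

  public static void main(String[] args) {
    System.out.println(new MyProvider().configurationClass().getSimpleName());
    try {
      new GenericProvider<String>().configurationClass();
    } catch (IllegalStateException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Note that this sketch only inspects the direct superclass, so a provider that extends another concrete provider would need to override configurationClass() itself.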

@ahmedabu98 ahmedabu98 merged commit d22a7e7 into apache:master Mar 11, 2024
29 checks passed
@@ -61,7 +61,7 @@
import org.junit.runners.JUnit4;

@RunWith(JUnit4.class)
public class BigQueryStorageWriteApiSchemaTransformProviderTest {
Contributor

This breaks the DataflowV1 and V2 tests. Is there a reason for moving them from unit tests to integration tests?

Contributor

CC: @ahmedabu98 @kennknowles @damondouglas

They use a fake BigQuery service, so they have to be executed locally. Should we either exclude them from the Dataflow test suites or move them back to unit tests?

Contributor Author

Ahh no reason, bad mistake. I'll open a PR to revert this part

Contributor Author

Thanks for catching this, pls take a look at #30623

hjtran pushed a commit to hjtran/beam that referenced this pull request Apr 4, 2024
4 participants