Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix schema inference inside parameterized types #32705

Merged

Conversation

reuvenlax
Copy link
Contributor

@reuvenlax reuvenlax commented Oct 8, 2024

Previously Beam prioritized schemas over coders in inference, but did not inspect nested parameterized types for schemas. This led to some sharp edges for users - e.g. if Foo had a registered schema.

PCollection = readFoo();

Would infer the correct SchemaCoder for Foo. However

PCollection<Iterable> = readAllFoos();

Would not search for a schema, and instead take whatever Coder accepted Foo (possibly SerializableCoder). This led to a lot of confusion for users.

This PR ensures that the schema lookup continues while inspecting type parameters.

Note: this PR touches many files due to a new parameter added to CoderRegistry(), however the vast majority of those changes are trivial.

@github-actions github-actions bot added the java label Oct 8, 2024
Copy link
Contributor

github-actions bot commented Oct 9, 2024

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@reuvenlax reuvenlax force-pushed the schema_inference_parameterized_types branch from 2cff426 to bc6fa83 Compare October 9, 2024 02:26
@reuvenlax
Copy link
Contributor Author

R: @ahmedabu98

Copy link
Contributor

github-actions bot commented Oct 9, 2024

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment assign set of reviewers

Copy link
Contributor

@ahmedabu98 ahmedabu98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, left one suggestion

Comment on lines 200 to 202
public static CoderRegistry createDefault(@Nullable SchemaRegistry schemaRegistry) {
return new CoderRegistry(schemaRegistry);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a breaking change for users

Can we have the old createDefault() method as well and have it return new CoderRegistry(null)?

Would maintain existing use cases and limit the number of files in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point as this is a public method (even though it's probably not intended for use outside of core Beam). Added the old createDefault() back.

@reuvenlax reuvenlax changed the title fix nested schema inference fix schema inference inside parameterized types Oct 9, 2024
@reuvenlax reuvenlax merged commit c243491 into apache:master Oct 9, 2024
28 checks passed
Abacn added a commit that referenced this pull request Nov 15, 2024
damccorm pushed a commit that referenced this pull request Nov 18, 2024
damccorm pushed a commit that referenced this pull request Nov 18, 2024
damccorm added a commit that referenced this pull request Nov 18, 2024
reeba212 pushed a commit to reeba212/beam that referenced this pull request Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants