Add GeneratedClassRowTypeConstraint #22679
Codecov Report
@@ Coverage Diff @@
## master #22679 +/- ##
==========================================
- Coverage 74.19% 74.19% -0.01%
==========================================
Files 708 709 +1
Lines 93465 93498 +33
==========================================
+ Hits 69347 69367 +20
- Misses 22843 22856 +13
Partials 1275 1275
Assigning reviewers. R: @tvalentyn for label python.
The PR bot will only process comments in the main thread (not review comments).
R: @yeandy Since valentyn is out
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
LGTM. Probably need to run formatter. Also seeing some Dataframe schema test errors, but didn't look too much into that.
schema_options: Optional[Sequence[Tuple[str, Any]]] = None,
field_options: Optional[Dict[str, Sequence[Tuple[str, Any]]]] = None,
schema_registry: SchemaTypeRegistry = None,
) -> RowTypeConstraint:
Suggested change:
- ) -> RowTypeConstraint:
+ ) -> GeneratedClassRowTypeConstraint:
I actually prefer to use the base class here. GeneratedClassRowTypeConstraint can be an implementation detail.
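To illustrate the preference above, here is a minimal sketch of a factory annotated with the base class while returning a private subclass. The names `RowConstraint`, `_GeneratedRowConstraint`, and `from_fields` are hypothetical stand-ins for illustration, not Beam's actual classes.

```python
from typing import Sequence, Tuple


class RowConstraint:
    """Hypothetical base class; stands in for a public constraint type."""

    def __init__(self, fields: Sequence[Tuple[str, type]]):
        self.fields = tuple(fields)


class _GeneratedRowConstraint(RowConstraint):
    """Hypothetical private subclass that callers never need to name."""


def from_fields(fields: Sequence[Tuple[str, type]]) -> RowConstraint:
    # Annotated with the base class, so the subclass can later be renamed
    # or replaced without changing the public signature.
    return _GeneratedRowConstraint(fields)
```

Callers that only rely on `RowConstraint` keep working if the subclass changes, which is the sense in which it stays an implementation detail.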
row_proto = schema_pb2.FieldType(
    row_type=schema_pb2.RowType(
Why does schema_pb2.RowType have to be wrapped by schema_pb2.FieldType?
Looked again; it looks like the top level of the protos is schema_pb2.FieldType, I believe.
self._fields,
self._schema_id,
self._schema_options,
self._field_options))
Nice. Do you think it would be clearer to explicitly add None for the 5th item (schema_registry)? I was initially looking for the 5 args for from_fields, but only saw 4 😄. I'm perfectly OK with the current implementation though.
Good idea, thanks
class GeneratedClassRowTypeConstraint(RowTypeConstraint):
  """Specialization of RowTypeConstraint which relies on a generated user_type.

  Since the generated user_type cannot be pickled, we supply a custom __reduce__
"the generated user_type cannot be pickled"
Can you please explain why this is the case?
I actually don't fully understand it, but it's been a consistent issue with the Schema code. Each pickle library (built-in, dill, cloudpickle) fails for a different reason. I filed #22714 to track this, and added a (skipped) test.
Co-authored-by: Andy Ye <[email protected]>
e3e71cc to 59686fe
This probably caused the x-lang test suite failures in #22748.
* Add (failing) pickling tests
* Add GeneratedClassRowTypeConstraint, plumb options
* Add top-level option conversion functions
* Refactor NamedTuple generation, always create GeneratedClassRowTypeConstraint
* Move registry to apache_beam.typehints.schema_registry
* yapf,lint
* fixup! Move registry to apache_beam.typehints.schema_registry
* Apply suggestions from code review
  Co-authored-by: Andy Ye <[email protected]>
* Add None SchemaRegistry
* Add skipped test for pickling generated type
  Co-authored-by: Andy Ye <[email protected]>
A RowTypeConstraint that wraps a generated NamedTuple type cannot be pickled, because the generated type cannot be pickled. This PR adds a specialization, GeneratedClassRowTypeConstraint, with a custom __reduce__ that avoids pickling the user type.

Extracted from #22575
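
The approach described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `RowConstraint`, `GeneratedClassRowConstraint`, and `rebuild_constraint` are hypothetical names, and the sketch assumes the generated NamedTuple can simply be regenerated from the field list on unpickling.

```python
import pickle
from typing import NamedTuple, Optional, Sequence, Tuple


class RowConstraint:
    """Hypothetical stand-in for a row constraint (not Beam's class)."""

    def __init__(self, fields: Sequence[Tuple[str, type]], user_type=None):
        self._fields = tuple(fields)
        self.user_type = user_type


def rebuild_constraint(fields, schema_id=None, registry=None):
    # Module-level factory so pickle can reference it by name.
    # `registry` is unused here; it mirrors the explicit None discussed
    # in the review and is an assumption of this sketch.
    return GeneratedClassRowConstraint(fields, schema_id)


class GeneratedClassRowConstraint(RowConstraint):
    """Holds a user type generated at runtime from the field list."""

    def __init__(self, fields, schema_id: Optional[str] = None):
        # The generated class is not importable by name, which is the
        # usual reason pickling a class by reference fails.
        user_type = NamedTuple("GeneratedRow", list(fields))
        super().__init__(fields, user_type)
        self._schema_id = schema_id

    def __reduce__(self):
        # Pickle only the inputs needed to regenerate the user type,
        # never the generated class itself.
        return (rebuild_constraint, (self._fields, self._schema_id, None))


def roundtrip(constraint):
    """Serialize and deserialize a constraint with the built-in pickle."""
    return pickle.loads(pickle.dumps(constraint))
```

Calling `roundtrip(rebuild_constraint([("name", str), ("id", int)], "some-id"))` yields an equivalent constraint whose `user_type` is a freshly generated class, so code comparing generated types by identity would see a different object after unpickling.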