[SPARK-50238][PYTHON] Add Variant Support in PySpark UDFs/UDTFs/UDAFs #48770
Why is this an assert? Should this just return false if it is not a struct?
This function is intended to be called only when `at.is_struct` holds, so I want to prevent developers from using it with non-struct types. I should add a comment saying so. An explicit `is_struct` check inside the function would add cost in production, whereas the assert does not (I'm assuming Python runs in optimized mode there, so asserts are disabled).
Shouldn't we check that the fields are `metadata` and `value`?
Does `pandas_udf` go through the same path as the Arrow UDF path?
Yes, for the most part. I recall that for pandas UDFs to work, I also had to change `arrow_to_pandas` and `_create_batch`, because they treat struct types in a special way. Example: https://github.com/apache/spark/pull/48770/files#r1831583273