Fixing issue with explode_outer position not nulling position entries of null rows #7754
Codecov Report
@@ Coverage Diff @@
## branch-0.19 #7754 +/- ##
===============================================
+ Coverage 81.86% 82.30% +0.43%
===============================================
Files 101 101
Lines 16884 17053 +169
===============================================
+ Hits 13822 14035 +213
+ Misses 3062 3018 -44
Continue to review full report at Codecov.
|
This change looks good in terms of spark-rapids. |
@kkraus14 is there a Pandas equivalent for "explode_outer" that we should be concerned with matching its semantics? |
Yes, it's just called
Ignore the |
So |
Ah, but this is different behavior. The change @hyperbolic2346 just made would result in:
i.e., the |
We don't have a need for an equivalent of a non-outer explode at this time. |
What's more, @hyperbolic2346's original implementation would have resulted in:
i.e., zeros in the empty/null list positions. So that didn't do what Pandas wanted either. |
In fact, @kkraus14 the example you gave is totally different. For input data like:
@hyperbolic2346 's original implementation returns
The new one would do:
|
Ah, perhaps Pandas doesn't need the |
Yes, we do not need position handling, just |
lgtm
Co-authored-by: Jake Hemstad <[email protected]>
@gpucibot merge |
explode_outer supports writing a position column, but if the input row was null it would incorrectly set the position to 0 and mark that position entry as valid. Instead, the position entry should be null as well. Luckily, the null mask of the position column matches the null mask of the exploded column exactly, so we can just copy it after the exploded column is created.

Fixes #7721
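
The fix described above can be illustrated with a minimal pure-Python sketch. It is not the cudf implementation or API. The function name `explode_outer_with_position` is hypothetical, columns are modeled as plain lists with `None` standing in for a null entry, and the sketch assumes that null rows and empty rows both produce a single null output row. It does not model null elements inside a non-null list.

```python
from typing import List, Optional, Sequence, Tuple


def explode_outer_with_position(
    rows: Sequence[Optional[Sequence[int]]],
) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Hypothetical sketch: explode a list column and emit a position column."""
    values: List[Optional[int]] = []
    positions: List[int] = []

    for row in rows:
        if not row:
            # Null or empty row: outer semantics keep one output row.
            # The position is written as a 0 placeholder, which is
            # the value that used to leak out as a valid entry.
            values.append(None)
            positions.append(0)
        else:
            for pos, value in enumerate(row):
                values.append(value)
                positions.append(pos)

    # Reuse the validity of the exploded column for the position column,
    # so the placeholder positions become null.
    validity = [value is not None for value in values]
    nulled_positions = [
        pos if valid else None for pos, valid in zip(positions, validity)
    ]
    return nulled_positions, values


positions, values = explode_outer_with_position([[10, 20], None, [], [30]])
print(positions)
print(values)
```

Building the positions first and applying the validity afterwards mirrors the order in the description: the position column is generated with placeholder values, then takes its nulls from the exploded column rather than computing them separately.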