Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys #5505

Merged
merged 5 commits into from
May 23, 2022

Conversation

andygrove
Copy link
Contributor

Closes #5325

Adds support for spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys.

@andygrove andygrove self-assigned this May 16, 2022
@andygrove
Copy link
Contributor Author

build

@andygrove andygrove changed the title WIP: Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys May 16, 2022
@andygrove andygrove marked this pull request as ready for review May 16, 2022 23:38
nartal1
nartal1 previously approved these changes May 17, 2022
@andygrove andygrove marked this pull request as draft May 19, 2022 21:16
@andygrove andygrove changed the title Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys WIP: Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys May 19, 2022
@andygrove
Copy link
Contributor Author

Moving back to draft. I need to make sure hasSideEffects is implemented correctly as per #5546

@revans2
Copy link
Collaborator

revans2 commented May 19, 2022

Moving back to draft. I need to make sure hasSideEffects is implemented correctly as per #5546

Actually that is an existing bug in this. So you can fix it if you want here, but it does not have to be. This is a little more complex than #5546 because we can add an arbitrary new key. Before we could assume that there were no null keys, but here we cannot. So if you want to be exact the it should look something like.

  override lazy val hasSideEffects: Boolean = true

Because if a null is ever returned, then we have to throw an exception. A DataType in Spark itself has no indications of nullable vs not nullable. Only when it wrapped in another nested type do those start to show up. So there is no way for us to not know if the side effect is not possible with the information we have.

@sameerz sameerz added the feature request New feature or request label May 20, 2022
@revans2
Copy link
Collaborator

revans2 commented May 20, 2022

Actually I am wrong. The expression itself is nullable. I feel dumb. It should be something like

 override lazy val hasSideEffects: Boolean =
    function.nullable || GpuCreateMap.exceptionOnDupKeys || super.hasSideEffects

@andygrove andygrove marked this pull request as ready for review May 20, 2022 18:10
@andygrove andygrove changed the title WIP: Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys May 20, 2022
@andygrove
Copy link
Contributor Author

build

@sameerz sameerz added this to the May 2 - May 20 milestone May 20, 2022
@andygrove andygrove merged commit 8a06c09 into NVIDIA:branch-22.06 May 23, 2022
@andygrove andygrove deleted the transform-keys-last-win branch May 23, 2022 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Support spark.sql.mapKeyDedupPolicy=LAST_WIN for TransformKeys
4 participants