[SPARK-49683][SQL] Block trim collation #48336
Closed
HyukjinKwon approved these changes (Oct 4, 2024)
HyukjinKwon changed the title from "[SPARK-49683][SQL]Block trim collation" to "[SPARK-49683][SQL] Block trim collation" (Oct 4, 2024)
…b/spark into block-collation-trim
MaxGekk reviewed (Oct 4, 2024)
@uros-db @mihailom-db Could you review this PR, please.
mihailom-db reviewed (Oct 4, 2024)
sql/api/src/main/scala/org/apache/spark/sql/internal/types/AbstractStringType.scala
uros-db reviewed (Oct 4, 2024)
sql/api/src/main/scala/org/apache/spark/sql/internal/types/AbstractStringType.scala
Outdated
uros-db approved these changes (Oct 4, 2024)
lgtm
mihailom-db approved these changes (Oct 4, 2024)
Merged to master.
MaxGekk pushed a commit that referenced this pull request (Oct 17, 2024)
### What changes were proposed in this pull request?
Simplifying the AbstractStringType hierarchy.
### Why are the changes needed?
The addition of trim-sensitive collation (#48336) highlighted how hard it is to extend the existing AbstractStringType structure. Besides adding a new parameter to every type inheriting from AbstractStringType, it required changing the logic of every subclass and renaming the derived class StringTypeAnyCollation to StringTypeWithCaseAccentSensitivity, a name that could change again as new specifiers are added. Looking ahead, support for indeterminate collation would complicate these types further.
To address this, the proposed changes simplify the design by consolidating common logic into a single base class. The base class handles core functionality such as trim or indeterminate collation, while a derived class, StringTypeWithCollation (previously, and awkwardly, named StringTypeWithCaseAccentSensitivity), manages collation specifiers. This makes future extensions easier: fundamental checks are handled in the base class, and any new specifiers can be added as optional fields in StringTypeWithCollation.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
With existing tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #48459 from stefankandic/refactorStringTypes.
Authored-by: Stefan Kandic
Signed-off-by: Max Gekk
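The base/derived split described in this commit message can be sketched as a simplified Scala model. Only the names `AbstractStringType` and `StringTypeWithCollation` come from the commit message; `CollationInfo`, `StringTypeModel`, `acceptsSpecifiers`, and the `supports*` fields are hypothetical, and Spark's real classes look different. In this sketch the base class owns the trim check exactly once, and the derived class handles specifiers, each exposed as a field with a default.

```scala
// Hypothetical, simplified model of the refactored hierarchy.
// Only AbstractStringType and StringTypeWithCollation are names from the
// commit message; everything else here is an illustrative assumption.

final case class CollationInfo(
    caseSensitive: Boolean,
    accentSensitive: Boolean,
    trimSensitive: Boolean)

final case class StringTypeModel(collation: CollationInfo)

abstract class AbstractStringType {
  /** Whether trim-sensitive collations are accepted (blocked by default in subclasses). */
  def supportsTrimCollation: Boolean

  /** Core check shared by every subclass: trim is handled once, here. */
  final def acceptsType(other: StringTypeModel): Boolean =
    (supportsTrimCollation || !other.collation.trimSensitive) &&
      acceptsSpecifiers(other.collation)

  /** Hook for subclasses that care about collation specifiers. */
  protected def acceptsSpecifiers(c: CollationInfo): Boolean
}

/** Manages specifiers; a new specifier becomes one more defaulted field. */
final case class StringTypeWithCollation(
    supportsTrimCollation: Boolean = false,
    supportsCaseSpecifier: Boolean = true,   // accept case-insensitive collations?
    supportsAccentSpecifier: Boolean = true) // accept accent-insensitive collations?
    extends AbstractStringType {

  override protected def acceptsSpecifiers(c: CollationInfo): Boolean =
    (supportsCaseSpecifier || c.caseSensitive) &&
      (supportsAccentSpecifier || c.accentSensitive)
}

// Example inputs: a plain binary collation and a case-insensitive, trim-sensitive one.
val binary = StringTypeModel(
  CollationInfo(caseSensitive = true, accentSensitive = true, trimSensitive = false))
val lcaseRtrim = StringTypeModel(
  CollationInfo(caseSensitive = false, accentSensitive = true, trimSensitive = true))
```

Adding another specifier in this model means one more defaulted field and one more clause in `acceptsSpecifiers`; the base-class trim check and existing call sites stay unchanged.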
himadripal pushed a commit to himadripal/spark that referenced this pull request (Oct 19, 2024)
### What changes were proposed in this pull request?
Trim collation is still being implemented. This change blocks all code paths from using it; as trim collation is enabled for individual expressions, those expressions will be gradually whitelisted.
### Why are the changes needed?
Trim collation is not yet fully implemented, so it must remain unreachable until individual expressions support it; supported expressions will then be whitelisted one at a time.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No additional tests; the change only adds a field that is not yet used.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes apache#48336 from jovanpavl-db/block-collation-trim.
Lead-authored-by: Jovan Pavlovic
Co-authored-by: Hyukjin Kwon
Signed-off-by: Hyukjin Kwon
himadripal pushed a commit to himadripal/spark that referenced this pull request (Oct 19, 2024)
What changes were proposed in this pull request?
Trim collation is still being implemented. This change blocks all code paths from using it; as trim collation is enabled for individual expressions, those expressions will be gradually whitelisted.
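A minimal sketch of the "block by default, whitelist later" idea, written in Scala since Spark is. Everything in it is hypothetical: `Collation`, `CollatedString`, `ExpectedString`, and the `supportsTrimCollation` flag are illustrative stand-ins rather than Spark's actual API, and the collation names are only labels. The check rejects trim-sensitive collations unless an expression explicitly opts in, which is what whitelisting an expression amounts to in this model.

```scala
// Hypothetical model of blocking trim collation by default.
// None of these names are Spark's real API.

/** A collation reduced to the one property this sketch cares about. */
final case class Collation(name: String, trimSensitive: Boolean)

/** A string value type tagged with its collation. */
final case class CollatedString(collation: Collation)

/** Stand-in for an expression's expected string input type. */
final case class ExpectedString(supportsTrimCollation: Boolean = false) {
  // Trim-sensitive collations are rejected unless the expression opts in.
  def accepts(input: CollatedString): Boolean =
    supportsTrimCollation || !input.collation.trimSensitive
}

val binary = CollatedString(Collation("UTF8_BINARY", trimSensitive = false))
val binaryRtrim = CollatedString(Collation("UTF8_BINARY_RTRIM", trimSensitive = true))

val blockedByDefault = ExpectedString()                        // most expressions
val whitelisted = ExpectedString(supportsTrimCollation = true) // an opted-in expression
```

Defaulting the flag to `false` means any expression that has not been updated stays blocked automatically; whitelisting becomes an explicit, per-expression change.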
Why are the changes needed?
Trim collation is not yet fully implemented, so it must remain unreachable until individual expressions support it; supported expressions will then be whitelisted one at a time.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
No additional tests; the change only adds a field that is not yet used.
Was this patch authored or co-authored using generative AI tooling?
No.