-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48280][SQL] Improve collation testing surface area using expression walking #46801
Conversation
# Conflicts: # sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
# Conflicts: # sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
Outdated
Show resolved
Hide resolved
collationType: CollationType): Expression = | ||
inputType match { | ||
// Try to make this a bit more random. | ||
case AnyTimestampType => Literal("2009-07-30 12:58:59") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to organize this a bit better. I think that in future this logic may become more complex (e.g. we don't want to just pass 1 and "dummy_string". Instead we will try with different string shapes + special rules for integers (-1, 0, 1, strlen, strlen + 1...).
Again, my recommendation is to add new class for expression walker and define this logic as methods of that class.
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala
Outdated
Show resolved
Hide resolved
val (funInfos, toSkip) = extractRelevantExpressions() | ||
|
||
for (f <- funInfos.filter(f => !toSkip.contains(f.getName))) { | ||
println("checking - " + f.getName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we use log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe do nothing, to avoid expanding the test log too much.
try { | ||
instanceUtf8Binary match { | ||
case replaceable: RuntimeReplaceable => | ||
replaceable.replacement.eval(EmptyRow) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we get the result and exception at the same time? Using scala Either
} | ||
} | ||
|
||
test("SPARK-48280: Expression Walker for codeGen generation") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually do not test eval and codegen separately. See ExpressionEvalHelper.checkEvaluation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This situation I would say is a bit different. We are not testing each expression with predeterminate solution. What we are doing is comparing UTF8_BINARY and UTF8_LCASE and trying to test whether they will return the same result. In this case it would be hard to use ExpressionEvalHelper.checkEvaluation
in total. We could change the test to check whether both UTF8_BINARY and UTF8_LCASE are generating the same result in with codeGen and without, but I am not sure if there are other options than proposed PR for codeGen test
sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala
Outdated
Show resolved
Hide resolved
### What changes were proposed in this pull request? Fix codeGen path for FindInSet. ### Why are the changes needed? Error in original PR (#46682), caught by: #46801. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #47179 from uros-db/fix-findinset. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
This is now ready for merge, all tests have passed, as find_in_set was fixed this morning. |
thanks, merging to master! |
### What changes were proposed in this pull request? Fix codeGen path for FindInSet. ### Why are the changes needed? Error in original PR (apache#46682), caught by: apache#46801. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47179 from uros-db/fix-findinset. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…ssion walking ### What changes were proposed in this pull request? This PR is introducing Expression Walker in different forms in order to improve collation testing surface area. The tests added include: 1. Expression Walker for expression evaluation 2. Expression Walker for SQL query examples 3. Expression Walker for codeGen generation ### Why are the changes needed? Collations introduced a lot of changes to many functions and parts of the code and these tests aim to catch existing errors and prevent addition of new functions without proper implementation of collation support. To emphasise the importance of these tests, some of the relevant tickets that were opened as a byproduct of this testing: - https://issues.apache.org/jira/browse/SPARK-48472 - https://issues.apache.org/jira/browse/SPARK-48572 - https://issues.apache.org/jira/browse/SPARK-48574 - https://issues.apache.org/jira/browse/SPARK-48600 - https://issues.apache.org/jira/browse/SPARK-48662 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This PR is only related to testing. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46801 from mihailom-db/SPARK-48280. Authored-by: Mihailo Milosevic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request? Fix codeGen path for FindInSet. ### Why are the changes needed? Error in original PR (apache#46682), caught by: apache#46801. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#47179 from uros-db/fix-findinset. Authored-by: Uros Bojanic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
…ssion walking ### What changes were proposed in this pull request? This PR is introducing Expression Walker in different forms in order to improve collation testing surface area. The tests added include: 1. Expression Walker for expression evaluation 2. Expression Walker for SQL query examples 3. Expression Walker for codeGen generation ### Why are the changes needed? Collations introduced a lot of changes to many functions and parts of the code and these tests aim to catch existing errors and prevent addition of new functions without proper implementation of collation support. To emphasise the importance of these tests, some of the relevant tickets that were opened as a byproduct of this testing: - https://issues.apache.org/jira/browse/SPARK-48472 - https://issues.apache.org/jira/browse/SPARK-48572 - https://issues.apache.org/jira/browse/SPARK-48574 - https://issues.apache.org/jira/browse/SPARK-48600 - https://issues.apache.org/jira/browse/SPARK-48662 ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This PR is only related to testing. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46801 from mihailom-db/SPARK-48280. Authored-by: Mihailo Milosevic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR is introducing Expression Walker in different forms in order to improve collation testing surface area. The tests added include:
Why are the changes needed?
Collations introduced a lot of changes to many functions and parts of the code and these tests aim to catch existing errors and prevent addition of new functions without proper implementation of collation support.
To emphasise the importance of these tests, some of the relevant tickets that were opened as a byproduct of this testing:
Does this PR introduce any user-facing change?
No.
How was this patch tested?
This PR is only related to testing.
Was this patch authored or co-authored using generative AI tooling?
No.