[SPARK-48280][SQL] Improve collation testing surface area using expression walking #46801

mihailom-db · 2024-05-30T07:25:46Z

What changes were proposed in this pull request?

This PR introduces an Expression Walker in several forms in order to improve the collation testing surface area. The tests added include:

1. Expression Walker for expression evaluation
2. Expression Walker for SQL query examples
3. Expression Walker for codeGen generation

Why are the changes needed?

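Collations changed many functions and code paths. These tests aim to catch existing errors and to prevent new functions from being added without proper collation support.
To emphasise the importance of these tests, here are some of the relevant tickets that were opened as a byproduct of this testing:

- https://issues.apache.org/jira/browse/SPARK-48472
- https://issues.apache.org/jira/browse/SPARK-48572
- https://issues.apache.org/jira/browse/SPARK-48574
- https://issues.apache.org/jira/browse/SPARK-48600
- https://issues.apache.org/jira/browse/SPARK-48662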

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This PR is only related to testing.

Was this patch authored or co-authored using generative AI tooling?

No.


dbatomic · 2024-06-04T13:20:08Z

sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala

+        collationType: CollationType): Expression =
+      inputType match {
+        // Try to make this a bit more random.
+        case AnyTimestampType => Literal("2009-07-30 12:58:59")


Let's try to organize this a bit better. I think that in the future this logic may become more complex (e.g. we won't want to pass just 1 and "dummy_string"). Instead, we will try different string shapes plus special rules for integers (-1, 0, 1, strlen, strlen + 1, ...).

Again, my recommendation is to add a new class for the expression walker and define this logic as methods of that class.
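
To make the suggestion concrete, here is a minimal sketch of what such a class could look like. It is plain Scala with no Spark dependency, and every name in it (WalkerInputGenerator, boundaryInts, stringIntPairs, defaultShapes) is hypothetical rather than taken from the PR. It only illustrates the idea of generating several string shapes and the boundary integers listed above.

```scala
// Hypothetical helper; nothing here is part of Spark or of this PR.
final class WalkerInputGenerator(stringShapes: Seq[String]) {

  // Boundary integers for one string: -1, 0, 1, strlen, strlen + 1
  // (duplicates dropped, first-occurrence order kept).
  def boundaryInts(s: String): Seq[Int] =
    Seq(-1, 0, 1, s.length, s.length + 1).distinct

  // Every (string, integer) pair a two-argument expression could be probed with.
  def stringIntPairs: Seq[(String, Int)] =
    for {
      s <- stringShapes
      i <- boundaryInts(s)
    } yield (s, i)
}

object WalkerInputGenerator {
  // Assumed sample shapes: empty, single character, mixed case, padded.
  val defaultShapes: Seq[String] = Seq("", "a", "Dummy_String", " padded ")

  def default: WalkerInputGenerator = new WalkerInputGenerator(defaultShapes)
}
```

Keeping the rules in one class would let the test suite swap in richer shapes later without touching the walker loop itself.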


cloud-fan · 2024-06-26T02:24:18Z

sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala

+    val (funInfos, toSkip) = extractRelevantExpressions()
+
+    for (f <- funInfos.filter(f => !toSkip.contains(f.getName))) {
+      println("checking - " + f.getName)


Shall we use a logger here?

Or maybe print nothing at all, to avoid bloating the test log.

cloud-fan · 2024-06-26T02:26:17Z

sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala

+        try {
+          instanceUtf8Binary match {
+            case replaceable: RuntimeReplaceable =>
+              replaceable.replacement.eval(EmptyRow)


Shall we capture the result and the exception at the same time, using Scala's Either?
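
One possible shape for this, sketched in plain Scala without any Spark types: the evaluation is wrapped so that it yields either the thrown exception or the value, and two such outcomes can then be compared directly. The names EvalOutcome, capture and agree are hypothetical, and treating "same exception class" as agreement is an assumption, not something the PR prescribes.

```scala
import scala.util.Try

// Hypothetical helper; not part of Spark or of this PR.
object EvalOutcome {

  // Run a computation and keep either its (non-fatal) exception or its result.
  def capture[A](body: => A): Either[Throwable, A] = Try(body).toEither

  // Two outcomes agree when both succeed with equal values,
  // or both fail with the same exception class (an assumed criterion).
  def agree[A](x: Either[Throwable, A], y: Either[Throwable, A]): Boolean =
    (x, y) match {
      case (Right(a), Right(b)) => a == b
      case (Left(e1), Left(e2)) => e1.getClass == e2.getClass
      case _                    => false
    }
}
```

With this shape, the try/catch around each eval call collapses into a single capture call per collation, followed by one agree check.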

cloud-fan · 2024-06-26T02:28:06Z

sql/core/src/test/scala/org/apache/spark/sql/CollationExpressionWalkerSuite.scala

+    }
+  }
+
+  test("SPARK-48280: Expression Walker for codeGen generation") {


We usually do not test eval and codegen separately. See ExpressionEvalHelper.checkEvaluation

I would say this situation is a bit different. We are not testing each expression against a predetermined result. Instead, we compare UTF8_BINARY and UTF8_LCASE and check whether they return the same result. In this case it would be hard to use ExpressionEvalHelper.checkEvaluation as-is. We could change the test to check that both UTF8_BINARY and UTF8_LCASE produce the same result with and without codeGen, but I am not sure there are options for the codeGen test other than the one proposed in this PR.
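
The alternative mentioned here (checking that one expression gives the same answer with and without codeGen) can be sketched generically. The code below is plain Scala and deliberately avoids Spark: Evaluator is a hypothetical stand-in for "one way of evaluating an expression on an input", and PathConsistency and mismatches are illustrative names only.

```scala
// Hypothetical sketch; the two evaluators stand in for the interpreted
// and code-generated paths of the same expression.
object PathConsistency {

  type Evaluator = String => Any

  // Returns the inputs on which the two evaluation paths disagree.
  def mismatches(
      interpreted: Evaluator,
      generated: Evaluator,
      inputs: Seq[String]): Seq[String] =
    inputs.filterNot(in => interpreted(in) == generated(in))
}
```

Running such a check once per collation would report the inputs where the two paths diverge, without needing a predetermined expected value for each expression.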


…K-48280

### What changes were proposed in this pull request?
Fix codeGen path for FindInSet.

### Why are the changes needed?
Error in original PR (#46682), caught by: #46801.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47179 from uros-db/fix-findinset.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

mihailom-db · 2024-07-03T12:12:27Z

This is now ready for merge, all tests have passed, as find_in_set was fixed this morning.

cloud-fan · 2024-07-04T08:11:43Z

thanks, merging to master!


…ssion walking

### What changes were proposed in this pull request?
This PR is introducing Expression Walker in different forms in order to improve collation testing surface area. The tests added include:
1. Expression Walker for expression evaluation
2. Expression Walker for SQL query examples
3. Expression Walker for codeGen generation

### Why are the changes needed?
Collations introduced a lot of changes to many functions and parts of the code and these tests aim to catch existing errors and prevent addition of new functions without proper implementation of collation support.
To emphasise the importance of these tests, some of the relevant tickets that were opened as a byproduct of this testing:
- https://issues.apache.org/jira/browse/SPARK-48472
- https://issues.apache.org/jira/browse/SPARK-48572
- https://issues.apache.org/jira/browse/SPARK-48574
- https://issues.apache.org/jira/browse/SPARK-48600
- https://issues.apache.org/jira/browse/SPARK-48662

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This PR is only related to testing.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46801 from mihailom-db/SPARK-48280.

Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>


mihailom-db added 3 commits May 15, 2024 08:53

Add test

3be0b6a

Enable more functions

d630120

Improve test for expression walking

357334e

github-actions bot added the SQL label May 30, 2024

mihailom-db added 9 commits May 30, 2024 09:29

Merge remote-tracking branch 'upstream/master' into SPARK-48280

8a95bed

# Conflicts: # sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala

Add more functions

56532d4

Fix null problem

af1268e

Merge remote-tracking branch 'upstream/master' into SPARK-48280

716e778

# Conflicts: # sql/core/src/test/scala/org/apache/spark/sql/CollationSuite.scala

Fix conflicts

f5012ec

Remove unused inports

73be32b

Remove prints

394f85e

Fix trailing comma error

698fbcf

Add polishing

2c47eaf

mihailom-db changed the title ~~[WIP][SPARK-48280][SQL] Add Expression Walker for Testing~~ [SPARK-48280][SQL] Add Expression Walker for Testing Jun 4, 2024

dbatomic reviewed Jun 4, 2024


dbatomic reviewed Jun 4, 2024


stefankandic suggested changes Jun 4, 2024


stefankandic reviewed Jun 4, 2024


dbatomic reviewed Jun 4, 2024


dbatomic reviewed Jun 4, 2024


mihailom-db added 6 commits June 5, 2024 09:29

Add new Suite

ba680db

Revert changes in CollationSuite

2f3fc4c

Refactor code

e4ea17d

Add MapType support

263c141

Add support for StructType

29bb400

Remove unnecessary prints

55f84da

dbatomic reviewed Jun 6, 2024
