Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48280][SQL] Improve collation testing surface area using expression walking #46801

Closed
wants to merge 39 commits into from

Conversation

mihailom-db
Copy link
Contributor

@mihailom-db mihailom-db commented May 30, 2024

What changes were proposed in this pull request?

This PR is introducing Expression Walker in different forms in order to improve collation testing surface area. The tests added include:

  1. Expression Walker for expression evaluation
  2. Expression Walker for SQL query examples
  3. Expression Walker for codeGen generation

Why are the changes needed?

Collations introduced a lot of changes to many functions and parts of the code and these tests aim to catch existing errors and prevent addition of new functions without proper implementation of collation support.
To emphasise the importance of these tests, some of the relevant tickets that were opened as a byproduct of this testing:

Does this PR introduce any user-facing change?

No.

How was this patch tested?

This PR is only related to testing.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label May 30, 2024
@mihailom-db mihailom-db changed the title [WIP][SPARK-48280][SQL] Add Expression Walker for Testing [SPARK-48280][SQL] Add Expression Walker for Testing Jun 4, 2024
collationType: CollationType): Expression =
inputType match {
// Try to make this a bit more random.
case AnyTimestampType => Literal("2009-07-30 12:58:59")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to organize this a bit better. I think that in future this logic may become more complex (e.g. we don't want to just pass 1 and "dummy_string". Instead we will try with different string shapes + special rules for integers (-1, 0, 1, strlen, strlen + 1...).

Again, my recommendation is to add new class for expression walker and define this logic as methods of that class.

@mihailom-db mihailom-db requested a review from dbatomic June 20, 2024 10:58
val (funInfos, toSkip) = extractRelevantExpressions()

for (f <- funInfos.filter(f => !toSkip.contains(f.getName))) {
println("checking - " + f.getName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we use log?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe do nothing, to avoid expanding the test log too much.

try {
instanceUtf8Binary match {
case replaceable: RuntimeReplaceable =>
replaceable.replacement.eval(EmptyRow)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we get the result and exception at the same time? Using scala Either

}
}

test("SPARK-48280: Expression Walker for codeGen generation") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually do not test eval and codegen separately. See ExpressionEvalHelper.checkEvaluation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This situation I would say is a bit different. We are not testing each expression with predeterminate solution. What we are doing is comparing UTF8_BINARY and UTF8_LCASE and trying to test whether they will return the same result. In this case it would be hard to use ExpressionEvalHelper.checkEvaluation in total. We could change the test to check whether both UTF8_BINARY and UTF8_LCASE are generating the same result in with codeGen and without, but I am not sure if there are other options than proposed PR for codeGen test

cloud-fan pushed a commit that referenced this pull request Jul 3, 2024
### What changes were proposed in this pull request?
Fix codeGen path for FindInSet.

### Why are the changes needed?
Error in original PR (#46682), caught by: #46801.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47179 from uros-db/fix-findinset.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@mihailom-db
Copy link
Contributor Author

This is now ready for merge, all tests have passed, as find_in_set was fixed this morning.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in d5dc223 Jul 4, 2024
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Jul 10, 2024
### What changes were proposed in this pull request?
Fix codeGen path for FindInSet.

### Why are the changes needed?
Error in original PR (apache#46682), caught by: apache#46801.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47179 from uros-db/fix-findinset.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Jul 10, 2024
…ssion walking

### What changes were proposed in this pull request?
This PR is introducing Expression Walker in different forms in order to improve collation testing surface area. The tests added include:

1. Expression Walker for expression evaluation
2. Expression Walker for SQL query examples
3. Expression Walker for codeGen generation

### Why are the changes needed?
Collations introduced a lot of changes to many functions and parts of the code and these tests aim to catch existing errors and prevent addition of new functions without proper implementation of collation support.
To emphasise the importance of these tests, some of the relevant tickets that were opened as a byproduct of this testing:

- https://issues.apache.org/jira/browse/SPARK-48472
- https://issues.apache.org/jira/browse/SPARK-48572
- https://issues.apache.org/jira/browse/SPARK-48574
- https://issues.apache.org/jira/browse/SPARK-48600
- https://issues.apache.org/jira/browse/SPARK-48662

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This PR is only related to testing.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46801 from mihailom-db/SPARK-48280.

Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?
Fix codeGen path for FindInSet.

### Why are the changes needed?
Error in original PR (apache#46682), caught by: apache#46801.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#47179 from uros-db/fix-findinset.

Authored-by: Uros Bojanic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…ssion walking

### What changes were proposed in this pull request?
This PR is introducing Expression Walker in different forms in order to improve collation testing surface area. The tests added include:

1. Expression Walker for expression evaluation
2. Expression Walker for SQL query examples
3. Expression Walker for codeGen generation

### Why are the changes needed?
Collations introduced a lot of changes to many functions and parts of the code and these tests aim to catch existing errors and prevent addition of new functions without proper implementation of collation support.
To emphasise the importance of these tests, some of the relevant tickets that were opened as a byproduct of this testing:

- https://issues.apache.org/jira/browse/SPARK-48472
- https://issues.apache.org/jira/browse/SPARK-48572
- https://issues.apache.org/jira/browse/SPARK-48574
- https://issues.apache.org/jira/browse/SPARK-48600
- https://issues.apache.org/jira/browse/SPARK-48662

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
This PR is only related to testing.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46801 from mihailom-db/SPARK-48280.

Authored-by: Mihailo Milosevic <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants