feat: Optimize CASE expression for usage where then and else values are literals #11553

Merged
merged 3 commits into from
Jul 22, 2024

Conversation

andygrove
Member

@andygrove andygrove commented Jul 19, 2024

Which issue does this PR close?

N/A

Rationale for this change

case_when: scalar or scalar
                        time:   [5.6794 µs 5.7119 µs 5.7566 µs]
                        change: [-70.724% -70.393% -70.042%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

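Add a fast path for CASE expressions of the form CASE WHEN <condition> THEN <literal> ELSE <literal> END, where both the THEN and ELSE values are literals.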
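
As a rough illustration of the idea behind the fast path, here is a minimal sketch in plain Rust with no Arrow dependency. The PR itself works on Arrow arrays and calls `zip`, as the reviewed snippet further down shows; the helper name `case_when_literals` and the `Option<bool>` mask used here are assumptions made for the example, not DataFusion APIs.

```rust
/// Evaluate `CASE WHEN <cond> THEN <then_value> ELSE <else_value> END`
/// for a whole column at once, given the already-evaluated condition.
///
/// `mask` holds one entry per row. `None` models a SQL NULL condition,
/// which does not count as true and therefore selects the ELSE value.
/// (Hypothetical helper for illustration; not part of DataFusion.)
fn case_when_literals<T: Clone>(
    mask: &[Option<bool>],
    then_value: &T,
    else_value: &T,
) -> Vec<T> {
    mask.iter()
        .map(|m| match m {
            // Only a definite `true` picks the THEN literal.
            Some(true) => then_value.clone(),
            // `false` and NULL both fall through to the ELSE literal.
            _ => else_value.clone(),
        })
        .collect()
}
```

Because both branches are single values, no per-row THEN or ELSE expression has to be evaluated; the only per-row work is reading the mask.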

Are these changes tested?

  • Existing tests
  • Added slt tests

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Jul 19, 2024
@andygrove andygrove added the performance Make DataFusion faster label Jul 19, 2024
@andygrove andygrove changed the title Optimize CASE expression for usage where then and else values are literals feat: Optimize CASE expression for usage where then and else values are literals Jul 19, 2024
@andygrove andygrove requested review from alamb and comphead July 19, 2024 17:43
.unwrap_or_else(|_| Arc::clone(e));
let else_ = Scalar::new(expr.evaluate(batch)?.into_array(1)?);

Ok(ColumnarValue::Array(zip(&when_value, &then_value, &else_)?))
Contributor

if the input is ColumnarValue::Scalar shouldn't the output also be a ColumnarValue::Scalar (rather than a ColumnarValue::Array?)

Member Author

No, the output will be an array containing values based on two scalar arguments.

SELECT CASE WHEN a > 2 THEN 'even' ELSE 'odd' END FROM foo
----
odd
even
even

Member Author

I wonder if it would make sense (in a separate PR) to produce a dictionary array in this case, since it will only ever contain two distinct values? 🤔
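
To make the dictionary idea concrete, here is a small sketch in plain Rust. It is not what this PR implements and does not use Arrow's dictionary array type; `TwoValueDictionary` and `case_when_as_dictionary` are hypothetical names invented for the illustration.

```rust
/// A tiny stand-in for a dictionary-encoded column: each row stores a
/// small key that indexes into `values` instead of a full copy.
/// (Hypothetical type for illustration; not an Arrow or DataFusion API.)
#[derive(Debug, PartialEq)]
struct TwoValueDictionary<T> {
    values: [T; 2], // [then_value, else_value]
    keys: Vec<u8>,  // 0 => then_value, 1 => else_value
}

/// Build the encoded column from an evaluated condition mask.
/// A NULL condition (`None`) selects the ELSE value, like `false`.
fn case_when_as_dictionary<T>(
    mask: &[Option<bool>],
    then_value: T,
    else_value: T,
) -> TwoValueDictionary<T> {
    let keys: Vec<u8> = mask
        .iter()
        .map(|m| if *m == Some(true) { 0 } else { 1 })
        .collect();
    TwoValueDictionary {
        values: [then_value, else_value],
        keys,
    }
}

impl<T: Clone> TwoValueDictionary<T> {
    /// Expand the keys back into one full value per row.
    fn decode(&self) -> Vec<T> {
        self.keys
            .iter()
            .map(|&k| self.values[k as usize].clone())
            .collect()
    }
}
```

With string literals, each row then costs one small key rather than a copy of the string, which is the potential saving the comment hints at.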

@@ -50,3 +50,19 @@ SELECT CASE WHEN a > 2 THEN b END FROM foo
NULL
4
6

# scalar or scalar (string)
Contributor

Could you also add a test where both arguments are scalars (like CASE WHEN 1 > 2 THEN 'true' ELSE 'false' END)?

Member Author

Added


# scalar or scalar (string)
query T
SELECT CASE WHEN a > 2 THEN 'even' ELSE 'odd' END FROM foo
Contributor

It looks like NULL handling in this specialized implementation is not tested; we could add a (NULL, NULL) row to the foo table.

Member Author

Good point. I have added this.
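
For readers unfamiliar with why a NULL row matters here, the following plain-Rust sketch models the SQL semantics involved: comparing NULL with a number yields NULL, and a NULL condition in CASE falls through to ELSE. The function names and the sample column values are assumptions for illustration; the actual contents of the foo table are not shown in this conversation.

```rust
/// SQL-style `a > threshold`: comparing NULL with anything yields NULL.
/// (Hypothetical helper; `None` stands in for SQL NULL.)
fn sql_gt(a: Option<i64>, threshold: i64) -> Option<bool> {
    a.map(|v| v > threshold)
}

/// Model of `CASE WHEN a > 2 THEN 'even' ELSE 'odd' END` over a column `a`.
fn label_rows(a: &[Option<i64>]) -> Vec<&'static str> {
    a.iter()
        .map(|&v| match sql_gt(v, 2) {
            // Only a definite `true` selects the THEN literal.
            Some(true) => "even",
            // `false` and NULL both select the ELSE literal.
            _ => "odd",
        })
        .collect()
}
```

Under these semantics a row whose `a` is NULL produces the ELSE literal, which is the case the added (NULL, NULL) row exercises.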

@andygrove andygrove merged commit b6e55d7 into apache:main Jul 22, 2024
24 checks passed
@andygrove andygrove deleted the case-scalar-sclar branch July 22, 2024 15:51
Lordworms pushed a commit to Lordworms/arrow-datafusion that referenced this pull request Jul 23, 2024
…re literals (apache#11553)

* Optimize CASE expression for usage where then and else values are literals

* add slt test

* add more test cases
4 participants