Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect CASE WHEN + ELSE NULL behavior #13885

Closed
richox opened this issue Dec 23, 2024 · 2 comments · Fixed by #13886
Closed

Incorrect CASE WHEN + ELSE NULL behavior #13885

richox opened this issue Dec 23, 2024 · 2 comments · Fixed by #13886
Labels
bug Something isn't working

Comments

@richox
Copy link
Contributor

richox commented Dec 23, 2024

Describe the bug

some queries containing case when .. ELSE NULL end produces incorrect results.

To Reproduce

preparing test table:

$ datafusion-cli
DataFusion CLI v43.0.0
> create table t1(s string);
0 row(s) fetched.
Elapsed 0.003 seconds.

> insert t1 (s) values ('aaa'), ('bbb');
+-------+
| count |
+-------+
| 2     |
+-------+
1 row(s) fetched.
Elapsed 0.002 seconds.

> select * from t1;
+-----+
| s   |
+-----+
| aaa |
| bbb |
+-----+
2 row(s) fetched.
Elapsed 0.001 seconds.

query:

> select case when (NULL and (s = 'aaa')) then 'unreachable!' else null end from t1;
+-------------------------------------------------------------------------------+
| CASE WHEN NULL AND t1.s = Utf8("aaa") THEN Utf8("unreachable!") ELSE NULL END |
+-------------------------------------------------------------------------------+
|                                                                               |
|                                                                               |
+-------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.002 seconds.

> select case when (NULL and (s = 'aaa')) then s else 'always here!' end from t1;
+-------------------------------------------------------------------------------+
| CASE WHEN NULL AND t1.s = Utf8("aaa") THEN t1.s ELSE Utf8("always here!") END |
+-------------------------------------------------------------------------------+
| always here!                                                                  |
| always here!                                                                  |
+-------------------------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.001 seconds.

> select case when (NULL and (s = 'aaa')) then s else null end from t1;
+---------------------------------------------------------------+
| CASE WHEN NULL AND t1.s = Utf8("aaa") THEN t1.s ELSE NULL END |
+---------------------------------------------------------------+
| aaa                                                           | <- BUGGY, should be NULL
|                                                               |
+---------------------------------------------------------------+
2 row(s) fetched.
Elapsed 0.001 seconds.

Expected behavior

select case when (NULL and (s = 'aaa')) then s else null end from t1;

the above query should always output nulls, since the only WHEN condition is always null (not true)

Additional context

No response

@richox
Copy link
Contributor Author

richox commented Dec 23, 2024

this happens in case_column_or_null() where nullable when-conditions are not correctly handled.

@Omega359
Copy link
Contributor

possibly related: #13779

richox pushed a commit to kwai/blaze that referenced this issue Dec 24, 2024
fix parsing parquet int96 to timestamp error.

fix compatibility of casting str to int.

fix Incorrect CASE WHEN + ELSE NULL behavior (apache/datafusion#13885).

use number of physical cores as tokio parallelism, removing blaze.tokio.num.worker.threads.

sort multiple batches with interleaver in repartitioner.

use in-place radix sort.

use sync channel in ipc reading.
gy11233 pushed a commit to kwai/blaze that referenced this issue Dec 24, 2024
fix parsing parquet int96 to timestamp error.

fix compatibility of casting str to int.

fix Incorrect CASE WHEN + ELSE NULL behavior (apache/datafusion#13885).

use number of physical cores as tokio parallelism, removing blaze.tokio.num.worker.threads.

sort multiple batches with interleaver in repartitioner.

use in-place radix sort.

use sync channel in ipc reading.

Co-authored-by: zhangli20 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants