-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proper support for null
literal by introducing ScalarValue::Null
#2364
Conversation
For someone reviews this pr,
I convert this pr to draft for code review. It's ok to go if new |
cc @alamb @andygrove @yjshen @xudong963 Please have a review, thank you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cargo.toml
Outdated
@@ -38,3 +38,6 @@ exclude = ["datafusion-cli"] | |||
[profile.release] | |||
codegen-units = 1 | |||
lto = true | |||
|
|||
[patch.crates-io] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW arrow-rs 13.0.0 with these changes should be released sometime early next wee
@@ -39,6 +39,8 @@ use std::{convert::TryFrom, fmt, iter::repeat, sync::Arc}; | |||
/// This is the single-valued counter-part of arrow’s `Array`. | |||
#[derive(Clone)] | |||
pub enum ScalarValue { | |||
/// represents null | |||
Null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
datafusion/common/src/scalar.rs
Outdated
@@ -1522,6 +1550,7 @@ impl ScalarValue { | |||
eq_array_primitive!(array, index, IntervalMonthDayNanoArray, val) | |||
} | |||
ScalarValue::Struct(_, _) => unimplemented!(), | |||
ScalarValue::Null => false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
@@ -1445,7 +1445,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
SQLExpr::Value(Value::Number(n, _)) => parse_sql_number(&n), | |||
SQLExpr::Value(Value::SingleQuotedString(s)) => Ok(lit(s)), | |||
SQLExpr::Value(Value::Null) => { | |||
Ok(Expr::Literal(ScalarValue::Utf8(None))) | |||
Ok(Expr::Literal(ScalarValue::Null)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well this explains a lot of odd type coercion errors I have been seeing
} | ||
|
||
/// coercion rules from NULL type. Since NULL can be casted to most of types in arrow, | ||
/// either lhs or rhs is NULL, if NULL can be casted to type of the other side, the coecion is valid. | ||
fn null_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very cool
ScalarValue::Null
to dfnull
literal by introducing ScalarValue::Null
cc @alamb PTAL, thank you ❤️ |
501add5
to
0100277
Compare
Since #2382 is merged, I switch this pr to |
cc @alamb @andygrove @yjshen @xudong963 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking very good @WinkerDu -- thank you!
I think there is a small issue with null = null
but it should be fairly easy to solve.
Thanks again!
"| 999 |", | ||
"+----------------------------------------------------------------------------------------------+", | ||
"+----------------------------------------------------------------------------------------+", | ||
"| CASE WHEN #t1.c1 = Utf8(\"a\") THEN Int64(1) WHEN NULL THEN Int64(2) ELSE Int64(999) END |", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key difference here. Very nice 👍
@@ -829,7 +829,13 @@ async fn inner_join_nulls() { | |||
let sql = "SELECT * FROM (SELECT null AS id1) t1 | |||
INNER JOIN (SELECT null AS id2) t2 ON id1 = id2"; | |||
|
|||
let expected = vec!["++", "++"]; | |||
let expected = vec![ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This answer is not correct -- there should be no rows that match.
This is because the join should produce rows where id1 = id2
evaluates to true
However, null = null
evaluates to null
🤯
Here is the query in postgres:
alamb=# SELECT * FROM (SELECT null AS id1) t1
INNER JOIN (SELECT null AS id2) t2 ON id1 = id2
alamb-# ;
id1 | id2
-----+-----
(0 rows)
pub fn eq_null(left: &NullArray, _right: &NullArray) -> Result<BooleanArray> { | ||
let length = left.len(); | ||
make_boolean_array(length, false) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct -- specifically I think the resulting boolean array should be full of nulls not false
Perhaps something like:
std::iter::repeat(left.len(), None).collect()
0100277
to
13a1601
Compare
cc @alamb PTAL, thank you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome -- thank you so much @WinkerDu -- this is great
@@ -829,7 +829,11 @@ async fn inner_join_nulls() { | |||
let sql = "SELECT * FROM (SELECT null AS id1) t1 | |||
INNER JOIN (SELECT null AS id2) t2 ON id1 = id2"; | |||
|
|||
let expected = vec!["++", "++"]; | |||
#[rustfmt::skip] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
🎉 we are going to have real null support 😍 |
Thank you all @alamb @andygrove |
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
…ll` (apache#2364) * introduce null * fix fmt
Which issue does this PR close?
Closes #2363 .
Rationale for this change
To solve Null constants issues listed in #1184 , and since /apache/arrow-rs#1572 Null casted from and to most of types in arrow-rs kernel, it's reasonable that introduce Null type to df for type coercion.
What changes are included in this PR?
Introduce
ScalarValue::Null
type to dfAre there any user-facing changes?
No.