Add proper support for null literal by introducing ScalarValue::Null #2364

Merged (2 commits) on May 6, 2022

Conversation

WinkerDu
Contributor

@WinkerDu WinkerDu commented Apr 28, 2022

Which issue does this PR close?

Closes #2363.

Rationale for this change

To solve the null-constant issues listed in #1184. Since apache/arrow-rs#1572, Null can be cast from and to most types in the arrow-rs cast kernel, so it is reasonable to introduce a Null type to DataFusion for type coercion.

What changes are included in this PR?

Introduce ScalarValue::Null type to df

Are there any user-facing changes?

No.

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Apr 28, 2022
@WinkerDu
Contributor Author

WinkerDu commented Apr 28, 2022

For anyone reviewing this PR:
since arrow = { version = "12" } doesn't contain the patch apache/arrow-rs#1572, I followed @alamb's suggestion and created the branch winkerdu/test_null_cast in my private repo like this:

git checkout master -b winkerdu/test_null_cast
git reset --hard dbc47e030c3038d40dbef39fbf3b39ae41f9e98a                 # prepare for version 12.0.0 release
git cherry-pick b50cc737efa2a4f2b1e27c0f3da3bc0403c6a2b6                 # see https://github.com/apache/arrow-rs/pull/1572

I have converted this PR to a draft for code review. It will be good to go once a new arrow-rs version is released.

@WinkerDu WinkerDu marked this pull request as draft April 28, 2022 03:12
@WinkerDu
Contributor Author

WinkerDu commented Apr 28, 2022

cc @alamb @andygrove @yjshen @xudong963 Please have a review, thank you

Contributor

@alamb alamb left a comment

Looking very good @WinkerDu 👍 I think this is a great step. I left some comments for your consideration. Very very cool

cc @jimexist who was working with null constants a while ago and might be interested in this work

Cargo.toml Outdated
@@ -38,3 +38,6 @@ exclude = ["datafusion-cli"]
[profile.release]
codegen-units = 1
lto = true

[patch.crates-io]
Contributor

BTW arrow-rs 13.0.0 with these changes should be released sometime early next week

@@ -39,6 +39,8 @@ use std::{convert::TryFrom, fmt, iter::repeat, sync::Arc};
/// This is the single-valued counter-part of arrow’s `Array`.
#[derive(Clone)]
pub enum ScalarValue {
/// represents null
Null,
Contributor

👍

@@ -1522,6 +1550,7 @@ impl ScalarValue {
eq_array_primitive!(array, index, IntervalMonthDayNanoArray, val)
}
ScalarValue::Struct(_, _) => unimplemented!(),
ScalarValue::Null => false,
Contributor

🤔

@@ -1445,7 +1445,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
SQLExpr::Value(Value::Number(n, _)) => parse_sql_number(&n),
SQLExpr::Value(Value::SingleQuotedString(s)) => Ok(lit(s)),
SQLExpr::Value(Value::Null) => {
- Ok(Expr::Literal(ScalarValue::Utf8(None)))
+ Ok(Expr::Literal(ScalarValue::Null))
Contributor

🎉

Member

Well this explains a lot of odd type coercion errors I have been seeing
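
As a rough illustration of why planning a bare `NULL` as a string-typed null could confuse type checks, here is a minimal self-contained sketch. The `Ty` and `Lit` enums and the helper functions are hypothetical stand-ins invented for this example; they are not DataFusion's `ScalarValue` or arrow's `DataType`, and the strict `same_type` check is an assumption used only to make the mismatch visible.

```rust
/// Hypothetical, simplified type tag (not arrow's `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Null,
    Int64,
    Utf8,
}

/// Hypothetical, simplified literal (not DataFusion's `ScalarValue`).
#[derive(Debug, Clone, PartialEq)]
enum Lit {
    Null,
    Int64(Option<i64>),
    Utf8(Option<String>),
}

impl Lit {
    /// The type a literal reports to the rest of the planner.
    fn ty(&self) -> Ty {
        match self {
            Lit::Null => Ty::Null,
            Lit::Int64(_) => Ty::Int64,
            Lit::Utf8(_) => Ty::Utf8,
        }
    }
}

/// Before the change: a bare NULL literal is planned as a string-typed null.
fn plan_null_before() -> Lit {
    Lit::Utf8(None)
}

/// After the change: a bare NULL literal keeps a dedicated, untyped variant.
fn plan_null_after() -> Lit {
    Lit::Null
}

/// Assumed strict check with no coercion at all, to show the mismatch.
fn same_type(a: &Lit, b: &Lit) -> bool {
    a.ty() == b.ty()
}
```

In this toy model, `plan_null_before().ty()` is `Ty::Utf8`, so comparing it with an `Int64` literal looks like a string-versus-integer mismatch, while `plan_null_after().ty()` is `Ty::Null`, which a coercion rule can recognize and handle specially.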

}

/// Coercion rules for the NULL type. Since NULL can be cast to most types in arrow,
/// if either lhs or rhs is NULL and NULL can be cast to the type of the other side, the coercion is valid.
fn null_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<DataType> {
Contributor

very cool
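
The rule described in the doc comment on `null_coercion` can be sketched as follows. This is only a self-contained approximation: the `DataType` enum here is a simplified stand-in, the `Opaque` variant and the `can_cast_from_null` helper are hypothetical, and the PR's actual function body is not shown in this conversation.

```rust
/// Simplified stand-in for arrow's `DataType`; `Opaque` is a made-up variant
/// representing "some type NULL cannot be cast to".
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Boolean,
    Int64,
    Utf8,
    Opaque,
}

/// Hypothetical cast check: in this sketch NULL casts to everything but `Opaque`.
fn can_cast_from_null(to: &DataType) -> bool {
    !matches!(to, DataType::Opaque)
}

/// If either side is NULL and NULL can be cast to the other side's type,
/// the coerced type is the other side's type; otherwise no rule applies.
fn null_coercion(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    match (lhs, rhs) {
        (DataType::Null, other) | (other, DataType::Null)
            if can_cast_from_null(other) =>
        {
            Some(other.clone())
        }
        _ => None,
    }
}
```

Under these assumptions, `null_coercion(&DataType::Null, &DataType::Int64)` yields `Some(DataType::Int64)`, while two non-NULL types fall through to `None` so that other coercion rules can be tried.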

@alamb alamb changed the title introduce ScalarValue::Null to df Add proper support for null literal by introducing ScalarValue::Null Apr 28, 2022
@WinkerDu
Contributor Author

cc @alamb PTAL, thank you ❤️

@alamb
Contributor

alamb commented Apr 30, 2022

This looks great @WinkerDu -- thank you -- I am working to get arrow 13 -- see #2382. It should only be a few more days now.

@WinkerDu
Contributor Author

WinkerDu commented May 2, 2022

This looks great @WinkerDu -- thank you -- I am working to get arrow 13 -- see #2382. It should only be a few more days now.

Thanks @alamb, looking forward to it!

@WinkerDu WinkerDu force-pushed the master-introduce-null branch from 501add5 to 0100277 Compare May 4, 2022 04:02
@WinkerDu
Contributor Author

WinkerDu commented May 4, 2022

Since #2382 is merged, I have switched this PR back to open status for code review

@WinkerDu WinkerDu marked this pull request as ready for review May 4, 2022 04:08
@WinkerDu
Contributor Author

WinkerDu commented May 4, 2022

cc @alamb @andygrove @yjshen @xudong963
Please have a review, thank you

Contributor

@alamb alamb left a comment

Looking very good @WinkerDu -- thank you!

I think there is a small issue with null = null but it should be fairly easy to solve.

Thanks again!

"| 999 |",
"+----------------------------------------------------------------------------------------------+",
"+----------------------------------------------------------------------------------------+",
"| CASE WHEN #t1.c1 = Utf8(\"a\") THEN Int64(1) WHEN NULL THEN Int64(2) ELSE Int64(999) END |",
Contributor

This is the key difference here. Very nice 👍

@@ -829,7 +829,13 @@ async fn inner_join_nulls() {
let sql = "SELECT * FROM (SELECT null AS id1) t1
INNER JOIN (SELECT null AS id2) t2 ON id1 = id2";

- let expected = vec!["++", "++"];
+ let expected = vec![
Contributor

This answer is not correct -- there should be no rows that match.

This is because the join should only produce rows where id1 = id2 evaluates to true

However, null = null evaluates to null 🤯

Here is the query in postgres:

alamb=# SELECT * FROM (SELECT null AS id1) t1
            INNER JOIN (SELECT null AS id2) t2 ON id1 = id2
alamb-# ;
 id1 | id2 
-----+-----
(0 rows)
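
The three-valued logic behind that Postgres result can be sketched in a few lines. This is a self-contained toy, not DataFusion's join implementation: `sql_eq` and `inner_join` are hypothetical helpers, and `Option<i64>` is assumed as the column type purely for illustration, with `None` standing in for SQL NULL.

```rust
/// SQL-style equality: the result is NULL (`None`) if either operand is NULL.
fn sql_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// Toy nested-loop inner join on `left = right`.
/// A pair is kept only when the condition is exactly TRUE;
/// both FALSE and NULL reject the pair.
fn inner_join(
    left: &[Option<i64>],
    right: &[Option<i64>],
) -> Vec<(Option<i64>, Option<i64>)> {
    let mut out = Vec::new();
    for &l in left {
        for &r in right {
            if sql_eq(l, r) == Some(true) {
                out.push((l, r));
            }
        }
    }
    out
}
```

With a single NULL on each side, `sql_eq(None, None)` is `None` rather than `Some(true)`, so `inner_join(&[None], &[None])` returns no rows, matching the `(0 rows)` output above.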

Comment on lines 1376 to 1378
pub fn eq_null(left: &NullArray, _right: &NullArray) -> Result<BooleanArray> {
let length = left.len();
make_boolean_array(length, false)
}
Contributor

I don't think this is correct -- specifically I think the resulting boolean array should be full of nulls not false

Perhaps something like:

    std::iter::repeat(None::<bool>).take(left.len()).collect::<BooleanArray>()
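
To see why the distinction between all-false and all-null matters, here is a small self-contained sketch. It models a boolean array as `Vec<Option<bool>>` instead of arrow's `BooleanArray`, and the function names `eq_null_all_false`, `eq_null_all_null`, and `sql_not` are hypothetical, chosen only for this example.

```rust
/// Stand-in for the version under review: every slot is FALSE.
fn eq_null_all_false(len: usize) -> Vec<Option<bool>> {
    vec![Some(false); len]
}

/// Stand-in for the suggested behavior: every slot is NULL (`None`).
fn eq_null_all_null(len: usize) -> Vec<Option<bool>> {
    std::iter::repeat(None).take(len).collect()
}

/// SQL-style NOT: NOT NULL stays NULL.
fn sql_not(values: &[Option<bool>]) -> Vec<Option<bool>> {
    values.iter().map(|v| v.map(|b| !b)).collect()
}

/// Number of slots a filter would keep (only exact TRUE passes).
fn count_true(values: &[Option<bool>]) -> usize {
    values.iter().filter(|v| **v == Some(true)).count()
}
```

Both versions keep zero rows when used directly as a filter, but they diverge once negated: `count_true(&sql_not(&eq_null_all_false(2)))` is 2, whereas `count_true(&sql_not(&eq_null_all_null(2)))` is 0, which is the result SQL semantics require for `NOT (null = null)`.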

@andygrove
Member

This is looking great. Thanks @WinkerDu! I don't have any additional comments beyond what @alamb has already raised

@WinkerDu WinkerDu force-pushed the master-introduce-null branch from 0100277 to 13a1601 Compare May 5, 2022 18:55
@WinkerDu
Contributor Author

WinkerDu commented May 5, 2022

cc @alamb PTAL, thank you

Contributor

@alamb alamb left a comment

Awesome -- thank you so much @WinkerDu -- this is great

@@ -829,7 +829,11 @@ async fn inner_join_nulls() {
let sql = "SELECT * FROM (SELECT null AS id1) t1
INNER JOIN (SELECT null AS id2) t2 ON id1 = id2";

let expected = vec!["++", "++"];
#[rustfmt::skip]
Contributor

👍

@alamb alamb merged commit fcc35e8 into apache:master May 6, 2022
@alamb
Contributor

alamb commented May 6, 2022

🎉 we are going to have real null support 😍

@WinkerDu
Contributor Author

WinkerDu commented May 7, 2022

Thank you all @alamb @andygrove

ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 15, 2022
ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 15, 2022
ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 15, 2022
ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 15, 2022
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Jun 9, 2023
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Jun 9, 2023
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Jan 19, 2024
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Feb 5, 2024
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Feb 5, 2024
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Introduce ScalarValue::Null type to df
3 participants