InvalidArgumentError("Column 'COUNT(DISTINCT demo.name)[count distinct]' is declared as non-nullable but contains null values") #4040
Comments
@MachaelLee Thanks for the detailed report. However, the steps above cannot be executed directly in DataFusion, since implementing the SQL interface is ceresdb's job. I found a simple way to reproduce this based on https://github.com/apache/arrow-datafusion/blob/525ac4567ad8d86ad085d8439d890b1f9e9e6bb9/datafusion-examples/examples/memtable.rs#L39. The changes are below:
2 files changed, 6 insertions(+), 8 deletions(-)
datafusion-examples/examples/memtable.rs | 12 +++++-------
datafusion/optimizer/src/optimizer.rs | 2 +-
modified datafusion-examples/examples/memtable.rs
@@ -36,14 +36,12 @@ async fn main() -> Result<()> {
// Register the in-memory table containing the data
ctx.register_table("users", Arc::new(mem_table))?;
- let dataframe = ctx.sql("SELECT * FROM users;").await?;
+ let dataframe = ctx
+ .sql("SELECT id,count(distinct bank_account) From users group by id;")
+ .await?;
timeout(Duration::from_secs(10), async move {
- let result = dataframe.collect().await.unwrap();
- let record_batch = result.get(0).unwrap();
-
- assert_eq!(1, record_batch.column(0).len());
- dbg!(record_batch.columns());
+ dataframe.show().await.unwrap();
})
.await
.unwrap();
@@ -57,7 +55,7 @@ fn create_memtable() -> Result<MemTable> {
fn create_record_batch() -> Result<RecordBatch> {
let id_array = UInt8Array::from(vec![1]);
- let account_array = UInt64Array::from(vec![9000]);
+ let account_array = UInt64Array::from(vec![None]);
Ok(RecordBatch::try_new(
get_schema(),
modified datafusion/optimizer/src/optimizer.rs
@@ -173,7 +173,7 @@ impl Optimizer {
rules.push(Arc::new(ReduceOuterJoin::new()));
rules.push(Arc::new(FilterPushDown::new()));
rules.push(Arc::new(LimitPushDown::new()));
- rules.push(Arc::new(SingleDistinctToGroupBy::new()));
+ // rules.push(Arc::new(SingleDistinctToGroupBy::new()));
// The previous optimizations added expressions and projections,
// that might benefit from the following rules
Then execute it via
I believe this will be fixed by apache/arrow-rs#3473. See #4828.
@jonmmease Thanks, I will check against the latest DataFusion to see whether this problem remains.
I don't think the fix will be available in DataFusion until arrow 31 is released and DataFusion is updated to use that version.
Thanks for the reminder. I use DataFusion pinned to a GitHub commit, and it seems HEAD (4bea81b) is broken:
Any ideas?
The compile error above is the same as apache/arrow-rs#3066; deleting Cargo.lock fixes it. I also checked that this issue has been fixed in the latest master (7673fcc). @MachaelLee, you can close this issue now.
I guess the bug is no longer reproducible on current main.
Thanks for checking, @findepi and @jiacai2050.
Describe the bug
DataFusion panics when I run the query
select app, count(distinct name) from demo group by app
Here is the stack trace:
To Reproduce
Expected behavior
Return a result, not panic
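For illustration, here is a minimal std-only Rust sketch of the result such a query would be expected to return. It relies on the standard SQL rule that COUNT(DISTINCT x) skips NULLs, so a group whose values are all NULL yields 0 rather than an error. The function name `count_distinct_by_key` and the sample rows are hypothetical and are not taken from the actual `demo` table or from DataFusion's implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// COUNT(DISTINCT value) GROUP BY key over (key, value) rows.
/// NULL values (None) are skipped, so an all-NULL group yields 0.
fn count_distinct_by_key(rows: &[(&str, Option<&str>)]) -> BTreeMap<String, usize> {
    let mut seen: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (key, value) in rows {
        // Every key gets an entry, even when all of its values are NULL.
        let set = seen.entry(key.to_string()).or_default();
        if let Some(v) = value {
            set.insert(v.to_string());
        }
    }
    seen.into_iter().map(|(k, s)| (k, s.len())).collect()
}

fn main() {
    // Hypothetical (app, name) rows standing in for the `demo` table.
    let rows = [
        ("a", Some("x")),
        ("a", Some("x")),
        ("a", Some("y")),
        ("b", None),
    ];
    for (app, n) in count_distinct_by_key(&rows) {
        println!("{app}\t{n}");
    }
}
```

With these sample rows, group "a" has two distinct names and group "b" has none, so the all-NULL group still produces a row with a count of 0.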
Additional context
I found this bug while using ceresdb (apache/horaedb#302).
I also found that if partition_num is set to more than 1, the error is as above; if partition_num is set to 1, the error is the one in #1623.
Digging into the code, I found that the logical plan is:
and the physical plan is as follows. I guess the second AggregateExec's schema field "COUNT(DISTINCT demo.name)[count distinct]", which is nullable, causes the error.
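The kind of check that produces this error message can be sketched in a few lines of std-only Rust. This is only an illustration of the rule "a field declared non-nullable must not hold nulls"; the `Field` struct and `validate_column` function are hypothetical stand-ins, not the arrow-rs types or the actual validation code.

```rust
// Hypothetical, std-only model of a schema field; this is NOT the arrow-rs
// type, only an illustration of the validation rule.
#[derive(Debug)]
struct Field {
    name: String,
    nullable: bool,
}

/// Returns an error when a column holds a null although its field is
/// declared non-nullable, mirroring the wording of the reported message.
fn validate_column(field: &Field, column: &[Option<u64>]) -> Result<(), String> {
    if !field.nullable && column.iter().any(|v| v.is_none()) {
        return Err(format!(
            "Column '{}' is declared as non-nullable but contains null values",
            field.name
        ));
    }
    Ok(())
}

fn main() {
    let name = "COUNT(DISTINCT demo.name)[count distinct]";
    let strict = Field { name: name.to_string(), nullable: false };
    let relaxed = Field { name: name.to_string(), nullable: true };
    let column = vec![Some(1), None];

    // Non-nullable declaration plus a null value: rejected.
    println!("{:?}", validate_column(&strict, &column));
    // Same data, nullable declaration: accepted.
    println!("{:?}", validate_column(&relaxed, &column));
}
```

In this model, the same column passes or fails depending only on the declared nullability, which is why a mismatch between the declared schema and the data actually produced by an operator would surface as this error.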