Error Unexpected accumulator state List([NULL]) in COUNT(distinct ..) query #1623
Comments
It seems like there is
// Only List(Some(..)) states are accepted here; any other state,
// including a null list, is reported as an internal error.
let col_values = states
    .iter()
    .map(|state| match state {
        ScalarValue::List(Some(values), _) => Ok(values),
        _ => Err(DataFusionError::Internal(format!(
            "Unexpected accumulator state {:?}",
            state
        ))),
    })
    .collect::<Result<Vec<_>>>()?;
These queries work:
❯ select count(*), count(distinct stop_name), trip_tid from stops where stop_name is not null group by trip_tid limit 10;
+-----------------+---------------------------------+----------+
| COUNT(UInt8(1)) | COUNT(DISTINCT stops.stop_name) | trip_tid |
+-----------------+---------------------------------+----------+
| 1 | 1 | 54787914 |
| 1 | 1 | 54804331 |
| 1 | 1 | 54756522 |
| 1 | 1 | 54791196 |
| 1 | 1 | 54775777 |
| 1 | 1 | 54788343 |
| 1 | 1 | 54788169 |
| 2 | 2 | 54793827 |
| 1 | 1 | 54776433 |
| 1 | 1 | 54788382 |
+-----------------+---------------------------------+----------+
❯ select count(distinct stop_name), trip_tid from stops group by trip_tid limit 10;
+---------------------------------+----------+
| COUNT(DISTINCT stops.stop_name) | trip_tid |
+---------------------------------+----------+
| 2 | 54793827 |
| 0 | 54807414 |
| 0 | 54807426 |
| 0 | 54807516 |
| 0 | 54827775 |
| 0 | 54827714 |
| 0 | 54807481 |
| 0 | 54827749 |
| 0 | 54807473 |
| 0 | 54919940 |
+---------------------------------+----------+
Perhaps the accumulator just needs to be able to handle "null" (aka seeing no values)
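As a rough illustration of that suggestion, here is a minimal, self-contained sketch of a merge step that treats a null list state as "no values seen" instead of an error. The `State` enum and `merge_distinct` function are hypothetical stand-ins invented for this example; they are not DataFusion's `ScalarValue` or its accumulator API, and the actual fix in DataFusion may look different.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for an accumulator state value (NOT DataFusion's ScalarValue).
#[derive(Debug, Clone, PartialEq)]
enum State {
    // A list state; `None` models a partial aggregate that saw no values.
    List(Option<Vec<i64>>),
    // Any other kind of state, which is unexpected for a distinct count.
    Int(i64),
}

// Hypothetical merge step: unions the values of all partial states and
// returns the number of distinct values.
fn merge_distinct(states: &[State]) -> Result<usize, String> {
    let mut seen = HashSet::new();
    for state in states {
        match state {
            State::List(Some(values)) => seen.extend(values.iter().copied()),
            // A null list contributes nothing instead of raising an error.
            State::List(None) => {}
            other => return Err(format!("Unexpected accumulator state {:?}", other)),
        }
    }
    Ok(seen.len())
}
```

With this shape, a group whose only state is a null list ends up with a distinct count of 0, which matches the 0 rows in the working query output above.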
I am curious why
❯ select count(distinct stop_name), trip_tid from stops group by trip_tid limit 10;
works, while
❯ select count(*), count(distinct stop_name), trip_tid from stops group by trip_tid limit 10;
does not.
What do the plans look like if you do
This is a little amazing for me. Without After debugging, I find that with
That is a cool optimization (it is in https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/single_distinct_to_groupby.rs) contributed by @ic4y |
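For readers unfamiliar with that rule: as I understand it, it rewrites a query whose only aggregate is a single DISTINCT aggregate into two nested GROUP BYs, so the distinct accumulator is never used. That would explain why the query with only count(distinct stop_name) works while the one that also has count(*) does not. The sketch below imitates the idea on plain Rust collections; `count_distinct_by_group` is a hypothetical helper written for this illustration and is not code from the optimizer.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical illustration of rewriting
//   SELECT key, COUNT(DISTINCT value) FROM t GROUP BY key
// into
//   SELECT key, COUNT(value) FROM (SELECT key, value FROM t GROUP BY key, value) GROUP BY key
fn count_distinct_by_group<'a>(rows: &[(u32, Option<&'a str>)]) -> HashMap<u32, usize> {
    // Inner aggregate: GROUP BY (key, value). NULL values form their own group.
    let deduped: HashSet<(u32, Option<&'a str>)> = rows.iter().copied().collect();

    // Outer aggregate: GROUP BY key with a plain COUNT(value), which skips NULLs.
    let mut counts = HashMap::new();
    for (key, value) in deduped {
        let count = counts.entry(key).or_insert(0usize);
        if value.is_some() {
            *count += 1;
        }
    }
    counts
}
```

A key whose values are all NULL still gets an output row, with a count of 0, because the outer COUNT never sees a "no values" list state.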
I also tested the SQL: (removing
select count(*), count(distinct stop_name)
from stops
limit 3
and no error appears. After debugging, I find that the function
This issue doesn't seem reproducible on current main. @alamb close?
I agree -- thanks @findepi
I double checked and indeed it is no longer reproducible:
andrewlamb@Andrews-MacBook-Pro-2:~/Downloads$ datafusion-cli
DataFusion CLI v39.0.0
> create external table stops stored as parquet location '2021-11.parquet';
0 row(s) fetched.
Elapsed 0.027 seconds.
> select count(*), count(distinct stop_name), trip_tid from stops group by trip_tid limit 10;
+----------+---------------------------------+----------+
| count(*) | count(DISTINCT stops.stop_name) | trip_tid |
+----------+---------------------------------+----------+
| 7 | 0 | 54776439 |
| 7 | 0 | 54776447 |
| 8 | 0 | 54776412 |
| 6 | 0 | 54776432 |
| 25 | 0 | 54775709 |
| 17 | 0 | 54775749 |
| 25 | 0 | 54775832 |
| 12 | 0 | 54775839 |
| 17 | 0 | 54775833 |
| 14 | 0 | 54775738 |
+----------+---------------------------------+----------+
10 row(s) fetched.
Elapsed 0.052 seconds.
Describe the bug
A query returns this error:
To Reproduce
Download and extract data.zip
Expected behavior
A result
Additional context
@matthewmturner and @jhorstmann helped me discover this as part of the Rust syncup today 🤣