Attempt to subtract with overflow panic in max_distinct_count
#9006
Labels
bug
Describe the bug
There's an edge case in the present max_distinct_count algorithm, whereby there can be an attempt to subtract a potentially larger number of total nulls from an inexact, smaller number of total rows to get the distinct values: https://github.com/apache/arrow-datafusion/blob/bee7136a04c60a2c06caa630cf1b72f32f7dc574/datafusion/physical-plan/src/joins/utils.rs#L957-L959
This leads to a panic with the message "attempt to subtract with overflow".
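To illustrate the arithmetic involved, here is a minimal sketch of why an unsigned subtraction of this shape panics in a debug build, and two standard-library alternatives that do not. The function names are hypothetical stand-ins and are not taken from DataFusion; the values 0 and 3 are made up for illustration.

```rust
// Hypothetical helpers, not DataFusion's actual code.
// With usize operands, `num_rows - null_count` panics in a debug build
// ("attempt to subtract with overflow") whenever null_count > num_rows.

/// Returns None instead of panicking when the subtraction would underflow.
fn non_null_rows_checked(num_rows: usize, null_count: usize) -> Option<usize> {
    num_rows.checked_sub(null_count)
}

/// Clamps the result at zero when the subtraction would underflow.
fn non_null_rows_saturating(num_rows: usize, null_count: usize) -> usize {
    num_rows.saturating_sub(null_count)
}

fn main() {
    // Illustrative values: an estimated row count of 0 and a stale null count of 3.
    let num_rows: usize = 0;
    let null_count: usize = 3;

    assert_eq!(non_null_rows_checked(num_rows, null_count), None);
    assert_eq!(non_null_rows_saturating(num_rows, null_count), 0);

    // The ordinary case is unaffected by either variant.
    assert_eq!(non_null_rows_checked(10, 3), Some(7));
    assert_eq!(non_null_rows_saturating(10, 3), 7);
}
```

Whether clamping to zero or treating the estimate as unknown is the better behavior here is a design choice for the maintainers; the sketch only shows that both options avoid the panic.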
To Reproduce
Extract the three parquet files from files.zip needed for the repro. These were generated using DuckDB with SF=0.01 for TPC-DS benchmarks. The example below is a minimal repro for an issue observed for query 24 from that benchmark.
The above code panics with:
thread 'main' panicked at datafusion/physical-plan/src/joins/utils.rs:958:40: attempt to subtract with overflow
Note that you can get a repro with the cli by appending DATAFUSION_EXECUTION_COLLECT_STATISTICS=true DATAFUSION_EXECUTION_TARGET_PARTITIONS=1 to cargo run.
Expected behavior
The example shouldn't panic, but instead return an empty result.
Additional context
As for how this situation occurs in the first place, from my brief investigation I'm seeing that:
1. FilterExec for the filtering on the store table predicate returns Inexact(0) as the number of rows for its output statistics, since the predicate refutes all the input rows (in the case of store above there's only a single row, with s_market_id equal to 2).
2. For the join of store and store_sales, the join cardinality estimate is 0 due to the above filtering, but the column statistics are nonetheless merged as is (meaning an exact null count for the store_sales columns is inherited): https://github.com/apache/arrow-datafusion/blob/bee7136a04c60a2c06caa630cf1b72f32f7dc574/datafusion/physical-plan/src/joins/utils.rs#L848-L859
3. For the join with customer, the statistics from step 2 come into play, and max_distinct_count hits the edge case of num_rows being Inexact(0) while stats.null_count is exact and greater than zero.
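The three steps above can be modelled with a small self-contained sketch. Everything in it is an assumption made for illustration: the Precision enum is a simplified stand-in for DataFusion's statistics type, ColumnStats and max_distinct_upper_bound are hypothetical names, and the null count of 5 is made up. It shows one possible way to keep the bound at zero rather than panicking, not the fix adopted upstream.

```rust
// Simplified, hypothetical model of the statistics involved; not DataFusion's types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Precision {
    /// Returns the wrapped value, if there is one.
    fn get_value(&self) -> Option<usize> {
        match self {
            Precision::Exact(v) | Precision::Inexact(v) => Some(*v),
            Precision::Absent => None,
        }
    }
}

struct ColumnStats {
    null_count: Precision,
}

/// Upper bound on the distinct values of a column: rows minus nulls, clamped at zero.
fn max_distinct_upper_bound(num_rows: Precision, stats: &ColumnStats) -> Precision {
    let Some(rows) = num_rows.get_value() else {
        return Precision::Absent;
    };
    let nulls = stats.null_count.get_value().unwrap_or(0);
    // saturating_sub avoids the panic when an inherited null count
    // exceeds the estimated number of rows.
    let bound = rows.saturating_sub(nulls);
    // The bound is exact only when both inputs are exact.
    match (num_rows, stats.null_count) {
        (Precision::Exact(_), Precision::Exact(_)) => Precision::Exact(bound),
        _ => Precision::Inexact(bound),
    }
}

fn main() {
    // Step 1: the filter refutes every row, so the row estimate is Inexact(0).
    let filtered_rows = Precision::Inexact(0);
    // Step 2: the join output keeps the exact null count of the unfiltered side
    // (5 is an illustrative value).
    let inherited = ColumnStats { null_count: Precision::Exact(5) };
    // Step 3: the next join asks for a distinct-count bound on that column.
    let bound = max_distinct_upper_bound(filtered_rows, &inherited);
    assert_eq!(bound, Precision::Inexact(0));
}
```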