Describe the bug
cudf.read_orc with certain predicate filters returns an incorrect result (valid rows are dropped) when the sum of the values in the column being filtered exceeds the int64 range.
Expected behavior
The returned dataframe doesn't lose valid rows (2 rows in this case).
Environment overview (please complete the following information)
Environment location: bare-metal
Method of cuDF install: conda
Environment details
21.10 nightly from today (Aug 27)
Additional context
Some minor debugging indicated that the logic here fails when the col_sum returned from gathering metadata is a negative value because of overflow.
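To illustrate how a sum of positive values can come back negative, here is a minimal pure-Python sketch of int64 wraparound. It does not call cuDF; `wrap_int64` and `wrapped_sum` are hypothetical helpers written for this illustration, and the two input values are made up.

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

def wrap_int64(value):
    """Reinterpret an arbitrary Python int as a two's-complement int64."""
    return (value + 2**63) % 2**64 - 2**63

def wrapped_sum(values):
    """Sum values the way a fixed-width int64 accumulator would (with wraparound)."""
    total = 0
    for v in values:
        total = wrap_int64(total + v)
    return total

# Two positive values whose true sum does not fit in int64 (illustrative data).
values = [INT64_MAX, 1]
col_sum = wrapped_sum(values)
print(col_sum < 0)  # the wrapped sum is negative even though every value is positive
```

Any logic that assumes a column of positive values has a non-negative sum would be misled by such a statistic.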
Quick update here: Looks like the issue might be with cudf.to_orc and how orc_statistics are written.
I verified for the example above that the raw orc statistics value for sum is a negative number (indicating an overflow).
From the ORC specification: "If the sum overflows long at any point during the calculation, no sum is recorded." So this seems to be a case where we are incorrectly including the sum statistic in the ORC metadata.
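The rule quoted from the specification could be sketched roughly as follows. This is only an illustration in pure Python, not the cuDF writer's code; `checked_int64_sum` is a hypothetical helper, and returning `None` stands in for "no sum is recorded".

```python
INT64_MAX = 2**63 - 1
INT64_MIN = -(2**63)

def checked_int64_sum(values):
    """Return the sum of values, or None if the running total ever leaves the int64 range."""
    total = 0
    for v in values:
        total += v  # Python ints are arbitrary precision, so the range check below is exact
        if total < INT64_MIN or total > INT64_MAX:
            return None  # overflow at any point: omit the sum statistic
    return total

print(checked_int64_sum([1, 2, 3]))        # small values: a sum is recorded
print(checked_int64_sum([INT64_MAX, 1]))   # overflow: no sum is recorded
```

Note that the check runs after every addition, so a sum that overflows midway and later falls back into range is still omitted.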
ayushdg changed the title from "[BUG] read_orc predicate filter returns incorrect result when metadata sum overflows" to "[BUG] to_orc writes incorrect sum statistics when there's an overflow" on Aug 31, 2021
cc: @randerzander