[BUG] Incorrect broadcasted equality for scalar comparison to int64 columns #12072
Labels: 2 - In Progress, bug, Python
Describe the bug
Equality of a NumericalColumn with large int64 values against another large int64 value can produce false positives: elements compare equal to the scalar even though the values differ.
Steps/Code to reproduce bug
Expected behavior
Equality for int types should be correct.
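To make the expectation concrete, here is a minimal NumPy-only sketch. It does not use cuDF, and the two values are illustrative assumptions rather than the ones from the original report. Compared as int64, two distinct values near 2**63 are unequal; routed through float64, they collapse to the same number.

```python
import numpy as np

# Illustrative values (assumed, not taken from the report): two distinct
# int64 values close to the top of the int64 range.
column = np.array([2**63 - 10], dtype=np.int64)
scalar = 2**63 - 1

# Exact integer comparison: the values differ, so the result is False.
exact = column == np.int64(scalar)

# Comparison after converting both sides to float64: both round to 2**63,
# so the comparison reports a false positive.
lossy = column.astype(np.float64) == np.float64(scalar)

print(bool(exact[0]), bool(lossy[0]))  # False True
```

The integer result is the behavior one would expect from equality on an int64 column.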
Why does this happen?
Before heading into libcudf we need to turn the scalar host value into a device scalar, for which a dtype must be deduced. This is done in NumericalColumn.normalize_binop_value. The relevant code is:
So first we inspect the value and ask numpy for a dtype that can represent it. Then we promote the column dtype and the deduced dtype to a common type, cast our target value to that common type, and continue.
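These steps can be sketched with plain NumPy. This is a hedged illustration, not the cuDF source: deduce_common_dtype is a hypothetical helper, and the scalar 2**63 - 1 is an assumed example value.

```python
import numpy as np

def deduce_common_dtype(column_dtype, value):
    """Hypothetical helper mirroring the steps described above (not cuDF code)."""
    # Step 1: ask NumPy for the smallest dtype that can hold the scalar.
    scalar_dtype = np.min_scalar_type(value)
    # Step 2: promote the column dtype and the scalar dtype to a common dtype.
    common_dtype = np.promote_types(column_dtype, scalar_dtype)
    # Step 3: cast the scalar to the common dtype.
    return scalar_dtype, common_dtype, common_dtype.type(value)

scalar_dtype, common_dtype, cast_value = deduce_common_dtype(
    np.dtype("int64"), 2**63 - 1
)
print(scalar_dtype, common_dtype)  # uint64 float64
```

With an int64 column and a positive scalar that needs the full 64 bits, the common dtype comes out as float64.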
What goes wrong?
NumPy's min_scalar_type prefers unsigned over signed types, which is problematic: if the column type is signed, we conclude that the RHS value can no longer be represented in the column type. So we upcast to float (lossy) and call into libcudf with binaryop(column, device_scalar_as_double(val)), and now we get standard C++ type promotion on the pointwise equality comparison.
Environment overview (please complete the following information)
cudf.__version__ => '22.12.00a+193.g991c86b13a'