BUG: inf in quantile has undefined behaviour (and possibly different for -inf vs +inf) #21091

jessexknight · 2022-02-19T03:33:13Z

Describe the issue:

When one or more inf or -inf values are present in the input to np.quantile (or np.nanquantile), the results often include nan where +/-inf could reasonably be returned. For example, if there are 10 -inf values in a length-100 x, np.quantile(x,.05) should probably return -inf, not nan.
np.quantile(x,0) and np.quantile(x,0.) give different results when x contains -inf (int vs float q, and similarly for 1 vs 1.).
The behaviour for -inf and inf possibly differs in some situations. Compare lines 7 and 9 of the actual output below: the median in line 7, which averages 2 and inf, is nan, while the median in line 9, which averages -inf and 3, is -inf.
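To make the first expectation concrete, here is a minimal sketch that picks a quantile as an order statistic, with no interpolation and therefore no arithmetic on infinite values. The helper name order_statistic_quantile is hypothetical and is not part of NumPy; it only illustrates why -inf is a reasonable answer for the 5% quantile when 10 of 100 values are -inf.

```python
import numpy as np

def order_statistic_quantile(x, q):
    """Hypothetical helper: pick the sorted element at floor(q * (n - 1)).

    No interpolation is performed, so no arithmetic touches inf values.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    idx = int(np.floor(q * (xs.size - 1)))
    return xs[idx]

# 10 values of -inf followed by 90 finite values, as in the description.
x = np.concatenate([np.full(10, -np.inf), np.arange(90.0)])
print(order_statistic_quantile(x, 0.05))  # -inf
```

This is only one of several possible quantile definitions; it is shown here because it avoids the interpolation step entirely.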

Likely related: #12282

Reproduce the code example:

import numpy as np
inf = np.inf
nan = np.nan
eps = 1e-9
x_pos_even = [1,2,inf,inf]
x_pos_odd  = [1,2,3,inf,inf]
x_neg_even = [-inf,-inf,3,4]
x_neg_odd  = [-inf,-inf,3,4,5]
q_even     = [0,1/3,2/3,1]
q_odd      = [0,.25,.5,.75,1]
printfun = lambda r: print(np.round(r,2))
printfun(np.quantile(x_neg_even,0))
printfun(np.quantile(x_neg_even,0.))
printfun(np.quantile(x_pos_even,q_even))
printfun(np.quantile(x_pos_odd, q_even))
printfun(np.quantile(x_neg_even,q_even))
printfun(np.quantile(x_neg_odd, q_even))
printfun(np.quantile(x_pos_even,q_odd))
printfun(np.quantile(x_pos_odd, q_odd))
printfun(np.quantile(x_neg_even,q_odd))
printfun(np.quantile(x_neg_odd, q_odd))

# expected output -- nan* = truly undefined behaviour, though could possibly be +/- inf
# -inf
# -inf
# [ 1.  , 2.  , inf , inf ]
# [ 1.  , 2.33, nan*, inf ]
# [-inf ,-inf , 3.  , 4.  ]
# [-inf , nan*, 3.67, 5.  ]
# [ 1.  , 1.75, nan*, inf , inf ]
# [ 1.  , 2.  , 3.  , inf , inf ]
# [-inf ,-inf , nan*, 3.25, 4.  ]
# [-inf ,-inf , 3.  , 4.  , 5.  ]

# actual output (spacing adjusted for readability)
# -inf
# nan
# [ 1.  , nan , nan , nan ]
# [ 1.  , 2.33, nan , nan ]
# [ nan , nan , 3.  , 4.  ]
# [ nan , nan , 3.67, 5.  ]
# [ 1.  , 1.75, nan , nan , nan ]
# [ 1.  , 2.  , nan , nan , nan ]
# [ nan , nan ,-inf , 3.25, 4.  ]
# [ nan , nan , 3.  , 4.  , 5.  ]

Error message:

/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py:4486: RuntimeWarning: invalid value encountered in subtract
  diff_b_a = subtract(b, a)
/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py:4488: RuntimeWarning: invalid value encountered in multiply
  lerp_interpolation = asanyarray(add(a, diff_b_a * t, out=out))
/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py:4489: RuntimeWarning: invalid value encountered in subtract
  subtract(b, diff_b_a * (1 - t), out=lerp_interpolation, where=t >= 0.5)
/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py:4488: RuntimeWarning: invalid value encountered in add
  lerp_interpolation = asanyarray(add(a, diff_b_a * t, out=out))

NumPy/Python version information:

1.22.2 3.8.10 (default, Nov 26 2021, 20:14:08)
[GCC 9.3.0]


alajoieumich · 2022-03-23T17:04:39Z

Looking into this

alajoieumich · 2022-04-10T20:02:27Z

The reason NaN is returned so often is that quantile() runs interpolation calculations on the input array, which means the add, subtract, and multiply functions are applied to infinite values. Operations such as inf - inf are invalid and produce nan (this is what the RuntimeWarnings above refer to).
The reason np.quantile(-inf,0) =/= np.quantile(-inf,0.) is that the interpolation calculations are not run if the input quantile is an integer (this check is in numpy/lib/function_base.py line 4651). As a result, no arithmetic on infinite values happens when an integer quantile is passed in.
Still looking into how outputs 7 and 9 end up different.
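As a rough illustration of the points above, the sketch below replays only the three statements quoted in the warning traceback (subtract, add, then the where=t >= 0.5 subtract). The function name lerp_sketch is hypothetical, and the real _lerp() may contain further steps not shown in the traceback, so treat this as an approximation rather than the library's implementation.

```python
import numpy as np

def lerp_sketch(a, b, t):
    """Hypothetical replay of the three statements shown in the traceback."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    t = np.asarray(t, dtype=float)
    with np.errstate(invalid="ignore"):  # silence the RuntimeWarnings
        diff_b_a = np.subtract(b, a)
        result = np.asanyarray(np.add(a, diff_b_a * t))
        # For t >= 0.5 the result is recomputed starting from b instead of a.
        np.subtract(b, diff_b_a * (1 - t), out=result, where=t >= 0.5)
    return result

# Output 7: median of [1, 2, inf, inf] interpolates between 2 and inf.
# b - diff * (1 - t) = inf - inf = nan
print(lerp_sketch([2.0], [np.inf], [0.5]))   # [nan]

# Output 9: median of [-inf, -inf, 3, 4] interpolates between -inf and 3.
# b - diff * (1 - t) = 3 - inf = -inf
print(lerp_sketch([-np.inf], [3.0], [0.5]))  # [-inf]
```

Under these assumptions, the asymmetry between outputs 7 and 9 would come from the t >= 0.5 branch, which starts from b: when b is inf the expression becomes inf - inf, while a finite b gives finite - inf.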

I'm trying to code a solution to this issue, but it's looking a lot more difficult than I thought it would be. I can't (and probably shouldn't) change the add/subtract/multiply functions that cause the issue, so I have to create a manual workaround. I have an if statement that checks whether the input contains inf or -inf; if it doesn't, the original interpolation calculations run, which should prevent my workaround from breaking old code or introducing new bugs. However, my workaround is really janky, for two reasons: I don't know how to generalize over inputs of different types (the input is array-like, so it can be an array, a list, a single value, etc.), and I have to manually loop through the input to detect infinite values.
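One possible way to avoid both the manual loop and the type handling is sketched below. It is not the code from the pull request: the name lerp_inf_aware and the policy it applies are assumptions. np.asarray normalizes lists, scalars, and arrays, and np.isinf with np.where replaces the loop. The policy follows the expected output in the report: equal endpoints return that endpoint, t of exactly 0 or 1 returns an endpoint, and interpolating strictly between two different values where at least one is infinite returns nan.

```python
import numpy as np

def lerp_inf_aware(a, b, t):
    """Hypothetical inf-aware linear interpolation (a sketch, not NumPy's _lerp)."""
    a, b, t = np.broadcast_arrays(
        np.asarray(a, dtype=float),
        np.asarray(b, dtype=float),
        np.asarray(t, dtype=float),
    )
    with np.errstate(invalid="ignore"):
        out = a + (b - a) * t              # ordinary lerp; may yield nan
    out = np.where(a == b, a, out)         # e.g. inf and inf -> inf
    out = np.where(t == 0, a, out)         # exact endpoints need no arithmetic
    out = np.where(t == 1, b, out)
    # Strictly between two different endpoints, at least one infinite:
    # treated as undefined here (an assumed policy, marked nan* in the report).
    undefined = (a != b) & (np.isinf(a) | np.isinf(b)) & (t > 0) & (t < 1)
    return np.where(undefined, np.nan, out)

print(lerp_inf_aware(np.inf, np.inf, 0.5))         # inf
print(lerp_inf_aware([2.0, -np.inf], [np.inf, 3.0], 0.5))  # [nan nan]
print(lerp_inf_aware(-np.inf, 3.0, 0.0))           # -inf
```

With this policy the inf and -inf cases are treated symmetrically. Returning +/-inf instead of nan for the undefined case would be an equally defensible choice and only changes the last np.where.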

I'm probably gonna submit a pull request soon so people can look at my code and, if possible, provide some insight, as I'm fairly new to this. As it stands, my workaround somewhat works, but I feel like there has to be a more efficient and clean way to do this. My changes are all in _lerp(), line 4485.

jessexknight added the 00 - Bug label Feb 19, 2022

alajoieumich mentioned this issue Apr 14, 2022

BUG: Added/fixed functionality for infinite and negative infinite values in numpy.quantile(). #21343

Closed

seberg mentioned this issue Jul 10, 2022

BUG: np.percentile gives unreasonable results when array contains np.inf #21932

Open

andrii-riazanov mentioned this issue Sep 23, 2022

Add move_quantile function pydata/bottleneck#418

Open
