-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: PandasArray._quantile when empty #46110
Conversation
pandas/core/array_algos/quantile.py
Outdated
return np.array([na_value] * len(qs), dtype=values.dtype) | ||
# Can't pass dtype=values.dtype here bc we might have na_value=np.nan | ||
# with values.dtype=int64 see test_quantile_empty | ||
return np.array([na_value] * len(qs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this would kill anything else here but na_value * np.empty((len(qs), ))
should be significantly faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would break things when na_value is e.g. -1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. np.zeros should work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you mean np.full? regardless id prefer not to bikeshed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we could also use is
np.tile(np.array([na_value]), (len(qs, ))
This should also be faster than the list multiplication
%timeit np.array([na_value] * 1_000_000)
53.6 ms ± 1.71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit na_value * np.zeros((1_000_000, ))
1.3 ms ± 28.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit np.tile(np.array([na_value]), (1_000_000, ))
2.75 ms ± 84.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you mean np.full? regardless id prefer not to bikeshed here
Ok with me, just wanted to mention it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to use np.full, green
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there tests for period/datetime/timedelta already?
Yes |
Not seeing any tests for quantile in arrays, am I missing it, or might they be somewhere else? |
Tested indirectly in the dataframe/series tests |
gentle ping; a couple of follow-ups in the works |
nice! |
Not user-facing, but still good to have fixed.
Gets rid of our last non-idiomatic usage of _from_factorized, xref #33276