PERF: Speed up boolean masking on Series / index #30349
Another 30% of the time is spent applying the boolean mask to the `RangeIndex` (for the case above). This also seems to be a non-optimized case: it's not directly handled in the …
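For a `RangeIndex`, a boolean mask could in principle be applied without materializing the whole range, because each selected value is just `start + step * position` for a `True` position. Here is a minimal NumPy sketch of that idea. The helper `mask_range` is hypothetical (not a pandas function), and this is not necessarily how pandas handles it:

```python
import numpy as np


def mask_range(start: int, stop: int, step: int, mask) -> np.ndarray:
    """Hypothetical helper: apply a boolean mask to range(start, stop, step).

    Only the positions of the True entries are needed, so the full array
    of range values is never built.
    """
    mask = np.asarray(mask, dtype=bool)
    length = len(range(start, stop, step))
    if mask.shape != (length,):
        raise IndexError(f"mask has shape {mask.shape}, expected ({length},)")
    positions = np.flatnonzero(mask)  # indices where the mask is True
    return start + step * positions  # map positions back to range values
```

For example, `mask_range(5, 13, 2, [True, False, True, True])` selects 5, 9 and 11 out of `range(5, 13, 2)`.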
@TomAugspurger I now get 157 µs for the …
I don't think that's accurate anymore.
And a quick timing confirmed that `__getitem__` with a `RangeIndex` is not slower than with an `int64` index for the above case, so the cost is certainly not specific to `RangeIndex`. Now, the reason I actually saw "inference of index dtype" in the profile mentioned above: although …
(I didn't re-check how significant this part is relative to the total, though.)
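The dtype-inference point can be sanity-checked directly: masking never changes the dtype, so an `Index` built from the masked values could in principle reuse the input dtype instead of inferring it again. A minimal check with an arbitrary mask (this only shows that the dtype is known up front; it does not measure the inference cost itself):

```python
import numpy as np
import pandas as pd

idx = pd.Index(np.arange(10_000, dtype="int64"))
mask = idx.to_numpy() % 3 == 0  # arbitrary boolean mask

masked = idx[mask]
# The masked Index keeps the input dtype, so the dtype is already known
# before the new Index is constructed.
assert masked.dtype == idx.dtype

# The same holds for the raw ndarray that the new Index wraps.
values = idx.to_numpy()[mask]
assert values.dtype == idx.dtype
```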
It sounds like this issue is closeable? (There are likely still perf gains to be had, but they aren't really specific to boolean masking.)
Yep. I opened #31903 for the observation I mentioned above about dtype inference in `Index.__getitem__`.
On master, taking a boolean mask on a 10,000-element Series takes me ~300 µs.
Much of that time is spent in a `try` / `except` inside `Index.get_value`, which can never succeed for a boolean mask. By skipping that path entirely, we improve perf on this example by ~2x.
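The original snippet isn't included here, so this is only an assumed setup for timing something similar: a 10,000-element float Series with the default `RangeIndex` and a random boolean mask. Absolute numbers depend on the machine and the pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup, not the issue's original snippet.
rng = np.random.default_rng(42)
s = pd.Series(rng.standard_normal(10_000))
mask = rng.random(10_000) > 0.5


def bench(number: int = 1_000) -> float:
    """Return the mean time in microseconds for one `s[mask]` call."""
    total = timeit.timeit(lambda: s[mask], number=number)
    return total / number * 1e6


if __name__ == "__main__":
    print(f"s[mask]: {bench():.1f} us per call")
```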
We might be able to restructure `Index.get_value` or `Series.__getitem__` a bit so that we don't go down this path when the key is a boolean ndarray mask.
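One way to picture that restructuring: check for a boolean ndarray up front and send it straight to positional indexing, so the label-lookup attempt (and its `try` / `except`) never runs. This is a hypothetical wrapper (`getitem_fast_bool` is not a pandas function) that uses the public `Series.iloc` as a stand-in for whatever internal fast path pandas would actually use:

```python
import numpy as np
import pandas as pd


def getitem_fast_bool(series: pd.Series, key):
    """Hypothetical sketch of a Series.__getitem__-style dispatch.

    A boolean ndarray is always a positional mask, so it never needs a
    label lookup; route it directly to positional indexing, which also
    rejects a mask of the wrong length.
    """
    if isinstance(key, np.ndarray) and key.dtype == np.bool_:
        return series.iloc[key]
    # Any other key falls back to the regular indexing path.
    return series[key]
```

The check is cheap (an `isinstance` plus a dtype comparison), so adding it in front of the general path shouldn't slow down other key types noticeably.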