-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: remove use of Python sets for interpolate #34727
PERF: remove use of Python sets for interpolate #34727
Conversation
some environments are failing with |
pandas/core/missing.py
Outdated
else: | ||
# both directions... just use _interp_limit | ||
preserve_nans = set(_interp_limit(invalid, limit, limit)) | ||
nans_to_interpolate = _interp_limit(invalid, limit, limit, first, last) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could move the above logic into _interp_limit and fastpath if limit_area.
although the algorithm could probably be extended to 2d, in which case the limit_area logic below would also need to move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although the algorithm could probably be extended to 2d, in which case the limit_area logic below would also need to move.
i don't think a 2d version will be needed as interpolate_2d will also be applied along axis when max_gap is added.
we could move the above logic into _interp_limit and fastpath if limit_area.
probably best as a follow-on to keep the diff here smaller
strides = a.strides + (a.strides[-1],) | ||
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides) | ||
|
||
def inner(arr, limit): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will move this into a top level function called ffill_mask_with_limit (in this PR)
and plan to take the max_gap logic from #25141 and have a analogous ffill_mask_with_max_gap (separate PR)
looks fine. |
do we have sufficient asv's for this? |
will get back to this after 1.0.5 we don't have many asvs so could probably add more The current asv is a single float block with a sorted RangeIndex and no limit (which can be fastpathed) so not representative of potentially much slower cases |
I'll close this for now. still don't have access to desktop and running asvs on laptop is painful. |
have been investigating avoiding internals for interpolation. xref #34628
This PR address the issue of using Python sets, which is responsible for the bulk of the time in our current asv.
There are other improvements (will raise other PRs), but would need new benchmarks to show a benefit such as different index types and unsorted indexes. so this PR is targeting our current benchmark first.
prelim results (will post asv results if tests pass)