-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: avoid object conversion in fillna(method=pad|backfill) for masked arrays #39953
PERF: avoid object conversion in fillna(method=pad|backfill) for masked arrays #39953
Conversation
simonjayhawkins
commented
Feb 21, 2021
if method is not None: | ||
func = missing.get_fill_func(method) | ||
new_values, new_mask = func( | ||
self._data.copy(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExtensionBlock.interpolate creates another copy L1774. if we always copy in EA.fillna then that copy shouldn't be needed.
pandas/_libs/algos.pyx
Outdated
@@ -623,6 +623,38 @@ def pad_inplace(algos_t[:] values, const uint8_t[:] mask, limit=None): | |||
val = values[i] | |||
|
|||
|
|||
@cython.boundscheck(False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than adding new, can we simply always use the mask (for all dtypes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably, the 1d versions of pad and backfill are only used by EA.fillna and Series.replace.
some of the EAs override fillna and use 2d algorithms. NDArrayBackedExtensionArray uses the 1d versions but should be using a 2d version (not an issue since only called columnwise, but would be fail if called directly on a 2d EA) The other EAs that use this convert to object so are not performant anyway.
not sure about Series.replace, will look some more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't support any 2D EA atm, so can leave those out for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. if we are always updating and returning the mask, maybe we should return the same dtype as the original mask, or only allow boolean mask in python space and always return a boolean mask.
looks good. pls add a whatsnew note. you had a comment about multiple copies ; can try to handle here or make a new issue. ping when ready. |
thanks @simonjayhawkins very nice |
This was failing CI (now also on master):
|