API/DEPR: Deprecate inplace parameter #16529
Comments
I thought the main usage of this feature was to mitigate memory usage in case of large DataFrames?
I think @jreback might know more about the initial usage, but generally from previous discussion, this parameter is misused and is more prone to introducing bugs.
inplace does not generally do anything inplace. Mutating code is much harder to debug (not to mention more complicated to support actual inplace ops), so except for inplace indexing, generally these operations are simply more verbose and serve just to provide corner cases for testing.
@pandas-dev/pandas-core I think this was agreed during the sprint at Scipy, but I'm not sure if it was discussed when to deprecate the inplace parameters. Is it something we want to do before 1.0? Personally I think the sooner, the better, since the decision is made.
this is going to cause major downstream complaints, but I agree this should be sooner rather than later. Would accept a FutureWarning for this for 0.24.0
A few questions:
this is a very limited number, and though possible to detect (e.g. a Series of a single dtype, or a DataFrame of a single dtype), IMHO it is not worth leaving this option. It just adds code complexity w/o any real perf benefit. So -1 on doing this.
In my opinion, ideally we should always do the operation in place if possible, but still return a new object. So, … Not sure whether in practice that can be complex to implement in some cases. But if that makes sense, we should deprecate all …
this is too complex. You either do it inplace or you don't. The keyword controls it. If we remove the keyword then it should never modify the original.
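To make the two styles in this exchange concrete, here is a minimal sketch on a toy DataFrame. The column names and values are made up for illustration; only `fillna`, `rename` and `copy` are real pandas calls. It shows the keyword style next to the reassignment style that would remain if the keyword were removed.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [10, 20, 30]})

# Keyword style: the object the method is called on is modified.
keyword_style = df.copy()
keyword_style.fillna(0.0, inplace=True)

# Reassignment style: `df` is left untouched and a new object is returned,
# which also allows chaining further calls.
clean = df.fillna(0.0).rename(columns={"b": "label"})
```

In the second form the original `df` still contains its missing value, which is the "never modify the original" behaviour described above.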
It depends on the application, but not having to copy can be a pretty big win, right? :) Still, I agree that this is tough (impossible) to detect ahead of time. Is it feasible to detect it after the fact, and raise a PerformanceWarning?

```python
import warnings

from pandas.errors import PerformanceWarning


def fillna(self, **kwargs):
    # in BlockManager.fillna
    ref_to_data = self._get_ref_to_data()
    result = self.apply('fillna', **kwargs)
    new_data = self._get_ref_to_data()
    # warn when inplace was requested but the data was copied anyway
    if ref_to_data != new_data and kwargs.get('inplace'):
        warnings.warn("inplace fillna copied the data", PerformanceWarning)
    return result
```

If so, I would prefer to go that route, rather than having users change code.
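The idea of detecting a copy after the fact can be tried outside pandas. The sketch below is a toy stand-in, not a pandas internal: `ToyColumn` and the local `PerformanceWarning` class are hypothetical names, and the rule "a float fill value can be patched in place, anything else forces a new container" is an assumption chosen only to give the check something to detect.

```python
import math
import warnings
from array import array


class PerformanceWarning(Warning):
    """Local stand-in for illustration; pandas has its own in pandas.errors."""


class ToyColumn:
    """Hypothetical container backed by a typed float buffer."""

    def __init__(self, values):
        self._data = array("d", values)

    def fillna(self, value, inplace=False):
        old = self._data
        if isinstance(value, float):
            # Same dtype: reuse the buffer when inplace, else work on a copy.
            new_data = old if inplace else array("d", old)
            for i, x in enumerate(new_data):
                if math.isnan(x):
                    new_data[i] = value
        else:
            # Different dtype: a new container has to be allocated.
            new_data = [value if math.isnan(x) else x for x in old]
        if not inplace:
            out = ToyColumn.__new__(ToyColumn)
            out._data = new_data
            return out
        self._data = new_data
        if self._data is not old:
            # Detected after the fact: inplace was requested but a copy was made.
            warnings.warn("inplace=True still copied the data", PerformanceWarning)
        return None


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    col = ToyColumn([1.0, float("nan")])
    col.fillna(0.0, inplace=True)         # buffer reused, no warning
    col2 = ToyColumn([1.0, float("nan")])
    col2.fillna("missing", inplace=True)  # reallocated, warning recorded
```

Whether such an identity check is cheap or even possible inside the real BlockManager is exactly what the following replies dispute.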
I am really -1 on this in any users code. So while this may have to be an extended deprecation cycle I think it's worth it.
This is the list of methods and functions in the public API with a parameter named `inplace`: [EDITED: moved the lists to the description]. Those are not really in place, and should be deprecated: [EDIT: moved to the issue description]. If that sounds good, I'll create a PR to raise … Let me know if there are more from the first list that you want to postpone to the second phase. @jorisvandenbossche maybe you also want to take a look here.
@datapythonista @jreback
@datapythonista how'd you define that list of methods that are not really inplace? I haven't looked closely, but things like …
So that argument is independent of whether or not any operation can be inplace, and we should discuss that, i.e. that it's the "opinion of pandas" that inplace is an anti-pattern to be avoided at all times. Personally, I'm not sure about that.
my point is this adds a lot of complexity. Complexity is killing the ability of most folks to make modifications. This simplifies the model a great deal.
I just started the list of the actual inplace methods with the ones you said. There are some that I guess should be able to be in place, like … Will move the lists to the description, and try to get it closer to reality; feel free to edit afterwards.
It's been 1.5 years since you wrote
Just wondering if your stance has changed at all
A formal proposal to deprecate most occurrences of the …
Here's the direct link to the proposal.
FWIW, the proposal to remove the …
Is there any way to keep the illusion of a mutating object with a pandas extension? https://pandas.pydata.org/docs/dev/development/extending.html
@Rinfore can you clarify your question a bit? (pandas objects are still mutable, removing the inplace keyword in methods that didn't actually work inplace won't change that) If you want some syntactic sugar to avoid having to re-assign to the same variable (like …
Thanks so much for the insight @jorisvandenbossche! I've been working on creating some custom accessors in a domain-specific library that allow adding abstractions with embedded business logic to data frames (e.g. fictitious example: based on the presence of columns [EmployeeId, Time, Event], I can classify the data frame as an EmployeeRecords data frame) and use extension types on it (…). In my custom accessors, I may do things such as filter out problematic rows via higher-level functions (e.g. …). I am also wondering if mutating a data frame to add columns (e.g. …). In summary:
e.g.
It appears as if the internal assignment in the custom accessor does nothing to modify the reference in the enclosing environment. Thus, I would likely need to return the new reference and have users update their variable references.
I apologise if this is not the correct forum to pose these queries, or if I have made some other error or slight in my post.
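The behaviour described here follows from ordinary Python name binding, and can be reproduced with a small accessor. This is a sketch under assumptions: the accessor name `records` and both method names are hypothetical, and only `pd.api.extensions.register_dataframe_accessor` is the real pandas extension hook.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("records")  # hypothetical name
class RecordsAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def drop_missing_ids_rebind(self):
        # Rebinds only the accessor's own attribute; the caller's variable
        # still points at the original, unfiltered frame.
        self._obj = self._obj[self._obj["EmployeeId"].notna()]

    def drop_missing_ids(self):
        # Returns a new frame and leaves re-assignment to the caller.
        return self._obj[self._obj["EmployeeId"].notna()]


df = pd.DataFrame({"EmployeeId": [1.0, None, 3.0], "Event": ["in", "out", "in"]})

df.records.drop_missing_ids_rebind()
rows_after_rebind = len(df)         # the caller's frame is unchanged

df = df.records.drop_missing_ids()  # explicit re-assignment
rows_after_reassign = len(df)
```

Returning the filtered frame and letting callers re-assign, as the post concludes, is the variant that actually changes what the caller sees.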
7 years since this was opened, is this still something we want to do?
This is actually covered by PDEP8, I guess we can close here
This resolves a few warnings related to accessing things. Removing the `inplace=True` qualifier seems undesirable despite the warning being emitted for it, but it turns out that generally when this is given pandas is creating a copy internally anyway. pandas seems to have a general desire to remove the `inplace=` options everywhere so that code avoids mutation and stays closer to immutable. See the following comment from years ago. pandas-dev/pandas#16529 (comment)
the inplace version of pandas.fillna apparently shouldn't be used; see pandas-dev/pandas#16529
The parameter `inplace=False` should be deprecated across the board in preparation for pandas 2, which will not support that input (we will always return a copy). That would give people time to stop using it.

Thoughts?

Methods using `inplace`:
Deprecation non-controversial (a copy will be made anyway, and `inplace=True` does not add value):
(… `drop=False` wouldn't change the data, but that doesn't seem the main use case)
Not sure:
Should be able to not copy memory (under discussion on what to do):
Special cases:
(… `inplace=False` the value is not returned but set to an argument `target`)