API: unified sorting #8239
Comments
-1. Although I'm not a fan of this behaviour, IMO we're stuck with it*: this would break lots of code and there's no way to make a clean deprecation/migration path. +1 to cleaning up the sort API; the other changes seem reasonable. *Perhaps we should name everything order ?? :S
Regarding I tried to come up with a way to detect when users might expect the original behavior... couldn't find anything clean. The best one was probably:
def sort(self, *args, inplace=None, **kwargs):
    ...
    # None means the caller did not pass inplace, so fall back to the old default
    inplace = True if inplace is None else inplace
    ...
Unfortunately it's still a pretty bad idea. Backward-compatible, sure, but it's really just kicking the can down the road.
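A minimal sketch of the sentinel-default idea from this comment, using a plain list rather than a Series so that it runs on its own. The function name `sort_values_list` and the warning text are hypothetical, not part of pandas; the only point is that `None` lets the function tell "not passed" apart from an explicit `True` or `False`.

```python
import warnings


def sort_values_list(values, inplace=None):
    """Sort a list; hypothetical stand-in for Series.sort."""
    if inplace is None:
        # Caller did not pass inplace: keep the legacy behaviour but warn.
        warnings.warn(
            "inplace was not specified; defaulting to the legacy inplace=True",
            FutureWarning,
            stacklevel=2,
        )
        inplace = True
    if inplace:
        values.sort()  # mutate the caller's list, return nothing
        return None
    return sorted(values)  # leave the input alone, return a new list
```

Callers who pass `inplace` explicitly see no warning, which is what makes this backward-compatible, and also why it only postpones the break.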
I just don't see the value proposition in this particular breakage: it will affect a lot of users, and you're not even "fixing" anything (i.e. fixing their buggy code); you'd just be changing syntax. To quote @y-p:
I'd say I was on the liberal side of API breakages, but I don't see how this one can fly!
The more unified the sort API becomes, the more glaring the inplace inconsistency will be. That said, I think the argument is stronger to have consistent behavior vs. consistent signatures. Such a change should wait for a major version bump. (wait, who said 1.0?) So keeping inplace=false for Series.sort means:
big if! Definitely such a change should be discussed in the ML, but I think it's a tough sell. I agree the inconsistency sucks, but practicality beats purity.... and this will (fairly) annoy a lot of people. I think if you're changing the API there needs to be some carrot cake rather than just stick (with this change I just see stick). I was being serious about using/preferring Edit: To me "sort" sounds inplace, whereas "order" is temporary arrangement.
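Python's built-ins happen to follow a similar split, which may be a useful point of comparison: `list.sort()` reorders the list itself and returns `None`, while `sorted()` returns a new list. This is an analogy only, not a statement about how pandas behaves.

```python
values = [3, 1, 2]

# sorted() leaves the input alone and returns a new list.
arranged = sorted(values)

# list.sort() reorders the list itself and returns None.
returned = values.sort()
```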
OK, that edit-note makes sense to me. I'll have a look at
@patricktokeeffe see also #9816
I don't think this is equivalent.
…andas-dev#8239 DEPR: remove of na_last from Series.order/Series.sort, xref pandas-dev#5231
originally #5190
xref #9816
xref #3942
This issue is for creating a unified API to Series & DataFrame sorting methods. Panels are not addressed (yet) but a unified API should be easy to extend to them. Related are #2094, #5190, #6847, #7121, #2615. As discussion proceeds, this post will be edited.
For reference, the 0.14.1 signatures are:
Proposed unified signature for Series.sort and DataFrame.sort (except Series version retains current inplace=True):
The sort_index signatures change too and sort_columns is created:
Proposed changes:
1. maybe, possibly in 1.0: make inplace=False default (changes Series.sort)
2. by argument to accept column-name/list-of-column-names in first position
   a. columns keyword of DataFrame.sort, replaced with by (df.sort signature would need to retain columns keyword until finally removed but it's not shown in proposal)
   b. columns arg of DataFrame.sort allows tuples); use new level argument instead
   c. by/axis in DataFrame.sort_index (see change 7)
   d. axis is too so for the sake of working with dataframes, it gets first position
3. level argument to accept integer/level-name/list-of-ints/list-of-level-names for sorting (multi)index by particular level(s)
   a. columns arg of DataFrame.sort
   b. level argument to sort_index in first position so level(s) of multilevel index can be specified; this makes sort_index == sortlevel (see change 8)
   c. sort_remaining arg to handle multi-level indexes
4. DataFrame.sort_columns == sort(axis=1) (see syntax below)
5. Series.order since change 1 makes Series.sort equivalent (?)
6. inplace, kind, and na_position arguments to Series.sort_index (to match DataFrame.sort_index); by and axis args are not added since they don't make sense for series
7. by argument from DataFrame.sort_index since it makes sort_index equivalent to sort
8. sortlevel since change 3b makes sort_index equivalent
Notes:
sort is still object-dependent: for series, sorts by values and for data frames, sorts by index
level arg makes sort_index and sortlevel equivalent. if sortlevel is retained:
    sortlevel to sort_level for naming conventions
    Series.sortlevel should have inplace argument added
    level and sort_remaining args to sort_index so it's not equivalent to sort_level (intentionally limiting sort_index seems like a bad idea though)
level=None for sort_columns. probably not since level=None falls back to level=0 anyway
by and axis arguments should be ignored by Series.sort
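The proposed changes mention replacing the columns keyword with by while keeping columns in the signature until it is finally removed. A rough sketch of how such a transition could look, using a list of dicts in place of a DataFrame: the function `sort_rows`, its error message, and its warning text are all hypothetical and only illustrate the keyword hand-over.

```python
import warnings


def sort_rows(rows, by=None, columns=None):
    """Sort a list of dict rows; hypothetical stand-in for DataFrame.sort."""
    if columns is not None:
        if by is not None:
            raise TypeError("pass either 'by' or the deprecated 'columns', not both")
        # Old keyword still works for now, but points callers at the new one.
        warnings.warn("'columns' is deprecated, use 'by'", FutureWarning, stacklevel=2)
        by = columns
    if by is None:
        # No keys given: this toy has no index to fall back on, so return a copy.
        return list(rows)
    if isinstance(by, str):
        by = [by]  # accept a single column name as well as a list of names
    # Nested sort: earlier names in `by` take priority over later ones.
    return sorted(rows, key=lambda row: tuple(row[name] for name in by))
```

With this shape, `sort_rows(rows, ['A', 'B'])` and `sort_rows(rows, columns=['A', 'B'])` give the same result; only the second one warns.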
Syntax:
sort() == sort(level=0) == sort_index() == sortlevel()
sort(['A','B'])
sort(level='spam') == sort_index('spam') == sortlevel('spam')
sort(['A','B'], level='spam')
    level controls here even though columns are specified so sort happens along row index named 'spam' first, then nested sort occurs using columns 'A' and 'B'
sort(axis=1) == sort(axis=1, level=0) == sort_columns()
sort(['A','B'], axis=1) == sort_columns(['A','B'])
sort(['A','B'], axis=1, level='spam') == sort_columns(['A','B'], level='spam')
    axis controls level so sort will be on columns named 'A' and 'B' in column index named 'spam'
sort() == order() -- sorts on values
level specified, sorts on index/named index/level of multi-index:
    sort(level=0) == sort_index() == sortlevel()
    sort(level='spam') == sort_index('spam') == sortlevel('spam')
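The last group of equivalences reads as the Series case, given the note that Series sort by values: with no level, sort on values like order(); with a level, sort on the index like sort_index(). A toy sketch of that dispatch, with a Series modelled as a list of (label, value) pairs. The function `sort_series` is hypothetical and handles only a single-level index.

```python
def sort_series(pairs, level=None):
    """Sort (label, value) pairs; hypothetical stand-in for the proposed Series.sort."""
    if level is None:
        # No level given: sort on values, like order().
        return sorted(pairs, key=lambda pair: pair[1])
    # Any level given: sort on the index labels, like sort_index().
    # A real multi-index would pick out the requested level; this toy has only one.
    return sorted(pairs, key=lambda pair: pair[0])
```

For example, `sort_series(pairs)` orders by the second element of each pair, while `sort_series(pairs, level=0)` orders by the label.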
Comments welcome.