-
-
Notifications
You must be signed in to change notification settings - Fork 18.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add ignore_index for df.sort_values and series.sort_values #30402
Conversation
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -474,6 +474,7 @@ Other API changes | |||
Supplying anything else than ``how`` to ``**kwargs`` raised a ``TypeError`` previously (:issue:`29388`) | |||
- When testing pandas, the new minimum required version of pytest is 5.0.1 (:issue:`29664`) | |||
- :meth:`Series.str.__iter__` was deprecated and will be removed in future releases (:issue:`28277`). | |||
- ``ignore_index`` is added in :meth:`DataFrame.sort_values` to be able to get reset index after sorting (:issue:`30114`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- ``ignore_index`` is added in :meth:`DataFrame.sort_values` to be able to get reset index after sorting (:issue:`30114`) | |
- ``ignore_index`` is added in :meth:`DataFrame.sort_values` to be able to reset index after sorting (:issue:`30114`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! will change!
pandas/core/series.py
Outdated
@@ -2855,6 +2858,9 @@ def _try_kind_sort(arr): | |||
|
|||
result = self._constructor(arr[sortedIdx], index=self.index[sortedIdx]) | |||
|
|||
if ignore_index: | |||
result = result.reset_index(drop=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for this to differ with the frame implementation? Should be the same in both cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, sorry! I thought the other one was faster in the first place, and then later found basically no difference, and forgot to align them to keep consistency.
Changed! thanks! @WillAyd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh!! i recall why they are different, this new_data
here is BlockManager
while in series is not. I revert the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use result.index = ibase.default_index(len(indexer))
this is NOT a BlockManager here, but a series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, sorry for my bad English, i meant new_data
in frame.py
is BlockManager (should have used there
other than here
to be precise 😅 ), here it is Series
. @jreback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you will see my comment in the PR about drop_duplicates. We do not want to use .reset_index directly at all. This causes yet another copy of the data which we can easily avoid here.
you have to be careful to avoid mutating things though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good, some comments.
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -474,6 +474,7 @@ Other API changes | |||
Supplying anything else than ``how`` to ``**kwargs`` raised a ``TypeError`` previously (:issue:`29388`) | |||
- When testing pandas, the new minimum required version of pytest is 5.0.1 (:issue:`29664`) | |||
- :meth:`Series.str.__iter__` was deprecated and will be removed in future releases (:issue:`28277`). | |||
- ``ignore_index`` is added in :meth:`DataFrame.sort_values` and :meth:`Series.sort_values` to be able to reset index after sorting (:issue:`30114`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverse the ordering a bit :meth:`DataFrame.sort_values` and :meth:`Series.sort_values`
have gained the ignore_index
......
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed and moved!
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -474,6 +474,7 @@ Other API changes | |||
Supplying anything else than ``how`` to ``**kwargs`` raised a ``TypeError`` previously (:issue:`29388`) | |||
- When testing pandas, the new minimum required version of pytest is 5.0.1 (:issue:`29664`) | |||
- :meth:`Series.str.__iter__` was deprecated and will be removed in future releases (:issue:`28277`). | |||
- ``ignore_index`` is added in :meth:`DataFrame.sort_values` and :meth:`Series.sort_values` to be able to reset index after sorting (:issue:`30114`) | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to other enhancements
pandas/core/series.py
Outdated
@@ -2855,6 +2858,9 @@ def _try_kind_sort(arr): | |||
|
|||
result = self._constructor(arr[sortedIdx], index=self.index[sortedIdx]) | |||
|
|||
if ignore_index: | |||
result = result.reset_index(drop=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use result.index = ibase.default_index(len(indexer))
this is NOT a BlockManager here, but a series
@@ -2820,7 +2825,7 @@ def _try_kind_sort(arr): | |||
return arr.argsort(kind="quicksort") | |||
|
|||
arr = self._values | |||
sortedIdx = np.empty(len(self), dtype=np.int32) | |||
sorted_index = np.empty(len(self), dtype=np.int32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Orthogonal to this but I wonder why this is specified as np.int32
; might be a bug for input with more than 2 ** 32 - 1 rows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy uses int32 for argsort so best can do (to be honest 2B is enough)
thanks @charlesdong1991 |
…ndexing-1row-df * upstream/master: (333 commits) CI: troubleshoot Web_and_Docs failing (pandas-dev#30534) WARN: Ignore NumbaPerformanceWarning in test suite (pandas-dev#30525) DEPR: camelCase in offsets, get_offset (pandas-dev#30340) PERF: implement scalar ops blockwise (pandas-dev#29853) DEPR: Remove Series.compress (pandas-dev#30514) ENH: Add numba engine for rolling apply (pandas-dev#30151) [ENH] Add to_markdown method (pandas-dev#30350) DEPR: Deprecate pandas.np module (pandas-dev#30386) ENH: Add ignore_index for df.drop_duplicates (pandas-dev#30405) BUG: The setting xrot=0 in DataFrame.hist() doesn't work with by and subplots pandas-dev#30288 (pandas-dev#30491) CI: Fix GBQ Tests (pandas-dev#30478) Bug groupby quantile listlike q and int columns (pandas-dev#30485) ENH: Add ignore_index for df.sort_values and series.sort_values (pandas-dev#30402) TYP: Typing hints in pandas/io/formats/{css,csvs}.py (pandas-dev#30398) BUG: raise on non-hashable Index name, closes pandas-dev#29069 (pandas-dev#30335) Replace "foo!r" to "repr(foo)" syntax pandas-dev#29886 (pandas-dev#30502) BUG: preserve EA dtype in transpose (pandas-dev#30091) BLD: add check to prevent tempita name error, clsoes pandas-dev#28836 (pandas-dev#30498) REF/TST: method-specific files for test_append (pandas-dev#30503) marked unused parameters (pandas-dev#30504) ...
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This is the first step for closing the PR #30114, I will come up with another PR for
drop_duplicates