Should internal usages of sorting with numpy use `kind="stable"`? #53558

mroeschke · 2023-06-08T01:36:39Z

There are several places where we call `np.sort`/`np.argsort`/etc. internally (i.e. not cases where users can specify a sorting kind, like in `sort_values`) and use the default, unstable `kind="quicksort"`.
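For readers unfamiliar with the distinction, here is a minimal sketch of what "stable" means for `np.argsort`. The array is made up for illustration and is not taken from any pandas internals.

```python
import numpy as np

# Illustrative data with many ties; not from the pandas code base.
values = np.array([2, 1, 2, 1, 2, 1])

# A stable sort keeps tied elements in their original relative order:
# the three 1s (positions 1, 3, 5) come first, then the three 2s (0, 2, 4).
stable_order = np.argsort(values, kind="stable")

# The default kind ("quicksort") only promises that the values end up sorted;
# the order among equal elements is an implementation detail and may differ
# between numpy versions or CPUs.
default_order = np.argsort(values)

print(stable_order)
print(values[default_order])
```

Both calls produce correctly sorted values; only the stable call pins down which index comes first among ties.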

In numpy 1.25, it appears that CPUs that can use AVX will get a modified quicksort. This recently broke some tests in our numpy dev build (xref #53548) where we were testing these unstable sorting results.

Is it worth transitioning to a stable sorting algorithm internally for consistency?

Alternatively, we could dynamically switch to a stable sorting algorithm if duplicate values are being sorted?
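One possible shape for the dynamic alternative, as a rough sketch only. The helper name `argsort_stable_if_duplicates` is hypothetical and does not exist in pandas or numpy; it assumes a 1-D array without NaN (NaN never compares equal to itself, so repeated NaNs would not be detected as ties).

```python
import numpy as np


def argsort_stable_if_duplicates(values):
    """Hypothetical helper: fall back to a stable argsort only when ties exist."""
    values = np.asarray(values)
    # First pass with the default (unstable) kind.
    order = np.argsort(values)
    sorted_values = values[order]
    # After sorting, duplicates must sit next to each other.
    has_duplicates = values.size > 1 and bool(
        (sorted_values[1:] == sorted_values[:-1]).any()
    )
    if has_duplicates:
        # Ties are present, so redo the sort with a guaranteed tie order.
        return np.argsort(values, kind="stable")
    # No ties: every sorting kind yields the same unique permutation.
    return order


print(argsort_stable_if_duplicates([3, 1, 2]))
print(argsort_stable_if_duplicates([2, 1, 2, 1]))
```

Note the cost: when ties are present this sorts twice, so it may well be slower than always passing `kind="stable"`.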


WillAyd · 2023-06-14T19:06:30Z

Sorry I missed the first half of the call today where this was discussed; I see the result was an agreement to move to a stable sort. Do we know the performance implications of that though? It seems like it opens up the possibility of performance bottlenecks, so I would be hesitant to commit to that.

mroeschke · 2023-06-14T19:24:34Z

> Do we know the performance implications of that though?

Not definitively, but most internal uses of numpy sorting are sorting numerical factors as one part of a multi-step operation, so it seems unlikely that the sort would be the bottleneck in the main operation.

Additionally, during the call it seemed that the consistency of results is worth the tradeoff in performance.
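A quick way to get a feel for the performance question is to time both kinds on integer codes. This is only a sketch: the array size, the number of distinct codes, and the helper name `time_argsort` are arbitrary assumptions, and results will vary with numpy version, dtype, and CPU.

```python
import timeit

import numpy as np


def time_argsort(kind, n=100_000, repeats=5, seed=0):
    """Return the best-of-`repeats` wall time (seconds) for one argsort call."""
    rng = np.random.default_rng(seed)
    # Integer codes with many repeats, loosely resembling factorized keys.
    codes = rng.integers(0, 1_000, size=n)
    return min(
        timeit.repeat(lambda: np.argsort(codes, kind=kind), number=1, repeat=repeats)
    )


timings = {kind: time_argsort(kind) for kind in ("quicksort", "stable")}
for kind, seconds in timings.items():
    print(f"{kind}: {seconds * 1e3:.2f} ms")
```

No expected numbers are given here on purpose; the point is to measure on the target machine rather than to assume either kind is faster.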

Gabriel-p · 2024-12-07T14:08:42Z

I just spent a full day figuring out why Pandas was giving me different results for the same array and today I found I'd been burnt by this issue #39877.

I 100% support `kind="stable"` being the default. Anything else is entirely unintuitive.

mroeschke added the Compat pandas objects compatability with Numpy or Python functions label Jun 8, 2023

mroeschke mentioned this issue Jun 8, 2023

CI/DEPS: Add xfail(strict=False) to related unstable sorting changes in Numpy 1.25 #53548

Merged

Charlie-XIAO mentioned this issue Jun 24, 2023

CLN: Make internal numpy sort and argsort use kind="stable" #53829

Closed

4 tasks

simonjayhawkins added the Needs Discussion Requires discussion from core team before further action label Feb 1, 2024

mroeschke mentioned this issue Feb 20, 2024

BUG: sort_values is sorting the index too when ignore_index=False #57531

Open

3 tasks
