From a1c4d855bc133758ef65102f32bdeff22fb6d0af Mon Sep 17 00:00:00 2001
From: Lilith Orion Hafner
Date: Mon, 30 Jan 2023 09:58:44 -0600
Subject: [PATCH] Sorting documentation fixups for 1.9 (#48440)

- Fix typos
- Clarify that ! means mutation, not "in-place-ness". This should be backported because sort! is even less in place in 1.9 than it already was in 1.8.
- Rewrite the section on default policy to reflect the new default policy
- Move examples and extended description of previously default sorting algorithms out of sort.md and into their respective docstrings (still rendered in sort.md)

Co-authored-by: Jeremie Knuesel
---
 base/sort.jl         | 26 ++++++++++++++--
 doc/src/base/sort.md | 70 ++++++++++----------------------------------
 2 files changed, 39 insertions(+), 57 deletions(-)

diff --git a/base/sort.jl b/base/sort.jl
index 985e0e8f597f3..b3dbaf9ac2d79 100644
--- a/base/sort.jl
+++ b/base/sort.jl
@@ -524,7 +524,7 @@ Base.size(v::WithoutMissingVector) = size(v.data)
     send_to_end!(f::Function, v::AbstractVector; [lo, hi])
 
 Send every element of `v` for which `f` returns `true` to the end of the vector and return
-the index of the last element which for which `f` returns `false`.
+the index of the last element for which `f` returns `false`.
 
 `send_to_end!(f, v, lo, hi)` is equivalent to `send_to_end!(f, view(v, lo:hi))+lo-1`
 
@@ -1242,7 +1242,7 @@ Otherwise, we dispatch to [`InsertionSort`](@ref) for inputs with `length <= 40`
 perform a presorted check ([`CheckSorted`](@ref)).
 
 We check for short inputs before performing the presorted check to avoid the overhead of the
-check for small inputs. Because the alternate dispatch is to [`InseritonSort`](@ref) which
+check for small inputs. Because the alternate dispatch is to [`InsertionSort`](@ref) which
 has efficient `O(n)` runtime on presorted inputs, the check is not necessary for small
 inputs.
 
@@ -1891,6 +1891,26 @@ Characteristics:
     ignores case).
   * *in-place* in memory.
   * *divide-and-conquer*: sort strategy similar to [`MergeSort`](@ref).
+
+  Note that `PartialQuickSort(k)` does not necessarily sort the whole array. For example,
+
+```jldoctest
+julia> x = rand(100);
+
+julia> k = 50:100;
+
+julia> s1 = sort(x; alg=QuickSort);
+
+julia> s2 = sort(x; alg=PartialQuickSort(k));
+
+julia> map(issorted, (s1, s2))
+(true, false)
+
+julia> map(x->issorted(x[k]), (s1, s2))
+(true, true)
+
+julia> s1[k] == s2[k]
+true
 """
 struct PartialQuickSort{T <: Union{Integer,OrdinalRange}} <: Algorithm
     k::T
@@ -1927,6 +1947,8 @@ Characteristics:
     case).
   * *not in-place* in memory.
   * *divide-and-conquer* sort strategy.
+  * *good performance* for large collections but typically not quite as
+    fast as [`QuickSort`](@ref).
 """
 const MergeSort = MergeSortAlg()
 
diff --git a/doc/src/base/sort.md b/doc/src/base/sort.md
index e93d9716b1487..41b7096391a04 100644
--- a/doc/src/base/sort.md
+++ b/doc/src/base/sort.md
@@ -21,7 +21,8 @@ julia> sort([2,3,1], rev=true)
  1
 ```
 
-To sort an array in-place, use the "bang" version of the sort function:
+`sort` constructs a sorted copy leaving its input unchanged. Use the "bang" version of
+the sort function to mutate an existing array:
 
 ```jldoctest
 julia> a = [2,3,1];
@@ -134,65 +135,23 @@ Base.Sort.partialsortperm!
 
 ## Sorting Algorithms
 
-There are currently four sorting algorithms available in base Julia:
+There are currently four sorting algorithms publicly available in base Julia:
 
   * [`InsertionSort`](@ref)
   * [`QuickSort`](@ref)
   * [`PartialQuickSort(k)`](@ref)
   * [`MergeSort`](@ref)
 
-`InsertionSort` is an O(n²) stable sorting algorithm. It is efficient for very small `n`,
-and is used internally by `QuickSort`.
+By default, the `sort` family of functions uses stable sorting algorithms that are fast
+on most inputs. The exact algorithm choice is an implementation detail to allow for
+future performance improvements. Currently, a hybrid of `RadixSort`, `ScratchQuickSort`,
+`InsertionSort`, and `CountingSort` is used based on input type, size, and composition.
+Implementation details are subject to change but currently available in the extended help
+of `??Base.DEFAULT_STABLE` and the docstrings of internal sorting algorithms listed there.
 
-`QuickSort` is a very fast sorting algorithm with an average-case time complexity of
-O(n log n). `QuickSort` is stable, i.e., elements considered equal will remain in the same
-order. Notice that O(n²) is worst-case complexity, but it gets vanishingly unlikely as the
-pivot selection is randomized.
-
-`PartialQuickSort(k::OrdinalRange)` is similar to `QuickSort`, but the output array is only
-sorted in the range of `k`. For example:
-
-```jldoctest
-julia> x = rand(1:500, 100);
-
-julia> k = 50:100;
-
-julia> s1 = sort(x; alg=QuickSort);
-
-julia> s2 = sort(x; alg=PartialQuickSort(k));
-
-julia> map(issorted, (s1, s2))
-(true, false)
-
-julia> map(x->issorted(x[k]), (s1, s2))
-(true, true)
-
-julia> s1[k] == s2[k]
-true
-```
-
-!!! compat "Julia 1.9"
-    The `QuickSort` and `PartialQuickSort` algorithms are stable since Julia 1.9.
-
-`MergeSort` is an O(n log n) stable sorting algorithm but is not in-place – it requires a temporary
-array of half the size of the input array – and is typically not quite as fast as `QuickSort`.
-It is the default algorithm for non-numeric data.
-
-The default sorting algorithms are chosen on the basis that they are fast and stable.
-Usually, `QuickSort` is selected, but `InsertionSort` is preferred for small data.
-You can also explicitly specify your preferred algorithm, e.g.
-`sort!(v, alg=PartialQuickSort(10:20))`.
-
-The mechanism by which Julia picks default sorting algorithms is implemented via the
-`Base.Sort.defalg` function. It allows a particular algorithm to be registered as the
-default in all sorting functions for specific arrays. For example, here is the default
-method from [`sort.jl`](https://github.com/JuliaLang/julia/blob/master/base/sort.jl):
-
-```julia
-defalg(v::AbstractArray) = DEFAULT_STABLE
-```
-
-You may change the default behavior for specific types by defining new methods for `defalg`.
+You can explicitly specify your preferred algorithm with the `alg` keyword
+(e.g. `sort!(v, alg=PartialQuickSort(10:20))`) or reconfigure the default sorting algorithm
+for custom types by adding a specialized method to the `Base.Sort.defalg` function.
 For example, [InlineStrings.jl](https://github.com/JuliaStrings/InlineStrings.jl/blob/v1.3.2/src/InlineStrings.jl#L903)
 defines the following method:
 ```julia
@@ -200,8 +159,9 @@ Base.Sort.defalg(::AbstractArray{<:Union{SmallInlineStrings, Missing}}) = Inline
 ```
 
 !!! compat "Julia 1.9"
-    The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed
-    to be stable since Julia 1.9. Previous versions had unstable edge cases when sorting numeric arrays.
+    The default sorting algorithm (returned by `Base.Sort.defalg`) is guaranteed to
+    be stable since Julia 1.9. Previous versions had unstable edge cases when
+    sorting numeric arrays.
 
 ## Alternate orderings