Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

[WIP] Add Sort(...) extension methods for Span<T> #26859

Closed
wants to merge 32 commits into from

Conversation

nietras
Copy link

@nietras nietras commented Feb 5, 2018

Implements https://github.com/dotnet/corefx/issues/15329

WIP: This is still very much a work-in-progress. Warts and all! :)

Goals

  • Port coreclr Array.Sort code without major changes to the algorithm.
    That is, it is still Introspective Sort using median-of-three quick sort and heap sort.
  • Performance on par with Array.Sort.
  • Define performance tests that can be used for future new algorithms exploration.

Non-Goals

  • No new algorithms such as say using histograms for byte or similar.
  • Generally no changes to the overall algorithm, the idea being
    for this PR to hopefully be accepted without too much controversy :)

Array.Sort

Array.Sort is implemented as both managed code and native code (for some primitives) in:

Base and Variant Differences

This PR is based mainly on the native implementation and the generic implementation.
In retrospective I believed the different variants of Array.Sort would sort identically, but
they do not. The fact that I started out with the native implementation and focused on
primitives and the specialization of these, appears to have been a mistake.

The fact is Array.Sort can yield different sorting results depending on the variant used
when also sorting items (also called values, since the items used for sorting are then keys).
Note that the sort is still correct, it is just that equal keys can have different results
for where the items are. I.e. here is an example that comes from a special test:

image

Minor Bug Fix

Minor bug fix for introspective depth limit, see dotnet/coreclr#16002

Code Structure

Code is currently, probably temporary, structured into multiple files
to allow for easier comparison of the different variants.

Consolidated (most likely will be removed):

Current:

  • SpanSortHelpers.Common.cs
    • A few commonly used constants, types and methods e.g. Swap.
  • SpanSortHelpers.Keys.cs
    • Entry points and dispatch types for single span sort.
  • SpanSortHelpers.Keys.IComparable.cs
    • IComparable variant
  • SpanSortHelpers.Keys.Specialized.cs
    • Specialized switch for fast primitive versions and a little code sharing.
  • SpanSortHelpers.Keys.TComparer.cs
    • TComparer variant, used for specialized too.
  • SpanSortHelpers.KeysValues.cs
    • Entry points and dispatch types for two span sort (e.g. with items/values).
  • SpanSortHelpers.KeysValues.IComparable.cs
    • IComparable variant
  • SpanSortHelpers.KeysValues.Specialized.cs
    • Specialized switch for fast primitive versions and a little code sharing.
  • SpanSortHelpers.KeysValues.TComparer.cs
    • TComparer variant, used for specialized too.

Primary development was done in my DotNetCross.Sorting repo:
https://github.com/DotNetCross/Sorting/
This was to get a better feedback loop and to use BenchmarkDotNet for benchmark testing
and disassembly.

NOTE: I understand that in corefx we might want to consolidate this into a single file,
but for now it is easier when comparing variants. Note also that the coreclr has
more variants than currently in this PR, there is for example a variant for Comparison<T>
in coreclr.

Changes

Many small changes have been made due to usings refs and Unsafe, but a couple notable changes are:

  • Sort3 add a specific implementation for sorting three, used both for finding pivot and when sorting exactly 3 elements.
    • This has currently been changed to use Sort2 three times like coreclr as otherwise,
      there would be differences for some same key cases. That I haven't had time to debug.
    • Why use a special Sort3? For expensive comparisons this can be a big improvement
      if keys are almost already sorted, since only 2 compares are needed instead of 3.
  • Remove unnecessary ifs on swaps, except for a few places where it is now explicit.
  • A few renames such as Sort2 instead of SwapIfGreater.
  • Comparer based variant uses a specific IDirectComparer allowing for the kind of
    specialization for basic types that coreclr does in native code.
    This started out as just a LessThan method but the problem is then
    whether bogus comparers should yield the same result as Array.Sort so I
    changed this to have more methods.

Benchmarks

Since Sort is an in-place operation it is "destructive" and benchmarking is done a bit different than normal.
The basic code for benchmarks is shown below, this uses https://github.com/dotnet/BenchmarkDotNet and was
done in the https://github.com/DotNetCross/Sorting/ git repo. Porting these to
the normal corefx performance tests is on the TODO list.

The benchmarks use a Filler to pre-fill a _filled array
with a pattern in slices of Length. Depending on the filler the full array will then be filled
with either repeating patterns of Length or fill the entire array with a given pattern.

This allows using this for testing sorts of different lengths, but measures the total time to do a number of sorts,
so it includes the overhead of slicing and the loop. And allows testing different patterns/sequnces.
The main point here is that this is used to compare relatively to the Array.Sort.

[GlobalSetup]
public void GlobalSetup()
{
    Filler.Fill(_filled, Length, _toValue);
}

[IterationSetup]
public void IterationSetup()
{
    Array.Copy(_filled, _work, _maxLength);
}

[Benchmark(Baseline = true)]
public void ArraySort()
{
    for (int i = 0; i <= _maxLength - Length; i += Length)
    {
        Array.Sort(_work, i, Length);
    }
}

[Benchmark]
public void SpanSort()
{
    for (int i = 0; i <= _maxLength - Length; i += Length)
    {
        new Span<TKey>(_work, i, Length).Sort();
    }
}

Results for this for specific commits will come in comments later.

Fillers

As noted there are different fillers. Som fill the entire array not caring about the slice length.
Others fill based on slice length. This is particularly important for MedianOfThreeKiller.

  • Constant one constant value, e. g. 42, note that this means a single reference type.
  • Decrementing full array filled with a decreasing sequence from array length - 1 to 0.
  • Incrementing full array filled with a increasing sequence from 0 to length - 1.
  • MedianOfThreeKiller each slice filled with a sequence as detailed in Introspective Sorting and Selection Algorithms. For longer slice lengths this will use heap sort.
  • Random full array filled with random values. Seeded so each run is the same.
  • PartialRandomShuffle full array filled with incrementing sequence.
    Random pairs are then swapped, as a ratio of length. In this case 10%,
    i.e. 10% pairs have been swapped randomly. Seeded so each run is the same.

For each different type, the int value is converted to the given type
e.g. using ToString("D9") for string.

These fillers are also used for test case generation. But are combined with other
sort case generators.

Difference to BinarySearch

BinarySearch only has an overload without comparer for when T : IComparable<T>. That is, there is no overload where the value searched for is not not generically constrained.

This is unlike Sort where there is no generic constraint on the key type. If we had dotnet/csharplang#905 this might not be that big an issue, but we might expect issues for some uses of BinarySearch.

Tests

The fundamental principle of the tests are they use Array.Sort for generating the expected output. The idea being span Sort should give exactly the same result.

Notes and TODOs

Biggest problem currently is that there are differences for some specific test cases:

  • NaN for float and double will not give the same result when sorting keys and values.
    • I have not had time to debug this. In span Sort I wanted to use the specialized sort path for these, but coreclr only does that when items/values are of the same type. This means coreclr does not do a NaNPrepass for floating point types when items are not of same type.
    • I removed this then, but results are still different.
  • BogusComparable tests fail since Array.Sort will throw on these for some lengths.
    • Tests need to be modified to handle this, and span Sort must detect this issue and throw.

There are a lot of other remaining TODOs e.g.:

  • Better separation between fast and slow (OuterLoop) tests, currently tests are slow, even the "fast" ones.
  • Port performance tests as per corefx standard.
    • Need to determine scope of these... a lot can be added.

Performance is currently on par or significantly better than coreclr for value types. As soon
as reference types are used performance is... well miserably. This probably reflects the fact that
my main focus was on optimizing for ints. I believe the issue here must be around how I have factored
the generic code.

  • Are the changes I have made acceptable?
  • Are there any issues with how I have factored the code into the different types and generic methods?
  • Any way to make reference type code faster?
    • Is there any way one can circumvent the canonical representation of generic types and methods when a type is a reference type? So we can avoid JIT_GenericHandleMethod or JIT_GenericHandleClass, which shows up during profiling? This is the reason for the IComparable variant of the code... but it is still slow.

CC: @ahsonkhan @jkotas

@nietras nietras changed the title Add Sort(...) extension methods for Span<T> [WIP] Add Sort(...) extension methods for Span<T> Feb 5, 2018
@nietras
Copy link
Author

nietras commented Feb 5, 2018

Initial set of benchmark runs from https://github.com/DotNetCross/Sorting/blob/c168e6ff6a10f2cb2bde911f719feacffcfaffa7/tests/DotNetCross.Sorting.Benchmarks/Program.cs can be found at https://github.com/DotNetCross/SortingBenchmarkResults/tree/master/180205_c168e6ff_i5-3475S_nietras

Note that this was run InProcess on a five year old Ivy Bridge with the following details:

BenchmarkDotNet=v0.10.12, OS=Windows 10 Redstone 3 [1709, Fall Creators Update] (10.0.16299.192)
Intel Core i5-3475S CPU 2.90GHz (Ivy Bridge), 1 CPU, 4 logical cores and 4 physical cores
Frequency=2840377 Hz, Resolution=352.0659 ns, Timer=TSC
.NET Core SDK=2.1.4
  [Host] : .NET Core 2.0.5 (Framework 4.6.26020.03), 64bit RyuJIT

Platform=X64  Runtime=Core  Toolchain=InProcessToolchain  
LaunchCount=1  RunStrategy=Monitoring  TargetCount=11  
UnrollFactor=1  WarmupCount=3  

As I mention value type performance is on par or better, but reference type performance is a lot slower.

@nietras
Copy link
Author

nietras commented Feb 9, 2018

I've added lots of new benchmarks and disassembling courtesy of fixes made by the BenchmarkDotNet guys, in https://github.com/DotNetCross/Sorting/tree/04add921f49e19a33ccb00d81145fd765f1ecb91 which can be found at https://github.com/DotNetCross/SortingBenchmarkResults/tree/master/180209_04add921_i5-3475S_nietras

This adds benchmarks of overloads and more types. Overall, the results are the same. For value types and value type comparers perf is good even great see SingleSortBench-report-github.md, observe how ClassComparableComparer is much worse than for Array, though. As soon as reference types are involved things start getting bad.

Just to be clear, I am waiting for feedback on this and why this is and whether or not I really have to add variants for each of the specific cases, what needs adding is variants based on TComparer, IComparer and Comparison it seems, if these can't be handled with decoration as I had hoped.

@iSazonov
Copy link
Contributor

There might still be some duplication of code with LINQ.

@nietras
Copy link
Author

nietras commented Jun 13, 2018

@stephentoub @KrzysztofCwalina as far as I am concerned this was done, until the "security" issue popped up (long after starting this):

Would you be able to modify it to be "safe" with bounds checks (for example, use the span indexer rather than the Unsafe class)?

As then you might as well simply refactor existing array code (pretty much search/replace and then change a few indexing stuff) to use span and use that for all "consumers" (e.g. Array, List, Span) with the span version being the common ground. That is, if the version I made here isn't useable after all.

As I noted, the existing Array implementation doesn't do bounds checking in it's native implementation either, which was the start for my Span<> version.

From my point of view this code could be used for all consumers, but I have to say that I am not sure I have much time to revisit this again, though.

@stephentoub
Copy link
Member

I have to say that I am not sure I have much time to revisit this again, though.

Ok, thank you for your efforts on this. Given the above, I'm going to close this PR, but hopefully whoever picks this up next will be able to build on your efforts / reuse your commits as is appropriate.

@iSazonov
Copy link
Contributor

iSazonov commented Jan 1, 2019

Would you be able to modify it to be "safe" with bounds checks (for example, use the span indexer rather than the Unsafe class)?

Last days I experimented with other code and made sure that bounds checks unacceptably slow down for such code. Bounds checks should be done only once at the top level for security. Now I believe that @nietras's code is a right direction.

@EamonNerbonne
Copy link

EamonNerbonne commented Jan 2, 2019

@iSazonov that's my experience too (I did some experiments with sorting spans here).

Might it be an idea to stick something like this into a separate package with the intent to include it in core in the future? Because I think it's a shame it's not available until everything is "perfect", and simultaneously, I think it's worrisome that once it's in, it's very hard to change, and that implies that choices made here may stick around even when they're not ideal in the long run. In particular, the idea that this version sorts the same way as array.sort in the face of buggy comparers or indeterminate orderings sounds unnecessarily restrictive to me, and it makes the code somewhat larger and slower.

@nietras
Copy link
Author

nietras commented Jan 2, 2019

into a separate package

@EamonNerbonne it was my intent to release https://github.com/DotNetCross/Sorting as a nuget package so it was at least available in some form, just never got around too it, since I wanted to allow the sorting in this package to diverge a little bit (still correct sorting) from corefx to make some use cases a lot faster. The PR to corefx was targeting sorting that was exactly the same as before. The corefx version is inconsistent across different types when keys are identical.

@nietras
Copy link
Author

nietras commented Jan 4, 2019

@stephentoub if any of the comments have changed your view on this PR let me know. I still believe the Span/ref based sort code can be used as the one and only sorting code in coreclr/corefx.
CC: @ahsonkhan

@stephentoub
Copy link
Member

if any of the comments have changed your view on this PR let me know

Which view are you referring to?

@nietras
Copy link
Author

nietras commented Jan 4, 2019

@stephentoub that the Sorting impl must use bounds checked indexing instead of ref based without bounds checking. Despite the old c++ impl does the same.

@stephentoub
Copy link
Member

stephentoub commented Jan 4, 2019

I didn't comment on bounds checking, did I? I've not actually reviewed this PR before to my knowledge.

@nietras
Copy link
Author

nietras commented Jan 4, 2019

@stephentoub sorry given you closed it I thought you were "in the know" of the reasons why this was dismissed. Can't remember who said it, but as far as I recall that was the reason for this PR not being accepted.

@nietras
Copy link
Author

nietras commented Jan 4, 2019

If bounds checking is not an issue and it's a simple. matter of reducing code duplication (e. g. use the span version everywhere) then I could try getting this done again with help/guidance around that.

@stephentoub
Copy link
Member

sorry given you closed it I thought you were "in the know" of the reasons why this was dismissed. Can't remember who said it, but as far as I recall that was the reason for this PR not being accepted.

I closed it because it was still a WIP and you noted that you weren't sure when you'd be able to spend more time pushing it forward.

My ideal would be that this not copy the array's sorting code but rather actually share the same implementation, living in corelib and changing the existing implementation to work on either arrays or spans rather than just arrays.

@nietras
Copy link
Author

nietras commented Jan 4, 2019

you weren't sure when you'd be able to spend more time pushing it forward

This was based on me doing a complete rewrite/start from scratch due to ask for bounds checking. Because then you might as well just have copied and search replaced T[] with Span<T>.

Given how much time I spent on doing the ref version, I didn't see the point to be frank 😆

@nietras
Copy link
Author

nietras commented Jan 4, 2019

I kept the WIP because no one said this was acceptable, I also wanted to get the span version in, before removing the old code, to do one thing at a time.

As I said I believe this PR can be used to consolidate all code to a single managed version, with perf being on par or in most cases faster. Give or take.

Question is what implementation is acceptable? What principles apply? Bounds checking everywhere? Consolidate all code in same PR?

@stephentoub
Copy link
Member

Question is what implementation is acceptable? What principles apply? Bounds checking everywhere? Consolidate all code in same PR?

I believe @jkotas commented originally on bounds checking, so he should comment, too, but from my perspective, the min bar would be:

  • Done in coreclr, where all of this lives
  • Shared implementation with array so as to avoid code duplication
  • Doesn't make array implementation less safe
  • Doesn't measurably regress array sorting performance

@nietras
Copy link
Author

nietras commented Jan 4, 2019

My ideal would be that this not copy the array's sorting code but rather actually share the same implementation, living in corelib and changing the existing implementation to work on either arrays or spans rather than just arrays.

This in my view requires the ref based approach I have done. Not sure how your ideal is different from what I'm suggesting and this PR reflects as a starting point.

@stephentoub
Copy link
Member

Not sure how your ideal is different from what I'm suggesting

As I noted, I've not reviewed the PR, but just on the surface, this PR is in corefx rather than coreclr, it's duplicating code rather than using the same exact implementation (I can't easily tell how much it's diverged from what array is currently using), and because of both of those, I can't easily determine the impact it actually has on the existing array sorting safety or performance.

My suggestion would be to start a new PR in coreclr and go from there.

@nietras
Copy link
Author

nietras commented Jan 4, 2019

Done in coreclr, where all of this lives

There was a separate PR for coreclr as suggested by @jkotas

Shared implementation with array so as to avoid code duplication

Yes, of course, one step at a time I would think 🙂

Doesn't make array implementation less safe

That's a hard question, current implementation uses native not bounds checked code for specialized sorts. Versus not in other code paths. Does this make the native version less safe?

I understand the importance of this but it does not answer whether bounds checking is required.

Doesn't measurably regress array sorting performance

I did my best, as this PR reflects, to show this was not the case.

@jkotas
Copy link
Member

jkotas commented Jan 4, 2019

@jkotas commented originally on bounds checking

The current managed array sorting code in CoreLib is safe with bounds checks. It is immune to buffer overruns by design.

The bounds checks are skipped in the special cased unmanaged code for primitive types only. That unmanaged code was was a factory for security bugs that turned into MSRC (https://blogs.technet.microsoft.com/msrc/) events.

I have said that the thousands lines of a new unsafe sorting code without boundschecks will need extra security scrutiny given the history.

@stephentoub
Copy link
Member

There was a separate PR for coreclr as suggested by @jkotas

Great, that's where the discussion should be, then. The only changes that should be needed in corefx are the additions to the reference assemblies and tests.

@nietras
Copy link
Author

nietras commented Jan 4, 2019

I can't easily tell how much it's diverged from what array is currently using

I understand, but the ref rewrite required a lot of changes, and I thought a single "version" for different "compares" was possible without drawbacks, it was not. The coreclr code is not it my view clear on the many subtle differences between the different versions, so if I have to adapt back to being like coreclr again would indeed be a lot of work.

I rearranged the code here to clearly show the different versions.

And if the bounds checking is still required, well then I still think my efforts were wasted, I'd have to say.

@nietras
Copy link
Author

nietras commented Jan 4, 2019

coreclr PR:

dotnet/coreclr#16986

Note though I wanted code to be acceptable here first because testing is much easier here than in coreclr. Perhaps things have changed.

Given @jkotas the bounds checking is still a question that needs a clear answer and path forward, how should the code look?

I understand the importance of security here, but you will have to decide which versions of the code is acceptable.

@EamonNerbonne
Copy link

Another possibility it to pick a few simpler sub-problems and avoid bounds-checking only on those.

E.g the insertion-sort finisher is a pretty good candidate for that.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants