Skip to content
/ dtsort Public

An optimized stable sort algorithm

License

Unknown, MIT licenses found

Licenses found

Unknown
COPYING.txt
MIT
LICENSE-MIT.txt
Notifications You must be signed in to change notification settings

michals/dtsort

Repository files navigation

dt-sort

An optimized stable sort algorithm

What is it?

dt-sort is a proof of concept c++ implementation of an optimized stable sort algorithm that I created last Christmas holiday.

How does dt-sort perform?

In terms of number of comparisons as well as number of swaps it is a few percent better than std::stable_sort() in both worst and average case. You can build and run counts and see your self exact results. In theory, pure dt-sort is O(n log n), in-place, stable sort that uses minimal number of swaps and (almost) minimal number of comparisons with space complexity O(1). However, pure dt-sort is possible only for small size input data. In order to support any input data size we need to construct a hybrid algorithm like timsort or introsort and then (if done well) it preserves good characteristics of outer sorting algorithm with slightly better comparison and swap counts.

How dare you use goto and macros these days?!

Indeed, the dtsort*.h files are not very pretty, and are quite long-ish. It's because they are generated by other code (that it's not yet ready to be published). I use macros to squash the generated code size.

What's the idea? How does it work?

How would you sort a vector of size 2? A single compare and then depending of result do a single swap if needed. Right? How about vector of size 3? You would probably ask 2 up to 3 comparison question and then apply of of six permutations in order to return sorted result, right? How about vector size 4, or more? It's getting harder to hand-write such code, but it's definitely possible for small sizes. That's the main idea of dt-sort. BTW, dt stands for decision tree.

The code size grows exponentially with input vector size, however, it does not imply that dt-sort cannot be practical. If we take a look on modern sort implementations in most mainstream standard libraries, there is a very common pattern of picking different sorting algorithms depending on input size (or it's chunks size i.e. in the merge sort). So, if we can sort a small data (like 5 or 7 elements) extremely fast then we can still benefit from it in merge sort for any size. See modified stl_algo.h as an example. There is a dt_stable_sort() using dt-sort under the hood with identical signature as original std::stable_sort(). See counts.cc for usage example.

Ok, but why?

https://youtu.be/BVd-rYIqSy8?t=104

Show me the numbers

You can try it your self:

    sudo cpufreq-set -g performance  # to make measurmenets more predictable

    ./build/Release/build.sh  # build the code
    # run some benchmarks
    SIZE=100000 ./build/Release/src/benchmark --benchmark_out_format=csv --benchmark_out=build/Release/100000.csv
    # count swaps and compares for 10000 vector size 10 times:
    ./build/Release/src/counts 10000 10

    sudo cpufreq-set -g powersave  # back to normal power save mode

I suggest to use llvm. The gcc takes way longer to compile the code.

Some example results

$ ./build/Release/src/counts 10000 10  # run 10 times on random data size=10000
   stable_sort compare: 127767
dt_stable_sort compare: 121910
   stable_sort move: 157297
dt_stable_sort move: 135817
   stable_sort clock: 6321
dt_stable_sort clock: 6770
compare ratio: 0.954     	 move ratio: 0.863

Possible usage

I don't know yet. There is a lot of cloud processing going on 24/7 in many data centers. Perhaps somebody will find the use case for it. If you do or even think you do, please let me know. :-) For example, depending on implementation details some object storage solutions are using a lot of stable sort all the time. Thee are cases when compare or swaps are very expensive. In such cases some tweaked dt-srot may help.

To-do

  • integrate with std::sort()
  • integrate with parallel std::sort() and std::stable_sort()
  • better integration with std::stable_sort()
  • better benchmarks
  • try few more tweaks
  • polish and publish dt-sort generating code
  • generage dt-sort for other languages
  • TBD

About

An optimized stable sort algorithm

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
COPYING.txt
MIT
LICENSE-MIT.txt

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published