An optimized stable sort algorithm
dt-sort is a proof of concept c++ implementation of an optimized stable sort algorithm that I created last Christmas holiday.
In terms of number of comparisons as well as number of swaps it is a few percent better than std::stable_sort()
in both worst and average case. You can build and run counts
and see your self exact results.
In theory, pure dt-sort is O(n log n), in-place, stable sort that uses minimal number of swaps and (almost) minimal number of comparisons with space complexity O(1). However, pure dt-sort is possible only for small size input data.
In order to support any input data size we need to construct a hybrid algorithm like timsort or introsort
and then (if done well) it preserves good characteristics of outer sorting algorithm with slightly better comparison and swap counts.
Indeed, the dtsort*.h
files are not very pretty, and are quite long-ish.
It's because they are generated by other code (that it's not yet ready to be published).
I use macros to squash the generated code size.
How would you sort a vector of size 2? A single compare and then depending of result do a single swap if needed. Right? How about vector of size 3? You would probably ask 2 up to 3 comparison question and then apply of of six permutations in order to return sorted result, right? How about vector size 4, or more? It's getting harder to hand-write such code, but it's definitely possible for small sizes. That's the main idea of dt-sort. BTW, dt stands for decision tree.
The code size grows exponentially with input vector size, however, it does not imply that dt-sort cannot be practical.
If we take a look on modern sort implementations in most mainstream standard libraries, there is a very common
pattern of picking different sorting algorithms depending on input size (or it's chunks size i.e. in the merge sort).
So, if we can sort a small data (like 5 or 7 elements) extremely fast then we can still benefit from it in merge sort for any size.
See modified stl_algo.h
as an example. There is a dt_stable_sort()
using dt-sort under the hood with identical signature as original std::stable_sort()
. See counts.cc
for usage example.
https://youtu.be/BVd-rYIqSy8?t=104
You can try it your self:
sudo cpufreq-set -g performance # to make measurmenets more predictable
./build/Release/build.sh # build the code
# run some benchmarks
SIZE=100000 ./build/Release/src/benchmark --benchmark_out_format=csv --benchmark_out=build/Release/100000.csv
# count swaps and compares for 10000 vector size 10 times:
./build/Release/src/counts 10000 10
sudo cpufreq-set -g powersave # back to normal power save mode
I suggest to use llvm. The gcc takes way longer to compile the code.
$ ./build/Release/src/counts 10000 10 # run 10 times on random data size=10000
stable_sort compare: 127767
dt_stable_sort compare: 121910
stable_sort move: 157297
dt_stable_sort move: 135817
stable_sort clock: 6321
dt_stable_sort clock: 6770
compare ratio: 0.954 move ratio: 0.863
I don't know yet. There is a lot of cloud processing going on 24/7 in many data centers. Perhaps somebody will find the use case for it. If you do or even think you do, please let me know. :-) For example, depending on implementation details some object storage solutions are using a lot of stable sort all the time. Thee are cases when compare or swaps are very expensive. In such cases some tweaked dt-srot may help.
- integrate with
std::sort()
- integrate with parallel
std::sort()
andstd::stable_sort()
- better integration with
std::stable_sort()
- better benchmarks
- try few more tweaks
- polish and publish dt-sort generating code
- generage dt-sort for other languages
- TBD