Good cache reutilization: http://cs.stackexchange.com/questions/3/why-is-quicksort-better-than-other-sorting-algorithms-in-practice
Quick but imprecise view: GIF visualization.
Input array:
Pointer setup:
8 2 7 1 3 5 6 4
^ ^
i r
j
Visualization: we will separate things into groups:
Where:
-
$i$ is the first of the large side.
-
$j$ is the current one we are checking now.
-
$r$ is a recursive input of this step. It will move only between two steps. It starts at the rightmost element.
The rule is simple:
-
$A[j] > A[r]$?
- Yes: $j++$
- No: exchange $A[i]$ and $A[j]$, $j++$, $i++$
$8 > 4$? Yes. $j++$:
| 8 2 7 1 3 5 6 | 4
| ^ ^ | ^
| i j | r
| |
small | large | pivot
$2 > 4$? No. Exchange $A[i]$ and $A[j]$:
| 2 8 7 1 3 5 6 | 4
| ^ ^ | ^
| i j | r
| |
small | large | pivot
$i++$ and $j++$:
2 | 8 7 1 3 5 6 | 4
| ^ ^ | ^
| i j | r
| |
small | large | pivot
$7 > 4$? Yes. $j++$:
2 | 8 7 1 3 5 6 | 4
| ^ ^ | ^
| i j | r
| |
small | large | pivot
$1 > 4$? No. Exchange $A[i]$ and $A[j]$:
2 1 | 7 8 3 5 6 | 4
| ^ ^ | ^
| i j | r
| |
small | large | pivot
$3 > 4$? No. Exchange $A[i]$ and $A[j]$:
2 1 | 3 8 7 5 6 | 4
| ^ ^ | ^
| i j | r
| |
small | large | pivot
$i++$ and $j++$:
2 1 3 | 8 7 5 6 | 4
| ^ ^ | ^
| i j | r
| |
small | large | pivot
And so on:
2 1 3 | 8 7 5 6 | 4
| ^ ^ | ^
| i j | r
| |
small | large | pivot
And:
2 1 3 | 8 7 5 6 | 4
| ^ | ^
| i | r
| | j
small | large | pivot
Oops: $j$ reached $r$. This means we are done, just exchange $A[r]$ and $A[i]$:
2 1 3 4 | 7 5 6 8
^ |
i |
|
small | large
Note how we effectively split things into two sides: one larger than $4$, the other smaller or equal to it.
Intuitively, why does it work? At each step, if we find a small number we throw it to the left, and the small side increases, eating that number up.
What to do next? Recurse down twice:
- from $0$ to $i - 1$
- from $i + 1$ to $N$
2 1 3 | 4 | 7 5 6 8
^ ^ | | ^ ^
i r | | i r
j | | j
Showing only the left side:
2 1 3
^ ^
i r
j
$2 < 3$:
2 1 3
^ ^
j r
i
$1 < 3$:
2 1 3
^
r
i
j
Exchange $A[r]$ and $A[i]$ ($3$ by itself, doing nothing):
2 1 3
^
r
i
j
Recurse again:
2 1 | 3
^ ^ | ^
i r | i
j | j
| r
Left side:
2 1
^ ^
i r
j
$2 < 1$:
1 2
^
r
i
j
Done. Recurse again:
1 | 2
^ | ^
r | r
i | i
j | j
Left side:
1
^
r
i
j
Oops, $i = j = r$ and there is a single element. This is the base case. Do nothing.
## Pseudo code
void quicksort(int[] A, int left, int right)
if left < right
int middle = partition(array, left, right)
quicksort(array, left, middle - 1)
quicksort(array, middle + 1, right)
int partition(int[] A, int left, int right)
int i = left
int j = left
while j < right
if A[j] > A[right]
swap A[i], A[j]
i++
j++
You use `quicksort(A, 1, A.length())` to sort the entire array.
## Worst case
The worst case happens when the input is either sorted or reverse sorted.
Although very rare for random inputs, almost sorted inputs may be common for certain applications.
### Sorted input
Let's see how a sorted input yields worst time.
Initial input:
1 2 3 | 4
^ | ^
i | r
j |
$1 < 4$:
1 2 3 | 4
^ | ^
i | r
j |
$2 < 4$:
1 2 3 | 4
^ | ^
i | r
j |
$3 < 4$:
1 2 3 | 4
| ^
| r
| i
| j
Exchange, $4$ and $4$ and recurse.
Right side contains only $4$ so it is already sorted.
Left side gives:
1 2 | 3
^ | ^
i | r
j |
The recurrence relation is:
$$T(n) = T(n - 1) + n$$
TODO can this be analysed with the master theorem?
Let's use induction.
We will do:
- $n$ moves on the first run
- $n - 1$ moves on the second run
- ...
- $1$ move on the last run
Which gives:
$$n + (n - 1) + ... + 1 = n * (n + 1) / 2$$
moves, and therefore $O(n^2)$.
### Reverse sorted input
8 7 6 5 4 3 2 1
^ ^
i r
j
8 7 6 5 4 3 2 1
^ ^ ^
i j r
8 7 6 5 4 3 2 1
^ ^ ^
i j r
Fast forward:
8 7 6 5 4 3 2 1
^ ^
i r
j
1 7 6 5 4 3 2 8
^ ^
i r
j
There is no left side. Recurse on right:
7 6 5 4 3 2 8
^ ^
i r
j
7 6 5 4 3 2 8
^ ^
i r
j
7 6 5 4 3 2 8
^ ^
i r
j
Fast forward:
7 6 5 4 3 2 8
^
r
i
j
There is no right side. Recurse on left:
7 6 5 4 3 2
^
r
i
j
Aha, another inverse sorted input, but unlike the sorted case, it came after two rounds. Same analysis as before.
### Repeated elements
In case all, or many of the elements are the same, the input is already sorted (and reverse sorted as well), so the running time is $n^2$.
But the three-way version does it linearly, which makes this implementation very popular as this may be a common case.
### Proof that this is the worst case
TODO. We need to use some formal property that encodes that fact that after any round, we get one step closer to the sorted output. Then count how many steps could happen.
## Best case
TODO. Find a family that always gets cut into half. Then optimality follows because of the theorem that says that $O(N log(N))$ is the best possible for comparison sort.
## Average case
TODO.
## Variants
### Three-way
Sorted input may be a disproportionately large input case, so that a variant of quicksort was created to solve it efficiently.
The idea is simply to use an extra pointer when partitioning, so that the array will be split into three parts.
We then recurse quicksort on the left and right parts.
This problem has a famous artistic formulation:
TODO: I think the best case for this algorithm is $O(N)$ since if all elements are equal, a single pass is done.
### Tail recursion optimized
This reduces the extra memory from `O(N)` to `O(lg(N))`.
TODO implement. For this to work for the worst case, you must select the side with the fewest elements, and tail recurse the one with more elements.
void quicksort(int[] A, int left, int right)
if left < right
int middle = partition(array, left, right)
quicksort(array, left, middle - 1)
quicksort(array, middle + 1, right)
int partition(int[] A, int left, int right)
int i = left
int j = left
while j < right
if A[j] > A[right]
swap A[i], A[j]
i++
j++
### Dual pivot
Used in Java 7's `Arrays.sort`:
TODO
### Pivot in the middle
### Organ pipe input
It is possible to take the pivot in the middle, to avoid worst case on sorted input which may be very common.
But then this gives worst case for organ pipe inputs of the form `[1,2,3,4,5,4,3,2,1]`.