Home

Introduction


Sorting involves arranging a collection of items in order

Three main cases to consider for input order:

When analysing sorting algorithms, we consider:

Properties


Properties:

Elementary Sorting Algorithms


Selection Sort


Method:

Time complexity:

Best case: $O(n^2)$
Average case: $O(n^2)$
Worst case: $O(n^2)$

Properties:

Unstable: due to long-range swaps
Non-adaptive: performs same steps, regardless of sortedness of original array
In-place: does not use temporary arrays

Bubble Sort


Method:

Time complexity:

Best case: $O(n)$ => When array is already sorted, only a single pass required
Average case: $O(n^2)$
Worst case: $O(n^2)$

Properties:

Stable: Swappings are between adjacent elements only
Adaptive: Bubble sort is $O(n^2)$ on average, $O(n)$ if input array is sorted
In-place: does not use temporary arrays

Insertion Sort


Method:

Time complexity:

Best case: $O(n)$ => When array is already sorted
Average case: $O(n^2)$
Worst case: $O(n^2)$

Properties:

Stable: Elements are always inserted to the right of any equal elements
Adaptive: Insertion sort is $O(n^2)$ on average, $O(n)$ if input array is sorted
In-place: does not use temporary arrays

Summary of Elementary Sorts


Elementary Sort Summary

Extra:

Notice that Bubble Sort and Insertion Sort has the exact same properties, so how do we differentiate them? We will test the unknown algorithm with an edge case of sorted elements, but the first and the last elements are swapped (e.g., [n, 2, 3, 4,…, n - 1, 1]). Using this case, bubble sort executes with an O(n^2) complexity, whereas insertion sort is O(n).

Divide-and-Conquer Sorting Algorithms


Merge Sort

A divide-and-conquer sorting algorithm:

Time complexity:

Best case: $O(n \log n)$
Average case: $O(n \log n)$
Worst case: $O(n \log n)$

Properties:

Stable: Due to taking from left subarray if items are equal during merge
Non-adaptive: Same time complexity for all cases
Not in-place: Merge uses a temporary array of size up to n, and uses $O(\log n)$ stack space.

Sorting a linked list with merge sort uses $O(1)$ extra memory (excluding stack space) since it does not need a temporary array, but simply just rewiring of pointers.

Quick Sort


Method:

  1. Choose an item to be a pivot
  2. Rearrange (partition) the array so that
  1. Recursively sort each of the partitions

Partitioning:

Partitioning is $O(n)$, where n is the number of elements being partitioned

Time complexity:

Best case: $O(n \log n)$ => Choice of pivot gives two equal-sized partitions
Average case: $O(n \log n)$
Worst case: $O(n^2)$ => Bad pivot choice resulting in partitions of size 0 and n − 1 & resulting in n recursive levels

Properties:

Unstable: Due to long-range swaps
Non-adaptive: $O(n \log n)$ average case, sorted input does not improve this
In-place: Partitioning is done in-place, and uses $O(\log n)$ stack space on average (O(n) worst case).

Median-of-Three Partitioning:

Take three values: left-most, middle, right-most.

Pick the median of these three values as our pivot.

Ordered data is no longer a worst-case scenario.

This does not eliminate worst-case of $O(n^2)$, but make it much less likely.

Median of 3 Quicksort

In this example, 5 is chosen as the pivot.

Randomised Partitioning

Pick a random value for the pivot

This makes it nearly impossible to have $O(n^2)$ performance

void randomisedQuickSort(Item items[], int lo, int hi) {
  if (lo >= hi) return;
  swap(items, lo, randint(lo, hi));
  int pivotIndex = partition(items, lo, hi);
  randomisedQuickSort(items, lo, pivotIndex - 1);
  randomisedQuickSort(items, pivotIndex + 1, hi);
}

int randint(int lo, int hi) {
  int i = rand() % (hi - lo + 1);
  return lo + i;
}

Note: rand() is a pseudo-random number generator provided by <stdlib.h>. The generator should be initialised with srand().

Insertion Sort Improvement

For small sequences (when n < 5, say), quick sort is expensive because of the recursion overhead.

Solution: Handle small partitions with insertion sort

#define THRESHOLD 5
void quickSort(Item items[], int lo, int hi) {
  if (hi - lo < THRESHOLD) {
    insertionSort(items, lo, hi);
    return;
  }

  medianOfThree(items, lo, hi);
  int pivotIndex = partition(items, lo, hi);
  quickSort(items, lo, pivotIndex - 1);
  quickSort(items, pivotIndex + 1, hi);
}

Summary of Divide-and-Conquer Sorts


Design of modern cpus mean, for sorting arrays in ram quick sort generally outperforms merge sort

Quick sort is more ‘cache friendly’, good locality of access on arrays

On the other hand, merge sort is readily stable, readily parallel, a good choice for sorting linked lists

Divide and Conquer Summary

Non-Comparison-Based Sorting Algorithms


Radix Sort


Radix sort requires us to be able to decompose our keys into individual symbols (digits, characters, bits, etc.), for example:

The number of possible symbols is known as the radix, and is denoted by R.

If the keys have different lengths, pad them with a suitable symbol, for example:

Method:

Example: Radix sort example

Analysis:

Time complexity (best = average = worst) for radix sort is $O(m(n + R))$.

Radix sort performs better than comparison-based sorting algorithms:

Stable: All sub-sorts performed are stable
Non-adaptive: Same steps performed, regardless of sortedness
Not in-place: Uses $O(R + n)$ additional space for buckets and storing keys in buckets