Sorting has always been a popular subject in Computer Science. Back in 1945 Mr. John von Neumann came up with Merge Sort. It’s an efficient divide and conquer algorithm, and we’ll dive right into it.
The general flow of the algorithm comes as follows: (1) divide the list into n lists of size 1 (being n the size of the original list), (2) recursively merge them back together to produce one sorted list.
As usual, I always get things better with an example. Let’s transform this (kind of) abstract explanation into a concrete example. We’ll use a simple, yet descriptive, example for sorting lists of integers. Let’s start with (1).
def merge_sort(list):
if len(list) <= 1:
return list
return merge(merge_sort(list[:len(list)/2]), merge_sort(list[len(list)/2:]))
Here we have part 1. Basically what we’re doing here is to check at each step if our list is already of size 1 and ff it is we return it. If not, we split it in half, call merge_sort on each of them and call a function merge with both halves. How many times will this merge function be called? log n because we’re splitting the list in half at each time.
Next, phase number (2). We need to merge everything back together.
def merge(l1, l2):
result = []
i, j = 0, 0
while i < len(l1) and j < len(l2):
if l1[i] > l2[j]:
result.append(l2[j])
j += 1
else:
result.append(l1[i])
i += 1
result += l1[i:]
result += l2[j:]
return result
So, what’s going on here? We know that we start with lists of size 1. That means that at each step, each of the 2 lists will be sorted on it’s own. We just need to stitch them together. That means that we go through the lists (until we reach the end of at least one) and we get the smallest element at each step. When one of them ends, we just need to add the remaining elements of the other to the result.
We already know that this merge will be called log n times. But at each call merge does n comparisons because it needs to figure out where the all the elements fit together. So Merge Sort is a O(n log n) comparison sorting algorithm.