Generally when we think about sorting elements in a list, we’re thinking about comparison sorting algorithms. Very well known algorithms of this kind are, for example, Quicksort or Merge Sort. So what do we mean when we say O(nlog)?
This type of notation (called big O notation) describes us the limit an algorithm can have, in terms of it’s input. We're normally interested either on their average or worst case.
For comparison sorting algorithms, there is a fundamental limit for it’s worst case scenario: O(nlog). Merge Sort provides us with this limit. You can check this example for sorting a list of integers. So the question remains: can we do better? It may seem that the question doesn't make any sense, provided that I just told that there is there’s an actual limit. But actually... We can!
First of all, we’re talking about comparison sorting algorithms. We have other types of sorting algorithms (for example, integer sorting algorithms). So how can we? The simple “lesson” of this post is: it depends on the problem.
These type of algorithms give us the tools for generic sorting, that can be applied to any kind of data, as long as we can compare two elements. That means that we can have some kind of ordering between them: we can say that one is bigger (or smaller, or equal) than the other. What we have to focus on is the kind of problem that we have at hand. It's characteristics might enable us to come up with something better than O(nlogn). An example is if we have almost sorted data.
Let’s demonstrate this with a well known practical example: the Dutch national flag problem, proposed by Edgser Dijkstra. The problem reads like this:
The flag of Netherlands consists of three colours: red, white and blue. Given balls of these three colours arranged randomly in a line (the actual number of balls does not matter), the task is to arrange them such that all balls of the same colour are together and their collective colour groups are in the correct order.
Let’s order them Red, White, Blue. If we use an algorithm like Merge Sort, we can guarantee that, in the worst case it will be O(nlogn). But, as you we’re expecting, we can do better. We actually do it in O(n). Yes, that’s right, linear time. Let’s see how we can do that, using Python.
# Compare function for the problemdef comp_function(a, b):if a == 'R':return -1elif a == 'B':return 1# Swap two elements of an arraydef swap(array, i, j):temp = array[i]array[i] = array[j]array[j] = tempdef dutch_national_flag(array, comp_function):start = 0end = len(array) - 1i = 0while i <= end:if comp_function(array[i], 'R') == -1:swap(array, i, start)start += 1i += 1elif comp_function(array[i], 'B') == 1:swap(array, i, end)end -=1else:i += 1
As we can see, we will go through the list of elements once. There are 2 “tricks” that we’re using here, to achieve O(n):
- we don’t care about the White elements (poor fellas). We only care about putting Red elements in the beginning and and Blue ones at the end;
- to achieve the previous point, we have an algorithm-specific compare function and we have to keep track of the indexes where we want to swap to (both at the start and end).
With this simple example, we can see that in real world problems, it’s specificity might lead us to find a more efficient solution. And so, we should be aware of this fact when considering sorting for a very specific situation.