Friday, February 2, 2018

Counting Sort



Suppose we are given a list of numbers each between 0 and 9 inclusive. In other words, its all single-digit non-negative integers. For example, we have the list: 3, 9, 7, 6, 6, 4, 8. The sorted output is: 3, 4, 6, 6, 7, 8, 9.
How do we sort this? We can use something like quicksort, yes, with a complexity of O(N log N). But can we do better? To put it more precisely, since the best you can get with any comparison-based sorting is O(N log N)  - see for example the explanation at https://www.geeksforgeeks.org/lower-bound-on-comparison-based-sorting-algorithms/ -, can we something non-comparison based for this relatively very simple data sequence?

Consider the sorted output again: 3, 4, 6, 6, 7, 8, 9.
Of course we know that 3 comes before 4, 4 before 6 and so on. The main problem it seems is to get the number of items that come before each of the number in the input sequence. For example, there is zero items before 3, a single item before 4, two items before 6, four before 7 and so on and so on..    If we know then cumulative number of each digit before a particular digit x, then we can compute the position of x in the output list.

To compute these cumulative numbers, we can first count the frequency of all numbers from 0 to 9, storing those values in an array as shown below:

You can see in the picture that number 3 occurs once, 9 once, '6' twice and so on. To compute the cumulative count for number 3, we need to count the total number of occurrence for the numbers from 0 to 2. In general, for a number x > 0, we need the sum of the frequency of the numbers from 0 to x-1. The easiest way to do this is via 2 loops. The first loop compute as follows:

    for i in range(9):
        count[i+1] += count[i]

The outcome is as shown below:

The second loop performs a shift, so that the cumulative count for a digit x does not include the number x itself. 

It might sound (and look) complicated, but all we have to do is the following:

    for i in range(9, 0, -1):
        count[i] = count[i-1]
    count[0] = 0

Once we have the array of accumulated counts - a count array -, the numbers you have in there will be the indices for the numbers in the input list. For each number, x, in the input list, look up the the value at index x in count. The value you get is the index for x in the output array. 

    for i in range(len(arr)):
        index = arr[i]
        sorted[ count[index] ] = arr[i]
        count[index] += 1

Note a certain subtlety: the code above increments the value in count, - count[index]++ - after reading it. This will ensure that if you get the same x value later in the input list, it will be in a different location from the earlier x.


The overall code looks as follows:

def countingSort(arr):

   # initialize
    n = len(arr)
    sorted = [0] * n
    count = [0] * 10

   # count frequency for each number
    for i in range(n):
        index = arr[i]
        count[index] += 1

   # accumulate the count
    for i in range(9):
        count[i+1] += count[i]
    
    # do the shift   
    for i in range(9, 0, -1):
        count[i] = count[i-1]
    count[0] = 0   
   
   # compute the final sorted list 
    for i in range(len(arr)):
        index = arr[i]
        sorted[ count[index] ] = arr[i]
        count[index] += 1

    return sorted




No comments:

Post a Comment