# Principle, Python implementation, and efficiency test of the top ten sorting algorithms

### Algorithm classification

Note: most of the principle descriptions are adapted from the Runoob ("rookie") tutorial.
The ten common sorting algorithms can be divided into two categories:

• Comparison sorts: the order between elements is determined by comparisons. Because their time complexity cannot beat the O(n log n) lower bound, they are also called nonlinear-time comparison sorts.
• Non-comparison sorts: the order between elements is not determined by comparisons. They can break through the lower bound of comparison-based sorting and run in linear time, so they are also called linear-time non-comparison sorts.

### Algorithm details

#### Bubble sort

Bubble sort compares adjacent elements and swaps them when they are out of order. After one full pass, the largest (or smallest) value is guaranteed to be at the end. It is simple and intuitive, and suitable for small data sets.

##### Algorithm steps
1. Traverse the unsorted part of the array from front to back;
2. Compare adjacent elements and swap them if the first is larger than the second;
3. After the pass, the last element of the unsorted part is the largest;
4. Repeat steps 1-3 until sorting is complete.

##### Code implementation
```python
def bubble_sort(arr):
    for i in range(len(arr)):
        for j in range(0, len(arr) - 1 - i):
            if arr[j] > arr[j + 1]:  # the first is larger than the second
                arr[j], arr[j + 1] = arr[j + 1], arr[j]  # swap
    return arr
```
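A common refinement, not shown in the code above, adds an early-exit flag so that a pass with no swaps ends the sort immediately; this helps on nearly sorted input. A sketch (the function name is my own):

```python
def bubble_sort_flagged(arr):
    for i in range(len(arr)):
        swapped = False
        for j in range(len(arr) - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:  # a full pass with no swap: already sorted
            break
    return arr
```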

#### Selection sort

Selection sort repeatedly scans the unsorted part to select the minimum value for the current position. It is simple and intuitive, but its O(n²) time complexity makes it inefficient, so it only applies to small data sets.

##### Algorithm steps
1. Traverse from front to back; for each position, scan the remaining elements;
2. Find the minimum value and place it at the current position.

##### Code implementation
```python
def select_sort(arr):
    for i in range(len(arr)):
        min_index = i  # index of the minimum value
        for j in range(i + 1, len(arr)):
            if arr[j] < arr[min_index]:
                min_index = j
        if min_index != i:
            arr[i], arr[min_index] = arr[min_index], arr[i]  # place the minimum at the current position
    return arr
```

#### Insertion sort

Insertion sort builds an ordered sequence incrementally: for each unsorted element, it scans the sorted sequence from back to front, finds the corresponding position, and inserts it.

##### Algorithm steps
• Starting from the first element, the element can be considered to have been sorted;
• Take out the next element and scan from back to front in the sorted element sequence;
• If the element (sorted) is larger than the new element, move the element to the next position;
• Repeat step 3 until the sorted element is found to be less than or equal to the position of the new element;
• After inserting the new element into this position;
• Repeat steps 2 to 5

##### Code implementation
```python
def insert_sort(arr):
    for i in range(1, len(arr)):
        for j in range(i, 0, -1):  # scan backwards from the current position
            if arr[j] < arr[j - 1]:  # smaller than the predecessor: swap
                arr[j], arr[j - 1] = arr[j - 1], arr[j]
            else:  # the predecessor is not larger: the element is in place
                break
    return arr
```
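The backward scan for the insertion point can also use the standard-library `bisect` module. This sketch (my own variant, not the in-place algorithm above) builds a new sorted list; the binary search finds the position in O(log n), though the shift inside `insort` is still O(n):

```python
import bisect

def insert_sort_bisect(arr):
    result = []
    for num in arr:
        bisect.insort(result, num)  # insert num keeping result sorted
    return result
```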

#### Shell Sort

Shell invented the first sorting algorithm that breaks through O(n2) in 1959. It is an improved version of simple insertion sorting. The difference between it and insertion sorting is that it will preferentially compare elements far away. Hill sorting is also called reduced incremental sorting

##### Algorithm steps

First, split the whole sequence to be sorted into several subsequences and direct-insertion-sort each of them. The concrete algorithm is:

1. Choose an increment sequence t1, t2, ..., tk, where ti > tj for i < j, and tk = 1;

2. Sort the sequence k times, once for each increment;

3. In each pass, split the sequence into subsequences according to the current increment ti and direct-insertion-sort each subsequence. Only when the increment is 1 is the whole sequence treated as a single table, whose length is the length of the whole sequence.

##### Code implementation
```python
def shell_sort(arr):
    gap = len(arr) // 2
    while gap > 0:
        for g in range(gap):  # each start offset g defines one subsequence
            for i in range(g + gap, len(arr), gap):
                for j in range(i, g, -gap):  # gapped insertion sort
                    if arr[j] < arr[j - gap]:
                        arr[j], arr[j - gap] = arr[j - gap], arr[j]
                    else:
                        break
        gap = gap // 2
    return arr
```
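The same gap sequence can be written more compactly in the common textbook form, where a single pass per gap performs the gapped insertion over all offsets at once (a sketch with my own function name, equivalent in behavior to the code above):

```python
def shell_sort_simple(arr):
    gap = len(arr) // 2
    while gap > 0:
        for i in range(gap, len(arr)):
            j = i
            # gapped insertion: shift larger elements gap steps to the right
            while j >= gap and arr[j] < arr[j - gap]:
                arr[j], arr[j - gap] = arr[j - gap], arr[j]
                j -= gap
        gap //= 2
    return arr
```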

#### Merge sort

Merge sort is an effective sort algorithm based on merge operation. The algorithm adopts Divide and Conquer A very typical application of. Merge the ordered subsequences to obtain a completely ordered sequence; that is, first order each subsequence, and then order the subsequence segments. If two ordered tables are merged into one ordered table, it is called 2-way merging.

##### Algorithm steps
1. Divide the input sequence of length n into two subsequences of length n/2;

2. Sort the two subsequences recursively with merge sort;

3. Merge the two sorted subsequences into the final sorted sequence.

##### Code implementation
```python
def merge_sort(arr):
    if len(arr) < 2:
        return arr
    middle = len(arr) // 2
    left = arr[0:middle]
    right = arr[middle:]
    return merge(merge_sort(left), merge_sort(right))

def merge(left, right):
    result = []
    while left and right:
        if left[0] <= right[0]:
            result.append(left.pop(0))
        else:
            result.append(right.pop(0))

    while left:
        result.append(left.pop(0))

    while right:
        result.append(right.pop(0))
    return result
```
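Note that `left.pop(0)` removes from the front of a Python list and is O(n), which adds overhead to every merge. A variant that walks both halves with indices avoids this (the function name is my own):

```python
def merge_sort_indexed(arr):
    if len(arr) < 2:
        return arr
    middle = len(arr) // 2
    left = merge_sort_indexed(arr[:middle])
    right = merge_sort_indexed(arr[middle:])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps the sort stable
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # one side is exhausted; append the rest
    result.extend(right[j:])
    return result
```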

#### Quick sort

Quick sort is a divide-and-conquer improvement of bubble sort.

##### Algorithm steps
1. Pick a pivot from the sequence;
2. Partition: place elements smaller than the pivot before it and larger ones after it;
3. Recursively partition the left and right subsequences.

##### Code implementation
```python
def quick_sort(arr, left=None, right=None):
    left = 0 if left is None else left
    right = len(arr) - 1 if right is None else right
    if left < right:
        partition_index = partition(arr, left, right)
        quick_sort(arr, left, partition_index - 1)
        quick_sort(arr, partition_index + 1, right)
    return arr

def partition(arr, left, right):
    pivot = left  # use the leftmost element as the pivot
    index = pivot + 1
    for i in range(index, right + 1):
        if arr[i] < arr[pivot]:
            arr[i], arr[index] = arr[index], arr[i]
            index += 1
    arr[pivot], arr[index - 1] = arr[index - 1], arr[pivot]  # move the pivot into place
    return index - 1
```
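For intuition, the partition idea can also be sketched functionally. This version (my own sketch, not the in-place code above) allocates new lists on every call and therefore uses extra memory:

```python
def quick_sort_functional(arr):
    if len(arr) < 2:
        return arr
    pivot, rest = arr[0], arr[1:]
    smaller = [x for x in rest if x < pivot]   # partition around the pivot
    larger = [x for x in rest if x >= pivot]
    return quick_sort_functional(smaller) + [pivot] + quick_sort_functional(larger)
```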

#### Heap sort

Heap sort sorts in place using a binary max heap built over the array.

##### Algorithm steps
1. Build a heap over the whole array;
2. Heapify from the bottom up so that the top of the heap holds the maximum value;
3. Swap the top of the heap with the last element of the heap and reduce the heap size by 1;
4. Repeat steps 2 and 3 until the heap size is 1.

##### Code implementation
```python
import math

def build_max_heap(arr):
    for i in range(math.floor(len(arr) / 2), -1, -1):
        heapify(arr, i)

def heapify(arr, i):
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    if left < arrLen and arr[left] > arr[largest]:
        largest = left
    if right < arrLen and arr[right] > arr[largest]:
        largest = right

    if largest != i:
        swap(arr, i, largest)
        heapify(arr, largest)

def swap(arr, i, j):
    arr[i], arr[j] = arr[j], arr[i]

def heap_sort(arr):
    global arrLen
    arrLen = len(arr)
    build_max_heap(arr)
    for i in range(len(arr) - 1, 0, -1):
        swap(arr, 0, i)
        arrLen -= 1
        heapify(arr, 0)
    return arr
```
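Python's standard library already ships a binary min-heap in the `heapq` module; a heap sort built on it is a few lines (a sketch, `heapq_sort` is my own name; ascending order falls out directly from repeated pops):

```python
import heapq

def heapq_sort(arr):
    heap = list(arr)     # copy so the input is left untouched
    heapq.heapify(heap)  # O(n) min-heap construction
    return [heapq.heappop(heap) for _ in range(len(heap))]
```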

#### Counting sort

Counting sort applies when the array elements fall in a small range and are not negative.

##### Algorithm steps
1. Find the maximum value k of the array and create a new count array of length k + 1;
2. Count the occurrences of each value i in the array to be sorted and store the count at index i of the new array;
3. Fill the target array back from the counts.

##### code implementation
``` def count_sort(arr, max_value):
bucket = [0] * (max_value + 1)
for num in arr:
bucket[num]+=1
index = 0
for i in range(len(bucket)):
if bucket[i]:
for j in range(bucket[i]):
arr[index] = i
index += 1
return arr
```
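The stable form of counting sort uses prefix sums and fills the output in reverse; the code above does not need it because plain integers carry no extra data, but the stable variant matters when records are sorted by an integer key. A sketch (my own function name):

```python
def count_sort_stable(arr, max_value):
    counts = [0] * (max_value + 1)
    for num in arr:
        counts[num] += 1
    for i in range(1, len(counts)):
        counts[i] += counts[i - 1]   # prefix sums: end slot of each value
    result = [0] * len(arr)
    for num in reversed(arr):        # reverse scan keeps equal keys stable
        counts[num] -= 1
        result[counts[num]] = num
    return result
```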

#### Bucket sort

Bucket sort is an upgraded version of counting sort. Its efficiency depends on the function that distributes elements into buckets and on the sorting method used inside each bucket.

##### Algorithm steps
1. Set a quantitative array as an empty bucket;

2. Traverse the input data and put the data into the corresponding bucket one by one;

3. Sort each bucket that is not empty;

4. Splice the ordered data from a bucket that is not empty.


##### code implementation
``` def bucket_sort(arr, min_value, max_value):
BUCKET_SIZE = 3
bucket_size = (max_value-min_value) // BUCKET_SIZE + 1
buckets = [[] for i in range(bucket_size)]
for i in range(len(arr)):
buckets[(arr[i]-min_value)//BUCKET_SIZE].append(arr[i]) # uses the mapping function to allocate
index = 0
for bucket in buckets:  # Sort each bucket
quick_sort(bucket)  # Quick sort is used here
for num in bucket:
arr[index] = num
index += 1
return arr
```

#### Radix sort

Radix sort sorts by the lowest digit first and collects the results, then sorts by the next digit and collects again, and so on up to the highest digit.

##### Algorithm steps
1. Get the maximum number in the array to determine the number of digits;

2. Starting from the lowest digit, take each digit of the elements of arr as the radix key;

3. Count-sort by the radix key (exploiting the fact that counting sort suits a small range of values, since a digit has only 10 possibilities).

##### code implementation
```def radix_sort(arr, max_digit):
for digit in range(max_digit):
buckets = [[] for i in range(10)]
for i in range(len(arr)):
buckets[int(arr[i] / (10 ** digit)) % 10].append(arr[i])
index = 0
for bucket in buckets:
for num in bucket:
arr[index] = num
index += 1
return arr
```
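`radix_sort` expects the digit count of the largest element to be passed in by hand. A tiny helper (hypothetical, not part of the original code) can derive it for non-negative integers:

```python
def get_max_digit(arr):
    # number of decimal digits of the largest element,
    # usable as the max_digit argument of radix_sort
    return len(str(max(arr)))
```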

### Efficiency test

#### Test code

```python
import copy
import random
import sys
import time

from sort.bubble_sort import bubble_sort
from sort.bucket_sort import bucket_sort
from sort.count_sort import count_sort
from sort.heap_sort import heap_sort
from sort.insert_sort import insert_sort
from sort.merge_sort import merge_sort
from sort.quick_sort import quick_sort
from sort.select_sort import select_sort
from sort.shell_sort import shell_sort

sys.setrecursionlimit(100000000)

def time_count(func):
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # time.clock() was removed in Python 3.8
        func(*args, **kwargs)
        end = time.perf_counter()
        print(f'Time consumed: {end - start} seconds')

    return wrapper

class Executor(object):
    def __init__(self, func, func_name, *args, **kwargs):
        self.func = func
        self.func_name = func_name
        self.args = args
        self.kwargs = kwargs
        self.start()

    @time_count
    def start(self):
        print(self.func_name + ' starts executing')
        self.func(*self.args, **self.kwargs)

class TestCase:
    digit = 6

    def __init__(self):
        self.list = [random.randint(0, 10**self.digit - 1) for i in range(10**self.digit)]
        print(f'Testing sorting of {10**self.digit} items')

    def test_bubble_sort(self):
        Executor(bubble_sort, 'Bubble sort', copy.deepcopy(self.list))

    def test_select_sort(self):
        Executor(select_sort, 'Selection sort', copy.deepcopy(self.list))

    def test_insert_sort(self):
        Executor(insert_sort, 'Insertion sort', copy.deepcopy(self.list))

    def test_shell_sort(self):
        Executor(shell_sort, 'Shell sort', copy.deepcopy(self.list))

    def test_merge_sort(self):
        Executor(merge_sort, 'Merge sort', copy.deepcopy(self.list))

    def test_quick_sort(self):
        Executor(quick_sort, 'Quick sort', copy.deepcopy(self.list))

    def test_heap_sort(self):
        Executor(heap_sort, 'Heap sort', copy.deepcopy(self.list))

    def test_count_sort(self):
        Executor(count_sort, 'Counting sort', copy.deepcopy(self.list), 10**self.digit)

    def test_bucket_sort(self):
        Executor(bucket_sort, 'Bucket sort', copy.deepcopy(self.list), 0, 10**self.digit)

    def main(self):
        # self.test_bubble_sort()
        # self.test_select_sort()
        # self.test_insert_sort()
        self.test_shell_sort()
        self.test_merge_sort()
        self.test_quick_sort()
        self.test_heap_sort()
        self.test_count_sort()
        self.test_bucket_sort()

if __name__ == '__main__':
    TestCase().main()
```

#### Results summary

The results measured on your own machine may deviate. The figures below are the average of 5 runs; "omitted" marks tests that were skipped.

| Algorithm | 1000 (s) | 10000 (s) | 100000 (s) | 1000000 (s) |
| --- | --- | --- | --- | --- |
| Bubble sort | 0.07 | 8.18 | omitted | omitted |
| Selection sort | 0.06 | 4.14 | omitted | omitted |
| Insertion sort | 0.03 | 1.62 | omitted | omitted |
| Shell sort | 0.01 | 0.03 | 0.53 | 8.30 |
| Merge sort | 0.01 | 0.06 | 1.68 | 15.12 |
| Quick sort | 0.05 | 0.07 | 0.85 | 9.61 |
| Heap sort | 0.01 | 0.11 | 1.39 | 17.18 |
| Counting sort | 0.01 | 0.01 | 0.05 | 0.46 |
| Bucket sort | 0.01 | 0.02 | 0.15 | 1.79 |
| Radix sort | 0.01 | 0.05 | 0.55 | 6.39 |

#### Summary

When the amount of data is large and space complexity is not a concern, I would recommend radix sort, which takes far less space than counting sort and is still fast; if space complexity matters, I would recommend Shell sort and quick sort.


Posted on Mon, 25 Oct 2021 20:37:10 -0400 by KI114