Principles, Python implementations, and efficiency tests of the ten classic sorting algorithms

Algorithm classification

Note: most of the principle descriptions are adapted from the Runoob tutorial.
The ten common sorting algorithms can be divided into two categories:

  • Comparison sorts: the order between elements is determined by pairwise comparisons. Because their time complexity cannot be better than O(n log n), they are also called nonlinear-time comparison sorts.
  • Non-comparison sorts: the order between elements is not determined by comparisons. They can break through the lower bound of comparison-based sorting and run in linear time, so they are also called linear-time non-comparison sorts.
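
As an aside on the comparison-sort category: any ordering expressible as a pairwise comparison can drive a comparison sort. A minimal sketch (the comparator name is mine) using the standard `functools.cmp_to_key` helper with Python's built-in sort:

```python
from functools import cmp_to_key

def compare(a, b):
    """Classic three-way comparator: negative, zero, or positive."""
    return (a > b) - (a < b)

data = [5, 2, 9, 1]
print(sorted(data, key=cmp_to_key(compare)))  # [1, 2, 5, 9]
```

Every algorithm in the first category below (bubble, selection, insertion, Shell, merge, quick, heap) only ever asks this kind of question about pairs of elements.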

Algorithm complexity

Algorithm details

Bubble sort

Compare adjacent elements and swap them when they are out of order. After one pass, the largest (or smallest) value is guaranteed to be at the end. It is simple and intuitive and suited to small data sets.

Algorithm steps
  1. Traverse the array from front to back
  2. Compare each pair of adjacent elements and swap them if the first is larger than the second
  3. After one pass, the last element of the unsorted part is the largest
  4. Repeat steps 1-3, shrinking the range by one each time, until sorting is complete

code implementation
def bubble_sort(arr):
    for i in range(len(arr)):
        for j in range(0, len(arr)-1-i):
            if arr[j] > arr[j+1]:  # The first is bigger than the second
                arr[j], arr[j+1] = arr[j+1], arr[j]  # exchange
    return arr
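
A common refinement (a sketch of my own, not part of the original): track whether a pass performed any swap and stop as soon as a pass makes none, which brings the best case on already-sorted input down to O(n):

```python
def bubble_sort_early_exit(arr):
    for i in range(len(arr)):
        swapped = False
        for j in range(0, len(arr) - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:  # no swaps in this pass: the array is already sorted
            break
    return arr
```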

Selection sort

It scans the unsorted remainder of the array to select the minimum for the current position. It is simple and intuitive, but its time complexity is O(n²), so it is inefficient and only suitable for small data sets.

Algorithm steps
  1. Traverse from front to back; for each position, scan the remaining unsorted elements
  2. Find the minimum value and swap it into the current position

code implementation
def select_sort(arr):
    for i in range(len(arr)):
        min_index = i  # Subscript of minimum value
        for j in range(i + 1, len(arr)):
            if arr[j] < arr[min_index]:
                min_index = j
        if min_index != i:
            arr[i], arr[min_index] = arr[min_index], arr[i]  # Place the minimum value in the current position
    return arr
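
The "find the minimum of the remainder" step can also be written with the built-in `min` over an index range; a compact sketch (the function name is mine):

```python
def select_sort_min(arr):
    for i in range(len(arr)):
        # index of the smallest element in arr[i:]
        min_index = min(range(i, len(arr)), key=arr.__getitem__)
        arr[i], arr[min_index] = arr[min_index], arr[i]
    return arr
```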

Insertion sort

It works by building an ordered sequence: for each unsorted element, scan the sorted part from back to front, find the proper position, and insert it.

Algorithm steps
  1. Start from the first element, which can be considered already sorted;
  2. Take the next element and scan the sorted sequence from back to front;
  3. If the scanned element is larger than the new element, move it one position back;
  4. Repeat step 3 until a sorted element less than or equal to the new element is found;
  5. Insert the new element after that position;
  6. Repeat steps 2 to 5.

code implementation
def insert_sort(arr):
    for i in range(1, len(arr)):
        for j in range(i, 0, -1):  # scan backward from the current position
            if arr[j] < arr[j - 1]:  # smaller than its predecessor: swap
                arr[j], arr[j - 1] = arr[j - 1], arr[j]
            else:  # predecessor is not larger: position found, stop scanning
                break
    return arr
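
The "scan backward and insert" idea maps directly onto the standard library's `bisect` module, which binary-searches the insertion point instead of scanning linearly. A sketch (my own variant) that builds a new sorted list:

```python
import bisect

def insert_sort_bisect(arr):
    result = []
    for x in arr:
        bisect.insort(result, x)  # binary-search the position, then insert
    return result
```

The comparisons drop to O(log n) per element, though the list insertion itself is still O(n).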

Shell Sort

Shell invented the first sorting algorithm to break through O(n²) in 1959. It is an improved version of simple insertion sort; the difference is that it compares distant elements first. Shell sort is also called diminishing-increment sort.

Algorithm steps

First, split the whole sequence to be sorted into several subsequences and insertion-sort each of them. The algorithm is described as follows:

  1. Choose an increment sequence t1, t2, ..., tk, where ti > tj for i < j, and tk = 1;

  2. Sort the sequence k times, once per increment;

  3. In each pass, split the sequence into subsequences according to the current increment ti and insertion-sort each subsequence. Only when the increment is 1 is the whole sequence treated as one table, whose length is the length of the whole sequence.


code implementation
def shell_sort(arr):
    gap = len(arr) // 2

    while gap > 0:
        for g in range(gap):
            for i in range(g + gap, len(arr), gap):
                for j in range(i, g, -gap):
                    if arr[j] < arr[j - gap]:
                        arr[j], arr[j - gap] = arr[j - gap], arr[j]
        gap = gap // 2
    return arr
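
The halving gap sequence above is the simplest choice; Knuth's sequence 1, 4, 13, 40, ... often performs better in practice. A sketch (my own variant) where the gap schedule is the only change:

```python
def shell_sort_knuth(arr):
    gap = 1
    while gap < len(arr) // 3:
        gap = gap * 3 + 1  # 1, 4, 13, 40, ...
    while gap > 0:
        for i in range(gap, len(arr)):
            j = i
            # gapped insertion sort: sift arr[i] back within its stride group
            while j >= gap and arr[j] < arr[j - gap]:
                arr[j], arr[j - gap] = arr[j - gap], arr[j]
                j -= gap
        gap //= 3
    return arr
```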

Merge sort

Merge sort is an efficient sorting algorithm based on the merge operation and a very typical application of divide and conquer. Order each subsequence first, then merge the ordered subsequences into a completely ordered sequence. Merging two ordered lists into one ordered list is called a 2-way merge.

Algorithm steps
  1. The input sequence with length n is divided into two subsequences with length n/2;

  2. The two subsequences are sorted by merging;

  3. Merge two sorted subsequences into a final sorting sequence.


code implementation
def merge_sort(arr):
    if len(arr) < 2:
        return arr
    middle = len(arr) // 2
    left = arr[0:middle]
    right = arr[middle:]
    return merge(merge_sort(left), merge_sort(right))
def merge(left, right):
    result = []
    while left and right:  # take the smaller head element each time
        if left[0] <= right[0]:
            result.append(left.pop(0))
        else:
            result.append(right.pop(0))
    while left:  # drain whichever side still has elements
        result.append(left.pop(0))
    while right:
        result.append(right.pop(0))
    return result
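
The merge step also exists in the standard library: `heapq.merge` lazily merges already-sorted iterables. A sketch (my own variant) of the same recursion built on it:

```python
import heapq

def merge_sort_heapq(arr):
    if len(arr) < 2:
        return arr
    middle = len(arr) // 2
    left = merge_sort_heapq(arr[:middle])
    right = merge_sort_heapq(arr[middle:])
    return list(heapq.merge(left, right))  # merge two sorted halves
```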

Quick sort

It is essentially a divide-and-conquer improvement on bubble sort.

Algorithm steps
  1. Pick an element from the sequence as the pivot
  2. Move smaller elements before the pivot and larger ones after it; this is called partitioning
  3. Recursively partition the left and right subsequences

code implementation
def quick_sort(arr, left=None, right=None):
    left = 0 if left is None else left
    right = len(arr) - 1 if right is None else right
    if left < right:
        partition_index = partition(arr, left, right)
        quick_sort(arr, left, partition_index - 1)
        quick_sort(arr, partition_index + 1, right)
    return arr
def partition(arr, left, right):
    pivot = left  # use the first element as the pivot
    index = pivot + 1
    for i in range(index, right + 1):
        if arr[i] < arr[pivot]:
            arr[i], arr[index] = arr[index], arr[i]
            index += 1
    arr[pivot], arr[index - 1] = arr[index - 1], arr[pivot]  # put the pivot in place
    return index - 1
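
The three steps read almost literally as list comprehensions; a sketch (my own variant) that trades the in-place partition for clarity, at the cost of extra memory per recursion level:

```python
def quick_sort_simple(arr):
    if len(arr) < 2:
        return arr
    pivot = arr[0]                                 # step 1: pick a pivot
    smaller = [x for x in arr[1:] if x < pivot]    # step 2: partition
    larger = [x for x in arr[1:] if x >= pivot]
    # step 3: recurse on both sides
    return quick_sort_simple(smaller) + [pivot] + quick_sort_simple(larger)
```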

Heap sort

Algorithm steps
  1. Build a max heap over the whole array by sifting down from the last non-leaf node, so that the heap root holds the maximum
  2. Swap the data at the top of the heap with the data at the end of the heap, and shrink the heap size by 1
  3. Sift the new root down to restore the max-heap property
  4. Repeat steps 2 and 3 until the heap size is 1

code implementation
def build_max_heap(arr):
    for i in range(len(arr) // 2, -1, -1):  # sift down from the last non-leaf node to the root
        heapify(arr, i)

def heapify(arr, i):
    left = 2*i+1
    right = 2*i+2
    largest = i
    if left < arrLen and arr[left] > arr[largest]:
        largest = left
    if right < arrLen and arr[right] > arr[largest]:
        largest = right

    if largest != i:
        swap(arr, i, largest)
        heapify(arr, largest)

def swap(arr, i, j):
    arr[i], arr[j] = arr[j], arr[i]

def heap_sort(arr):
    global arrLen  # heap size shared with heapify
    arrLen = len(arr)
    build_max_heap(arr)
    for i in range(len(arr) - 1, 0, -1):
        swap(arr, 0, i)  # move the current maximum to the end
        arrLen -= 1
        heapify(arr, 0)  # restore the max-heap property
    return arr
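
For comparison, the standard library's `heapq` module implements a binary min-heap; a sketch (my own variant) of heap sort built on top of it:

```python
import heapq

def heap_sort_heapq(arr):
    heap = list(arr)
    heapq.heapify(heap)  # O(n) build of a min-heap
    # popping the minimum n times yields the elements in ascending order
    return [heapq.heappop(heap) for _ in range(len(heap))]
```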

Counting sort

It applies when the array elements fall within a small range of non-negative integers.

Algorithm steps
  1. Find the maximum value k of the array and create a new array of length k+1
  2. Count the occurrences of each value i in the array to be sorted and store the count at index i of the new array
  3. Walk the count array and write each value back the counted number of times (the stable variant fills the target array in reverse)

code implementation
def count_sort(arr, max_value):
    bucket = [0] * (max_value + 1)
    for num in arr:
        bucket[num] += 1  # count occurrences of each value
    index = 0
    for i in range(len(bucket)):
        if bucket[i]:
            for j in range(bucket[i]):
                arr[index] = i
                index += 1
    return arr
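
The "fill in reverse" mentioned in step 3 refers to the stable variant, which keeps equal keys in their original relative order (important when counting sort is used inside radix sort). A sketch (my own variant) using prefix sums:

```python
def count_sort_stable(arr, max_value):
    count = [0] * (max_value + 1)
    for num in arr:
        count[num] += 1
    # prefix sums: count[i] becomes the number of elements <= i
    for i in range(1, len(count)):
        count[i] += count[i - 1]
    output = [0] * len(arr)
    for num in reversed(arr):  # reverse traversal preserves stability
        count[num] -= 1
        output[count[num]] = num
    return output
```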

Bucket sort

Bucket sort is an upgraded version of counting sort. Its efficiency depends on the function that distributes elements into buckets and on the sorting method used within each bucket.

Algorithm steps
  1. Set up a fixed number of empty buckets;

  2. Traverse the input data and put each element into its corresponding bucket;

  3. Sort each non-empty bucket;

  4. Concatenate the sorted data from the non-empty buckets.

(Animated demonstrations omitted: elements are first distributed into buckets, then each bucket is sorted.)

code implementation
from sort.quick_sort import quick_sort

BUCKET_SIZE = 5  # value range covered by each bucket; an assumed default, tune as needed

def bucket_sort(arr, min_value, max_value):
    bucket_count = (max_value - min_value) // BUCKET_SIZE + 1
    buckets = [[] for i in range(bucket_count)]
    for i in range(len(arr)):
        buckets[(arr[i] - min_value) // BUCKET_SIZE].append(arr[i])  # the mapping function allocates elements to buckets
    index = 0
    for bucket in buckets:  # Sort each bucket
        quick_sort(bucket)  # Quick sort is used here
        for num in bucket:
            arr[index] = num
            index += 1
    return arr
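
A self-contained sketch (my own variant) that derives the range from the data and sorts each bucket with the built-in `sorted` (any in-bucket sort works); the bucket width is an assumed tunable:

```python
def bucket_sort_simple(arr, bucket_width=5):
    if not arr:
        return arr
    low = min(arr)
    buckets = [[] for _ in range((max(arr) - low) // bucket_width + 1)]
    for num in arr:
        buckets[(num - low) // bucket_width].append(num)  # distribute
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # in-bucket sort, then concatenate
    return result
```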

Radix sort

Radix sort sorts by the lowest digit first and collects the results, then sorts by the next digit and collects again, and so on up to the highest digit.

Algorithm steps
  1. Get the maximum number in the array to determine the number of digits;

  2. Starting from the lowest digit, take each digit of the elements of arr to form a radix;

  3. Count-sort by that radix (exploiting the fact that counting sort suits small ranges);


code implementation
def radix_sort(arr, max_digit):
    for digit in range(max_digit):
        buckets = [[] for i in range(10)]
        for i in range(len(arr)):
            buckets[(arr[i] // (10 ** digit)) % 10].append(arr[i])  # bucket by the current digit
        index = 0
        for bucket in buckets:
            for num in bucket:
                arr[index] = num
                index += 1
    return arr 

Efficiency test

Test code

import copy
import time
import sys
import random

from sort.bubble_sort import bubble_sort
from sort.bucket_sort import bucket_sort
from sort.count_sort import count_sort
from sort.heap_sort import heap_sort
from sort.insert_sort import insert_sort
from sort.merge_sort import merge_sort
from sort.quick_sort import quick_sort
from sort.radix_sort import radix_sort
from sort.select_sort import select_sort
from sort.shell_sort import shell_sort


def time_count(func):
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # time.clock() was removed in Python 3.8
        func(*args, **kwargs)
        end = time.perf_counter()
        print(f'Time consumed: {end - start} seconds')

    return wrapper

class Executor(object):
    def __init__(self, func, func_name, *args, **kwargs):
        self.func = func
        self.func_name = func_name
        self.args = args
        self.kwargs = kwargs

    @time_count
    def start(self):
        print(self.func_name + ' starts')
        self.func(*self.args, **self.kwargs)

class TestCase:
    digit = 6

    def __init__(self):
        self.list = [random.randint(0, 10**self.digit-1) for i in range(10**self.digit)]
        print(f'Testing sorting of {10 ** self.digit} items')

    def test_bubble_sort(self):
        Executor(bubble_sort, 'Bubble sort', copy.deepcopy(self.list)).start()

    def test_select_sort(self):
        Executor(select_sort, 'Selection sort', copy.deepcopy(self.list)).start()

    def test_insert_sort(self):
        Executor(insert_sort, 'Insertion sort', copy.deepcopy(self.list)).start()

    def test_shell_sort(self):
        Executor(shell_sort, 'Shell sort', copy.deepcopy(self.list)).start()

    def test_merge_sort(self):
        Executor(merge_sort, 'Merge sort', copy.deepcopy(self.list)).start()

    def test_quick_sort(self):
        Executor(quick_sort, 'Quick sort', copy.deepcopy(self.list)).start()

    def test_heap_sort(self):
        Executor(heap_sort, 'Heap sort', copy.deepcopy(self.list)).start()

    def test_count_sort(self):
        Executor(count_sort, 'Counting sort', copy.deepcopy(self.list), 10**self.digit).start()

    def test_bucket_sort(self):
        Executor(bucket_sort, 'Bucket sort', copy.deepcopy(self.list), 0, 10**self.digit).start()

    def test_radix_sort(self):
        Executor(radix_sort, 'Radix sort', copy.deepcopy(self.list), self.digit).start()

    def main(self):
        # The O(n²) sorts are commented out: they are too slow for large inputs
        # self.test_bubble_sort()
        # self.test_select_sort()
        # self.test_insert_sort()
        self.test_shell_sort()
        self.test_merge_sort()
        self.test_quick_sort()
        self.test_heap_sort()
        self.test_count_sort()
        self.test_bucket_sort()
        self.test_radix_sort()

if __name__ == '__main__':
    TestCase().main()

Results summary

Results measured on your own machine may differ. The results below are the average of 5 runs.

| Algorithm      | 1000 (s) | 10000 (s) | 100000 (s) | 1000000 (s) |
|----------------|----------|-----------|------------|-------------|
| Bubble sort    | 0.07     | 8.18      | (omitted)  | (omitted)   |
| Selection sort | 0.06     | 4.14      | (omitted)  | (omitted)   |
| Insertion sort | 0.03     | 1.62      | (omitted)  | (omitted)   |
| Shell sort     |          |           |            |             |
| Merge sort     | 0.01     | 0.06      | 1.68       | 15.12       |
| Quick sort     | 0.05     | 0.07      | 0.85       | 9.61        |
| Heap sort      | 0.01     | 0.11      | 1.39       | 17.18       |
| Counting sort  | 0.       |           |            |             |
| Bucket sort    | 0.       |           |            |             |
| Radix sort     | 0.01     | 0.05      | 0.55       | 6.39        |


When the amount of data is large and space complexity is not a concern, I personally recommend radix sort, which takes far less space than counting sort and is faster. If space complexity matters, I personally recommend Shell sort and quick sort.

Tags: Python Algorithm

Posted on Mon, 25 Oct 2021 20:37:10 -0400 by KI114