Principles, Python implementations, and efficiency tests of the top ten sorting algorithms

Algorithm classification

Note: most of the principle descriptions are adapted from the "rookie tutorial" (Runoob).
Ten common sorting algorithms can be divided into two categories:

  • Comparison sort: the relative order of elements is determined by comparing them. Its time complexity cannot be better than O(n log n), so it is also called nonlinear-time comparison sort.
  • Non-comparison sort: the relative order of elements is not determined by comparing them. It can break through the time lower bound of comparison-based sorting and run in linear time, so it is also called linear-time non-comparison sort.
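
The O(n log n) bound for comparison sorts comes from the standard decision-tree argument, sketched here. A comparison sort has to distinguish all n! possible orderings of its input, and a binary decision tree of height h (h comparisons in the worst case) has at most 2^h leaves:

```latex
2^{h} \ge n!
\quad\Longrightarrow\quad
h \ge \log_2 (n!) \ge \log_2 \left(\frac{n}{2}\right)^{n/2}
  = \frac{n}{2} \log_2 \frac{n}{2}
  = \Omega(n \log n)
```

The middle step uses the fact that the largest n/2 factors of n! are each at least n/2.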

Algorithm complexity

Algorithm details

Bubble sort

Compare each pair of adjacent elements and swap them if they are out of order. After one pass, the last position is guaranteed to hold the maximum (or minimum) value. The algorithm is simple and intuitive, and is suitable for small data sets.

Algorithm steps
  1. Traverse from front to back, stopping before the last element
  2. Compare each element with its neighbor and swap them if the first is larger than the second
  3. After the pass, the last element is the largest
  4. Repeat steps 1-3 on the remaining unsorted part until sorting is complete
Code implementation
def bubble_sort(arr):
    for i in range(len(arr)):
        for j in range(0, len(arr)-1-i):
            if arr[j] > arr[j+1]:  # The first is bigger than the second
                arr[j], arr[j+1] = arr[j+1], arr[j]  # exchange
    return arr
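
A common variation, shown here only as a sketch and not as part of the benchmark below, stops as soon as a full pass makes no swap, because the list is then already sorted. The name `bubble_sort_early_exit` is made up for this example.

```python
def bubble_sort_early_exit(arr):
    """Bubble sort that stops as soon as a full pass makes no swap."""
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:  # no swap in this pass: the list is already sorted
            break
    return arr


print(bubble_sort_early_exit([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```

On input that is already sorted, this variant finishes after a single pass.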

Selection sort

A nested (double) loop selects the minimum of the unsorted part and places it at the current position. The algorithm is simple and intuitive, but its time complexity is O(n²), so it is inefficient and only suitable for small data sets.

Algorithm steps
  1. Traverse from front to back; for each position, run a second traversal over the elements after it
  2. Find the minimum value and put it in the current position
Code implementation
def select_sort(arr):
    for i in range(len(arr)):
        min_index = i  # Subscript of minimum value
        for j in range(i + 1, len(arr)):
            if arr[j] < arr[min_index]:
                min_index = j
        if min_index != i:
            arr[i], arr[min_index] = arr[min_index], arr[i]  # Place the minimum value in the current position
    return arr
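
To make the O(n²) cost concrete, the following sketch counts the comparisons made by the two loops above. The helper name `count_selection_comparisons` is made up for this example; it only mirrors the loop structure and sorts nothing.

```python
def count_selection_comparisons(n):
    """Count the element comparisons selection sort makes on n items."""
    comparisons = 0
    for i in range(n):
        for _ in range(i + 1, n):
            comparisons += 1
    return comparisons


for n in (10, 100, 1000):
    # The count always equals n * (n - 1) / 2, whatever the input order
    print(n, count_selection_comparisons(n), n * (n - 1) // 2)
```

Because the count does not depend on the data, selection sort does the same amount of comparison work on sorted and unsorted input.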

Insertion sort

Insertion sort works by building up a sorted sequence. Each unsorted element is compared against the sorted sequence from back to front until its position is found, and is inserted there.

Algorithm steps
  1. Start from the first element, which can be considered already sorted;
  2. Take the next element and scan the sorted sequence from back to front;
  3. If the sorted element is larger than the new element, move the sorted element one position to the right;
  4. Repeat step 3 until a sorted element less than or equal to the new element is found;
  5. Insert the new element after that position;
  6. Repeat steps 2 to 5.
Code implementation
def insert_sort(arr):
    for i in range(1, len(arr)):
        for j in range(i, 0, -1):  # Traverse forward from current position
            if arr[j] < arr[j - 1]:  # If you encounter something bigger than yourself, exchange; if you encounter something smaller than yourself, the cycle ends
                arr[j], arr[j - 1] = arr[j - 1], arr[j]
            else:
                break
    return arr
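
The steps above describe moving sorted elements one position to the right, while the implementation above swaps adjacent elements. A minimal sketch of the shifting form, under the made-up name `insert_sort_shift`, could look like this:

```python
def insert_sort_shift(arr):
    """Insertion sort that shifts larger elements right instead of swapping."""
    for i in range(1, len(arr)):
        current = arr[i]          # the element taken out of the unsorted part
        j = i - 1
        while j >= 0 and arr[j] > current:
            arr[j + 1] = arr[j]   # move the larger sorted element one slot right
            j -= 1
        arr[j + 1] = current      # insert at the position that was opened up
    return arr


print(insert_sort_shift([4, 3, 5, 1, 2]))  # [1, 2, 3, 4, 5]
```

Shifting writes each moved element once, whereas a swap writes two elements per step.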

Shell sort

Invented by Shell in 1959, this was the first sorting algorithm to break through O(n²). It is an improved version of simple insertion sort; the difference is that it compares elements that are far apart first. Shell sort is also called diminishing increment sort.

Algorithm steps

First, divide the whole record sequence to be sorted into several subsequences for direct insertion sorting. The specific algorithm description is as follows:

  1. Select an increment sequence t1, t2, ..., tk, where ti > tj for i < j and tk = 1;

  2. Make k passes over the sequence, one for each increment in the sequence;

  3. In each pass, split the sequence to be sorted into several subsequences of length m according to the current increment ti, and sort each subsequence with straight insertion sort. Only when the increment is 1 is the whole sequence treated as a single list, whose length is the length of the whole sequence.

Code implementation
def shell_sort(arr):
    gap = len(arr) // 2

    while gap > 0:
        for g in range(gap):
            for i in range(g + gap, len(arr), gap):
                for j in range(i, g, -gap):
                    if arr[j] < arr[j - gap]:
                        arr[j], arr[j - gap] = arr[j - gap], arr[j]
                    else:
                        break
        gap = gap // 2
    return arr
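
To see what one pass works on, the sketch below lists the gaps produced by the halving scheme used above and the interleaved subsequences for one gap. The helper names `gap_sequence` and `gap_subsequences` are made up for this illustration.

```python
def gap_sequence(n):
    """Gaps used by the halving scheme: n // 2, n // 4, ..., 1."""
    gaps = []
    gap = n // 2
    while gap > 0:
        gaps.append(gap)
        gap //= 2
    return gaps


def gap_subsequences(arr, gap):
    """Split arr into the `gap` interleaved subsequences that one pass sorts."""
    return [arr[start::gap] for start in range(gap)]


data = [8, 9, 1, 7, 2, 3, 5, 4, 6, 0]
print(gap_sequence(len(data)))       # [5, 2, 1]
print(gap_subsequences(data, 5))     # [[8, 3], [9, 5], [1, 4], [7, 6], [2, 0]]
```

Each inner list is sorted by insertion sort during the pass for that gap; with gap 1 there is a single subsequence, the whole list.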

Merge sort

Merge sort is an efficient sorting algorithm built on the merge operation, and a very typical application of divide and conquer. Sorted subsequences are merged to obtain a completely sorted sequence; that is, each subsequence is sorted first, and then the sorted subsequences are merged with one another. Merging two sorted lists into one sorted list is called 2-way merging.

Algorithm steps
  1. Split the input sequence of length n into two subsequences of length n/2;

  2. Sort each of the two subsequences recursively with merge sort;

  3. Merge the two sorted subsequences into the final sorted sequence.

Code implementation
def merge_sort(arr):
    if len(arr) < 2:
        return arr
    middle = len(arr) // 2
    left = arr[0:middle]
    right = arr[middle:]
    return merge(merge_sort(left), merge_sort(right))
 
 
def merge(left, right):
    result = []
    while left and right:
        if left[0] <= right[0]:
            result.append(left.pop(0))
        else:
            result.append(right.pop(0))
 
    while left:
        result.append(left.pop(0))
 
    while right:
        result.append(right.pop(0))
    return result
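
In CPython, `list.pop(0)` shifts every remaining element, so each call costs time proportional to the length of the list. A possible alternative, sketched here under the made-up name `merge_by_index`, keeps a read position in each input instead:

```python
def merge_by_index(left, right):
    """Merge two sorted lists using read positions instead of pop(0)."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:   # <= keeps equal elements in their original order
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])       # at most one of these two slices is non-empty
    result.extend(right[j:])
    return result


print(merge_by_index([1, 4, 9], [2, 3, 10]))  # [1, 2, 3, 4, 9, 10]
```

The timings in the results table further down were measured with the `pop(0)` version, so they do not describe this variant.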

Quick sort

Quick sort can be seen as a divide-and-conquer version of bubble sort.

Algorithm steps
  1. Pick an element from the sequence as the pivot
  2. Place the elements smaller than the pivot in front of it and the larger ones behind it. This is called partitioning
  3. Recursively partition the left and right subsequences
Code implementation
def quick_sort(arr, left=None, right=None):
    left = 0 if left is None else left
    right = len(arr) - 1 if right is None else right
    if left < right:
        partition_index = partition(arr, left, right)
        quick_sort(arr, left, partition_index - 1)
        quick_sort(arr, partition_index + 1, right)
    return arr
 
 
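def partition(arr, left, right):
    pivot = left  # Use the first element of the range as the pivot
    index = pivot + 1  # arr[pivot + 1:index] holds the elements smaller than the pivot
    i = index
    while i <= right:
        if arr[i] < arr[pivot]:
            arr[i], arr[index] = arr[index], arr[i]
            index += 1
        i += 1
    arr[pivot], arr[index - 1] = arr[index - 1], arr[pivot]  # Put the pivot between the two parts
    return index - 1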
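
With the first element as pivot, an already sorted input makes every partition as unbalanced as possible, so the recursion depth grows linearly with the input size. One common remedy, sketched here under the made-up name `quick_sort_random`, is to pick the pivot at random. This helps for inputs with mostly distinct values; it does not help when most values are equal.

```python
import random


def quick_sort_random(arr, left=0, right=None):
    """In-place quick sort that picks a random pivot for each partition."""
    if right is None:
        right = len(arr) - 1
    if left < right:
        # Move a randomly chosen element to the front, then partition as before
        pivot_pos = random.randint(left, right)
        arr[left], arr[pivot_pos] = arr[pivot_pos], arr[left]
        boundary = left + 1          # arr[left + 1:boundary] holds values < pivot
        for i in range(left + 1, right + 1):
            if arr[i] < arr[left]:
                arr[i], arr[boundary] = arr[boundary], arr[i]
                boundary += 1
        arr[left], arr[boundary - 1] = arr[boundary - 1], arr[left]
        quick_sort_random(arr, left, boundary - 2)
        quick_sort_random(arr, boundary, right)
    return arr


print(quick_sort_random([3, 6, 1, 8, 2, 9, 4]))  # [1, 2, 3, 4, 6, 8, 9]
```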

Heap sort

Algorithm steps
  1. Build a heap whose size is the length of the array
  2. Starting from the bottom of the heap, adjust it so that the largest value ends up at the top of the heap
  3. The top of the heap now holds the largest value; swap the top of the heap with the last element of the heap and reduce the heap size by 1
  4. Repeat steps 2 and 3 until the heap size is 1
Code implementation
def build_max_heap(arr):
    # Sift down every node that can have children, from the bottom up
    for i in range(len(arr) // 2, -1, -1):
        heapify(arr, i, len(arr))


def heapify(arr, i, heap_size):
    # Sift arr[i] down until the subtree rooted at i is a max-heap
    left = 2 * i + 1
    right = 2 * i + 2
    largest = i
    if left < heap_size and arr[left] > arr[largest]:
        largest = left
    if right < heap_size and arr[right] > arr[largest]:
        largest = right

    if largest != i:
        swap(arr, i, largest)
        heapify(arr, largest, heap_size)


def swap(arr, i, j):
    arr[i], arr[j] = arr[j], arr[i]


def heap_sort(arr):
    heap_size = len(arr)
    build_max_heap(arr)
    for i in range(len(arr) - 1, 0, -1):
        swap(arr, 0, i)  # Move the current maximum to the end of the heap
        heap_size -= 1  # Shrink the heap so the maximum stays in place
        heapify(arr, 0, heap_size)
    return arr
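
For comparison, Python's standard library ships a binary heap in the `heapq` module. The sketch below is not the in-place max-heap method described above: `heapq` is a min-heap, and this version returns a new list. The function name `heap_sort_with_heapq` is made up for this example.

```python
import heapq


def heap_sort_with_heapq(arr):
    """Return a new sorted list by popping repeatedly from a min-heap."""
    heap = list(arr)         # copy so the caller's list is left untouched
    heapq.heapify(heap)      # bottom-up heap construction, done in place on the copy
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort_with_heapq([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```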

Counting sort

Suitable when the array elements are non-negative integers within a small range.

Algorithm steps
  1. Find the maximum value k of the array and create a new array of length k+1
  2. Count the number of occurrences of each value i in the array to be sorted and store it in the i-th item of the new array
  3. Fill the target array in reverse
Code implementation
def count_sort(arr, max_value):
    bucket = [0] * (max_value + 1)  # bucket[v] counts how many times v occurs
    for num in arr:
        bucket[num] += 1
    index = 0
    for value in range(len(bucket)):
        for _ in range(bucket[value]):  # Write each value back as many times as it occurred
            arr[index] = value
            index += 1
    return arr
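
Step 3 above mentions filling the target array in reverse, whereas the implementation above simply rewrites each value as many times as it was counted. A sketch of the reverse-fill form, using prefix sums and the made-up name `count_sort_stable`, might look like this:

```python
def count_sort_stable(arr, max_value):
    """Counting sort with prefix sums and a reverse fill; returns a new list."""
    counts = [0] * (max_value + 1)
    for num in arr:
        counts[num] += 1
    # Turn counts into prefix sums: counts[v] = number of elements <= v
    for v in range(1, len(counts)):
        counts[v] += counts[v - 1]
    output = [0] * len(arr)
    # Walk the input backwards so equal values keep their original order
    for num in reversed(arr):
        counts[num] -= 1
        output[counts[num]] = num
    return output


print(count_sort_stable([2, 5, 3, 0, 2, 3, 0, 3], 5))  # [0, 0, 2, 2, 3, 3, 3, 5]
```

Keeping equal keys in their original order makes no visible difference for plain integers, but it matters when counting sort is applied to one digit at a time, as in radix sort.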

Bucket sort

Bucket sort is an upgraded version of counting sort. Its efficiency depends on the mapping function that assigns elements to buckets and on the sorting method used inside each bucket.

Algorithm steps
  1. Set up a fixed number of empty buckets as an array;

  2. Traverse the input data and put each item into its corresponding bucket;

  3. Sort each bucket that is not empty;

  4. Concatenate the sorted data from the non-empty buckets in order.

[Figures omitted: distributing elements into buckets; sorting elements within each bucket]

Code implementation
from sort.quick_sort import quick_sort  # Module layout taken from the test code below


def bucket_sort(arr, min_value, max_value):
    BUCKET_SIZE = 3  # Width of the value range covered by one bucket
    bucket_count = (max_value - min_value) // BUCKET_SIZE + 1
    buckets = [[] for _ in range(bucket_count)]
    for num in arr:
        buckets[(num - min_value) // BUCKET_SIZE].append(num)  # Mapping function: value -> bucket index
    index = 0
    for bucket in buckets:
        quick_sort(bucket)  # Sort each bucket; quick sort is used here
        for num in bucket:
            arr[index] = num
            index += 1
    return arr
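
To illustrate the mapping function on its own, the sketch below distributes a few values into buckets that each cover 10 consecutive values. The helper name `bucket_index` and the sample data are made up for this example.

```python
def bucket_index(value, min_value, bucket_width):
    """Map a value to its bucket: each bucket covers bucket_width consecutive values."""
    return (value - min_value) // bucket_width


data = [29, 25, 3, 49, 9, 37, 21, 43]
buckets = {}
for value in data:
    buckets.setdefault(bucket_index(value, 0, 10), []).append(value)

for index in sorted(buckets):
    print(index, buckets[index])
# 0 [3, 9]
# 2 [29, 25, 21]
# 3 [37]
# 4 [49, 43]
```

When the values are spread evenly, each bucket stays small and the per-bucket sort is cheap; when they cluster in a few buckets, most of the work falls back on the inner sort.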

Radix sort

Radix sort first sorts by the lowest digit and collects the elements, then sorts by the next higher digit and collects again, and so on up to the highest digit.

Algorithm steps
  1. Get the maximum number in the array and its number of digits;

  2. With arr as the original array, take each digit in turn, starting from the lowest, to form a radix array;

  3. Apply counting sort to the radix array (counting sort is well suited to a small range of values);

Code implementation
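def radix_sort(arr, max_digit):
    for digit in range(max_digit):
        buckets = [[] for _ in range(10)]  # One bucket per decimal digit 0-9
        for num in arr:
            # Integer division avoids float rounding errors on large numbers
            buckets[num // (10 ** digit) % 10].append(num)
        index = 0
        for bucket in buckets:  # Collect in bucket order; appending keeps each pass stable
            for num in bucket:
                arr[index] = num
                index += 1
    return arr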
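
The function above expects the caller to supply `max_digit`, while step 1 describes deriving it from the largest element. A small sketch of that step and of the digit extraction, using the made-up helper names `count_digits` and `digit_at` and assuming non-negative integers:

```python
def count_digits(arr):
    """Number of decimal digits in the largest element (non-negative integers)."""
    return len(str(max(arr))) if arr else 0


def digit_at(num, position):
    """Decimal digit of num at the given position (0 = least significant)."""
    return num // 10 ** position % 10


print(count_digits([170, 45, 75, 90, 802, 24, 2, 66]))  # 3
print([digit_at(802, p) for p in range(3)])             # [2, 0, 8]
```

With these helpers, a call could take the form `radix_sort(arr, count_digits(arr))`.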

Efficiency test

Test code

import copy
import time
import sys
import random

from sort.bubble_sort import bubble_sort
from sort.bucket_sort import bucket_sort
from sort.count_sort import count_sort
from sort.heap_sort import heap_sort
from sort.insert_sort import insert_sort
from sort.merge_sort import merge_sort
from sort.quick_sort import quick_sort
from sort.radix_sort import radix_sort
from sort.select_sort import select_sort
from sort.shell_sort import shell_sort

sys.setrecursionlimit(100000000)


def time_count(func):
    def wrapper(*args, **kwargs):
        # time.clock() was removed in Python 3.8; perf_counter() is its replacement
        start = time.perf_counter()
        func(*args, **kwargs)
        end = time.perf_counter()
        print(f'Time elapsed: {end - start} seconds')

    return wrapper


class Executor(object):
    def __init__(self, func, func_name, *args, **kwargs):
        self.func = func
        self.func_name = func_name
        self.args = args
        self.kwargs = kwargs
        self.start()

    @time_count
    def start(self):
        print(self.func_name + ': start execution')
        self.func(*self.args, **self.kwargs)



class TestCase:
    digit = 6

    def __init__(self):
        self.list = [random.randint(0, 10**self.digit-1) for i in range(10**self.digit)]
        print(f'Testing the sorting of {10 ** self.digit} items')

    def test_bubble_sort(self):
        Executor(bubble_sort, 'Bubble sort', copy.deepcopy(self.list))

    def test_select_sort(self):
        Executor(select_sort, 'Selection sort', copy.deepcopy(self.list))

    def test_insert_sort(self):
        Executor(insert_sort, 'Insertion sort', copy.deepcopy(self.list))

    def test_shell_sort(self):
        Executor(shell_sort, 'Shell sort', copy.deepcopy(self.list))

    def test_merge_sort(self):
        Executor(merge_sort, 'Merge sort', copy.deepcopy(self.list))

    def test_quick_sort(self):
        Executor(quick_sort, 'Quick sort', copy.deepcopy(self.list))

    def test_heap_sort(self):
        Executor(heap_sort, 'Heap sort', copy.deepcopy(self.list))

    def test_count_sort(self):
        Executor(count_sort, 'Counting sort', copy.deepcopy(self.list), 10**self.digit)

    def test_bucket_sort(self):
        Executor(bucket_sort, 'Bucket sort', copy.deepcopy(self.list), 0, 10**self.digit)

    def test_radix_sort(self):
        Executor(radix_sort, 'Radix sort', copy.deepcopy(self.list), self.digit)

    def main(self):
        # self.test_bubble_sort()
        # self.test_select_sort()
        # self.test_insert_sort()
        self.test_shell_sort()
        self.test_merge_sort()
        self.test_quick_sort()
        self.test_heap_sort()
        self.test_count_sort()
        self.test_bucket_sort()
        self.test_radix_sort()


if __name__ == '__main__':
    TestCase().main()
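
The test code above measures time only. Before comparing timings it can be worth checking that each function actually sorts; the sketch below compares a sort function against the built-in `sorted()` on random lists. The helper name `check_sort` and its default parameters are made up for this example.

```python
import random


def check_sort(sort_func, trials=100, max_len=50, max_value=99):
    """Compare sort_func with the built-in sorted() on random integer lists."""
    for _ in range(trials):
        data = [random.randint(0, max_value) for _ in range(random.randint(0, max_len))]
        expected = sorted(data)
        result = sort_func(list(data))   # pass a copy: some sorts work in place
        if result != expected:
            return False
    return True


print(check_sort(sorted))              # True: sorted() trivially agrees with itself
print(check_sort(lambda arr: arr))     # almost certainly False: returns the input unsorted
```

Functions that take extra arguments can be wrapped in a lambda, for example `check_sort(lambda a: count_sort(a, 99))`.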

Results summary

Results measured on your own computer may differ. The figures below are the average of 5 runs.

Algorithm        1000 (s)   10000 (s)   100000 (s)   1000000 (s)
Bubble sort      0.07       8.18        omitted      omitted
Selection sort   0.06       4.14        omitted      omitted
Insertion sort   0.03       1.62        omitted      omitted
Shell sort       0.01       0.03        0.53         8.30
Merge sort       0.01       0.06        1.68         15.12
Quick sort       0.05       0.07        0.85         9.61
Heap sort        0.01       0.11        1.39         17.18
Counting sort    0.01       0.01        0.05         0.46
Bucket sort      0.01       0.02        0.15         1.79
Radix sort       0.01       0.05        0.55         6.39

Summary

When the amount of data is large and space complexity is not a concern, I recommend radix sort, which uses far less space than counting sort and is still fast; when space complexity matters, I recommend Shell sort and quick sort.

Tags: Python Algorithm

Posted on Mon, 25 Oct 2021 20:37:10 -0400 by KI114