Heap / quick / Shell / merge sort: which is faster?


Background

I haven't written a sorting routine by hand for a long time. Recently I ran into a batch of problems, and that gave me an idea: of the O(n log n) sorting algorithms I know (plus Shell sort, which is not strictly O(n log n) but competes in practice), which one is actually the fastest?

Sorting source code

Heap sort code

// Heap sort
// C++ version
void sift_down(int arr[], int start, int end) {
    // Indices of the parent node and its left child
    int parent = start;
    int child = parent * 2 + 1;
    while (child <= end) { // Compare only while the child index is in range
        // Pick the larger of the two children
        if (child + 1 <= end && arr[child] < arr[child + 1]) child++;
        // If the parent is already at least as large as that child, the heap property holds: stop
        if (arr[parent] >= arr[child]) return;
        // Otherwise swap parent and child, then continue one level down
        swap(arr[parent], arr[child]);
        parent = child;
        child = parent * 2 + 1;
    }
}
// Heap sort entry point
void heap_sort(int arr[], int len) {
    // Build the heap: sift down every node, starting from the parent of the last node
    for (int i = (len - 1 - 1) / 2; i >= 0; i--)
        sift_down(arr, i, len - 1);
    // Swap the maximum (first element) with the last element of the unsorted part,
    // then restore the heap over the remaining prefix, until everything is sorted
    for (int i = len - 1; i > 0; i--) {
        swap(arr[0], arr[i]);
        sift_down(arr, 0, i - 1);
    }
}
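
As a cross-check for a hand-written heap sort, the same two phases (build a max-heap, then repeatedly move the maximum to the back) are available in the standard library. The sketch below is only an illustration: std_heap_sort is a name made up for this example and is not part of the benchmark in this post.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: heap sort expressed with standard-library heap primitives.
void std_heap_sort(std::vector<int>& v) {
    std::make_heap(v.begin(), v.end());  // phase 1: arrange the range as a max-heap
    std::sort_heap(v.begin(), v.end());  // phase 2: turn the heap into ascending order
    assert(std::is_sorted(v.begin(), v.end()));
}
```

Running both versions on the same input and comparing the results is a cheap way to catch an off-by-one error in sift_down.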

Quick sort code

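// Quick sort (partition around the middle element)
void qsort(int* nums, int l, int r) {
    if (l >= r) return;
    int tl = l, tr = r;
    int cmp = nums[(l + r) / 2]; // Pivot value
    while (tl <= tr) {
        while (nums[tl] < cmp) tl++; // Find an element on the left that is >= pivot
        while (nums[tr] > cmp) tr--; // Find an element on the right that is <= pivot
        if (tl <= tr) {
            swap(nums[tl], nums[tr]);
            tl++;
            tr--;
        }
    }
    qsort(nums, l, tr); // Sort the part that is <= pivot
    qsort(nums, tl, r); // Sort the part that is >= pivot
}
// Quick sort entry point
void quickSort(int* nums, int len) {
    qsort(nums, 0, len - 1);
}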

Shell sort code

// Shell sort: insertion sort over progressively smaller gaps
void shellSort(int *nums, int len) {
    for (int step = len / 2; step >= 1; step /= 2) {
        for (int i = step; i < len; i++) {
            int j = i, temp = nums[i];
            for (; j >= step && nums[j - step] > temp; j -= step) {
                nums[j] = nums[j - step];
            }
            nums[j] = temp;
        }
    }
}

Merge sort code

The usual routine: split, then merge.

// Merge sort in full: msort splits the array, merge combines two sorted runs
void merge(int *nums, int *temp, int l, int mid, int r) {
    int p = l;
    int lstart = l, lend = mid, rstart = mid + 1, rend = r;
    while (lstart <= lend && rstart <= rend) {
        if (nums[lstart] < nums[rstart]) {
            temp[p++] = nums[lstart];
            lstart++;
        } else {
            temp[p++] = nums[rstart];
            rstart++;
        }
    }
    while (lstart <= lend) {
        temp[p++] = nums[lstart];
        lstart++;
    }
    while (rstart <= rend) {
        temp[p++] = nums[rstart];
        rstart++;
    }
    for (int i = l; i < p; i++) {
        nums[i] = temp[i];
    }
}

// Recursively splits the range [l, r] into halves and merges them back in order
void msort(int *nums, int *temp, int l, int r) {
    if (l < r) {
        int mid = l + (r - l) / 2;
        msort(nums, temp, l, mid); // Left half
        msort(nums, temp, mid + 1, r); // Right half
        merge(nums, temp, l, mid, r);
    }
}
// Merge sort entry point
void mergeSort(int *nums, int numSize) {
    int *temp = (int *) calloc(numSize, sizeof(int));
    msort(nums, temp, 0, numSize - 1);
    free(temp);
}

Test environment

To keep the comparison fair, every algorithm sorts its own copy of the same random data (the copies are made with the copy function), and each run is timed with the clock function. The same print function prints every result; so that printing does not distort the timing, only the first ten elements are printed. Note that get_val only generates values from 100 to 109, so the arrays contain many duplicates.
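
The same "one private copy per algorithm, then time it" setup can be sketched with std::chrono instead of clock. This is only a sketch under assumptions: time_sort_ms and std_sort_adapter are hypothetical names introduced here, and steady_clock measures wall-clock time, whereas clock measures processor time, so the numbers are not directly comparable with the ones in this post.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: runs sort_fn on a private copy of `input` and returns
// the elapsed wall-clock time in milliseconds.
template <typename SortFn>
double time_sort_ms(const std::vector<int>& input, SortFn sort_fn) {
    std::vector<int> data = input;  // each algorithm gets its own copy
    auto begin = std::chrono::steady_clock::now();
    sort_fn(data.data(), static_cast<int>(data.size()));
    auto finish = std::chrono::steady_clock::now();
    // Verify outside the timed region so the check does not skew the result.
    assert(std::is_sorted(data.begin(), data.end()));
    return std::chrono::duration<double, std::milli>(finish - begin).count();
}

// Stand-in with the same (int*, int) signature as the sorts in this post.
void std_sort_adapter(int* nums, int len) { std::sort(nums, nums + len); }
```

Any of the sorts here that take (int*, int), such as quickSort or shellSort, could be passed in place of std_sort_adapter, for example time_sort_ms(data, quickSort).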

Efficiency test with 1,000,000 elements

Test source code

#include <bits/stdc++.h>
using namespace std;
clock_t start, endtime;
#define N 1000000
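void print(int *nums, int numSize) { // Print the elements, then the elapsed time
    for (int i = 0; i < numSize; i++) {
        cout << nums[i] << "--";
    }
    endtime = clock();
    // clock() counts ticks, so convert to milliseconds via CLOCKS_PER_SEC
    printf("time consumed: %.0f ms\n", 1000.0 * (endtime - start) / CLOCKS_PER_SEC);
}
void get_val(int *a, int len) { // Fill the array with random values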
    for (int i = 0; i < len; i++) {
        a[i] = 100 + rand() % 10;
    }
}
int main() {
    // Allocate the arrays
    int *a = new int[N];
    get_val(a, N);
    int *b = new int[N];
    int *c = new int[N];
    int *d = new int[N];
    copy(a, a + N, b);
    copy(a, a + N, c);
    copy(a, a + N, d);
    //The following is the sorting timing
    start = clock(); //Heap sort timing
    heap_sort(a, N);
    print(a, 10);
    start = clock(); //Merge sort timing
    mergeSort(b, N);
    print(b, 10);
    start = clock(); //Quick sort timing
    quickSort(c, N);
    print(c, 10);
    start = clock(); //Hill sort timing
    shellSort(d, N);
    print(d, 10);
}

Five runs were recorded (result screenshots I to V).

Quick sort > Shell sort > merge sort > heap sort

Test with 10,000,000 elements

Test source code: same as the previous source code, with N set to 10,000,000
Test results:

Quick sort > Shell sort > merge sort > heap sort (the gap widens)

Test with 100,000,000 elements

Test source code: change N to 100 million.

This test took a long time, so I only ran it once 😂 (it took more than a minute).

Quick sort > Shell sort > merge sort > heap sort (the gap doubles)

Summary

On ordinary data, quick sort really is a powerful tool; next to it, heap sort just doesn't measure up. After looking it up, heap sort's main advantage is its consistency (not "stability" in the sense of keeping equal elements in their original order): it never degrades to O(N^2). I've written these sorts several times now, and I still think quick sort gives the best value: easy to write and fast! 😂
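
To make the O(N^2) degradation concrete, here is a small sketch that counts comparisons. It deliberately uses a naive quick sort that always takes the first element as the pivot, which is not the middle-pivot version from this post; naive_quick_sort and comparisons_on_sorted are names invented for this illustration. On already sorted input every partition is maximally lopsided, so the count comes out to n(n-1)/2.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Illustrative only: quick sort with the FIRST element as pivot.
// Returns the number of element comparisons performed.
long long naive_quick_sort(std::vector<int>& a, int l, int r) {
    long long comparisons = 0;
    while (l < r) {
        int pivot = a[l];
        int store = l;  // a[l+1..store] holds the elements smaller than the pivot
        for (int i = l + 1; i <= r; i++) {
            comparisons++;
            if (a[i] < pivot) std::swap(a[++store], a[i]);
        }
        std::swap(a[l], a[store]);  // move the pivot to its final position
        // Recurse into the smaller part and loop on the larger one, so the
        // recursion stays shallow even when the running time is quadratic.
        if (store - l < r - store) {
            comparisons += naive_quick_sort(a, l, store - 1);
            l = store + 1;
        } else {
            comparisons += naive_quick_sort(a, store + 1, r);
            r = store - 1;
        }
    }
    return comparisons;
}

// Number of comparisons on the already sorted input 0, 1, ..., n-1.
long long comparisons_on_sorted(int n) {
    std::vector<int> a(n);
    for (int i = 0; i < n; i++) a[i] = i;
    long long count = naive_quick_sort(a, 0, n - 1);
    assert(std::is_sorted(a.begin(), a.end()));
    return count;
}
```

For n = 1000 this gives 499500 comparisons, roughly 50 times what n log2 n would suggest. Heap sort has no input that triggers this kind of blow-up, which is the consistency mentioned above.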

Final showdown: my quick sort vs std::sort()

Here is my quick sort again, to save scrolling back

//Quick sort
void qsort(int *nums, int l, int r) {
    if (l >= r)return;
    int tl = l, tr = r;
    int cmp = nums[(l + r) / 2];
    while (tl <= tr) {
        while (nums[tl] < cmp)tl++;
        while (nums[tr] > cmp)tr--;
        if (tl <= tr) {
            swap(nums[tl], nums[tr]);
            tl++;
            tr--;
        }
    }
    qsort(nums, l, tr);
    qsort(nums, tl, r);
}
void quickSort(int *nums, int len) {
    qsort(nums, 0, len - 1);
}

Test with 100 million elements

Test driver source code

int main() {
    int *a = new int[N];
    get_val(a, N);
    int *b = new int[N];
    copy(a,a+N,b);
    start = clock(); // Time my quick sort
    quickSort(a,N);
    print(a,10);
    start = clock(); // Time std::sort from the STL
    sort(b,b+N);
    print(b,10);
}

Timing comparison (just for fun):

First round: I won 😂

Second round: I won 😂

Third round: no need to compare any further. This isn't fair to the STL anyway: std::sort has to handle many more situations, is far more flexible, and works with all kinds of types through templates 😂
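
A quick illustration of that flexibility: the same std::sort call works with a custom ordering and with user-defined types, which a hand-written int-only quick sort cannot do without changes. The Player struct and the helper names below are hypothetical, made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record type, used only for this illustration.
struct Player {
    std::string name;
    int score;
};

// Sort ints in descending order with a standard comparator.
void sort_descending(std::vector<int>& v) {
    std::sort(v.begin(), v.end(), std::greater<int>());
}

// Sort records by score (highest first), breaking ties by name.
void sort_players(std::vector<Player>& players) {
    std::sort(players.begin(), players.end(),
              [](const Player& a, const Player& b) {
                  if (a.score != b.score) return a.score > b.score;
                  return a.name < b.name;
              });
}

// Usage example: both helpers go through the same std::sort template.
void usage_example() {
    std::vector<int> v = {3, 1, 2};
    sort_descending(v);
    assert((v == std::vector<int>{3, 2, 1}));

    std::vector<Player> p = {{"bob", 10}, {"amy", 10}, {"cat", 25}};
    sort_players(p);
    assert(p[0].name == "cat" && p[1].name == "amy" && p[2].name == "bob");
}
```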

Tags: C Algorithm data structure

Posted on Mon, 20 Sep 2021 19:10:58 -0400 by jonshutt