[data structure and algorithm] sorting

1. Insertion sort

The records to be sorted are processed one at a time. Each record is compared against the already sorted prefix of the array and inserted at its correct position in that prefix

code

template<class Elem>
void inssort(Elem A[],int n)
{
    for(int i = 1;i < n;i++)
        for(int j = i;j >= 1 && A[j] < A[j-1];j--)
            swap(A,j,j-1);
}
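The listings in this article call a three-argument swap(A, i, j) that is never shown. A minimal sketch of what such a helper might look like, assuming it simply exchanges two array slots (this definition is an assumption, not the author's code), together with the insertion sort above so the pair compiles on its own:

```cpp
// Assumed helper: exchange the elements at positions i and j of array A.
template<class Elem>
void swap(Elem A[], int i, int j)
{
    Elem tmp = A[i];
    A[i] = A[j];
    A[j] = tmp;
}

// Insertion sort exactly as in the listing above, using the helper.
template<class Elem>
void inssort(Elem A[], int n)
{
    for (int i = 1; i < n; i++)
        for (int j = i; j >= 1 && A[j] < A[j-1]; j--)
            swap(A, j, j-1);
}
```

Calling inssort(a, 5) on an int array holding 5, 2, 4, 1, 3 leaves it in ascending order.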

performance

  • Best: the input is already in ascending order. The time complexity is O(n)
  • Worst: the input is in descending order. The time complexity is O(n^2)
  • Average: for each element, about half of the elements before it are larger than it. The time complexity is O(n^2)

If the data to be sorted is already "basically ordered", insertion sort can achieve performance close to O(n)
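To make the three cases concrete, here is a small sketch that counts element comparisons while sorting. The name inssort_count is hypothetical and the function exists only for this illustration:

```cpp
// Insertion sort that also reports how many element comparisons it made.
template<class Elem>
long inssort_count(Elem A[], int n)
{
    long comparisons = 0;
    for (int i = 1; i < n; i++) {
        Elem tp = A[i];                 // element being inserted
        int j = i;
        while (j >= 1) {
            comparisons++;
            if (!(tp < A[j-1])) break;  // insertion point found
            A[j] = A[j-1];              // shift the larger element right
            j--;
        }
        A[j] = tp;
    }
    return comparisons;
}
```

For 8 elements, an ascending input costs 7 comparisons, the input 1 2 4 3 5 6 7 8 (one adjacent pair out of place) costs 8, and a descending input costs 28, which matches the O(n) and O(n^2) behavior described above.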

optimization

template<class Elem>
void inssort(Elem A[],int n)
{
    for(int i = 1;i < n;i++){
        int j = i;
        Elem tp = A[j];//Save the element being inserted
        for(;j >= 1 && tp < A[j-1];j--)
            A[j] = A[j - 1];
        A[j] = tp;
    }
}

2. Bubble sort

Compare adjacent elements from the bottom of the array to the top. If the lower element is smaller, the two are exchanged; otherwise the comparison simply moves up to the next pair. Each pass pushes the smallest remaining element to the top of the array like a "bubble"

code

template<class Elem>
void bubsort(Elem A[],int n)
{
    for(int i = 0;i < n - 1;i++)
        for(int j = n - 1;j > i;j--)
            if(A[j] < A[j-1])
                swap(A,j,j-1);
}

performance

Bubble sort is a relatively slow sort, and the version above has no good best case: both loops always run in full, so the best, average and worst-case time complexity are all O(n^2)

optimization

Add a flag variable that records whether any exchange happened during a pass. If a pass finishes without an exchange, the array is already ordered and the sort can end early
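A minimal sketch of the flag idea. The name bubsort_flag and the returned pass count are assumptions added for illustration, and std::swap stands in for the article's swap helper:

```cpp
#include <utility>  // std::swap

// Bubble sort that stops as soon as a full pass makes no exchange.
// Returns the number of passes actually performed.
template<class Elem>
int bubsort_flag(Elem A[], int n)
{
    int passes = 0;
    for (int i = 0; i < n - 1; i++) {
        bool swapped = false;            // did this pass exchange anything?
        passes++;
        for (int j = n - 1; j > i; j--) {
            if (A[j] < A[j-1]) {
                std::swap(A[j], A[j-1]);
                swapped = true;
            }
        }
        if (!swapped) break;             // already ordered: end early
    }
    return passes;
}
```

On an input that is already sorted, the function performs a single pass, so the best case drops to O(n). The worst case stays O(n^2).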


3. Selection sort

On the i-th pass, "select" the i-th smallest record in the array and put it in the i-th position. In other words, each pass finds the smallest element of the unordered part and places it at the front of that part

code

template<class Elem>
void selsort(Elem A[],int n)
{
    for(int i = 0;i < n - 1;i++){
        int lowindex = i;
        for(int j = i + 1;j < n;j++)
            if(A[j] < A[lowindex])
                lowindex = j;
        swap(A,i,lowindex);//At most n - 1 exchanges in total
    }
}

performance

Whether or not the array is already ordered, finding the smallest element requires traversing the whole unordered part, so the time complexity is always O(n^2)

optimization

On each pass the inner loop finds both a minimum and a maximum (the maximum candidate is initially the last element of the unordered part). The minimum is exchanged with the element at the start of the unordered part and the maximum with the element at its end. Each pass therefore shrinks the unordered part by 2 instead of 1, which halves the number of passes compared with the original scheme; the time complexity is still O(n^2)
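One possible sketch of this two-ended selection. The name selsort_minmax is hypothetical, and std::swap replaces the article's swap helper. The one subtle point is that the first exchange can move the maximum, so its index has to be corrected before the second exchange:

```cpp
#include <utility>  // std::swap

// Selection sort that places both the minimum and the maximum on each pass.
template<class Elem>
void selsort_minmax(Elem A[], int n)
{
    int left = 0, right = n - 1;
    while (left < right) {
        int lo = left, hi = left;        // indices of current min and max
        for (int j = left + 1; j <= right; j++) {
            if (A[j] < A[lo]) lo = j;
            if (A[hi] < A[j]) hi = j;
        }
        std::swap(A[left], A[lo]);       // minimum goes to the front
        if (hi == left) hi = lo;         // the maximum was just moved to lo
        std::swap(A[right], A[hi]);      // maximum goes to the back
        left++;
        right--;
    }
}
```

Each pass makes roughly twice as many comparisons as a pass of the original version, so the total number of comparisons stays about the same; what is saved is mainly loop overhead and passes over the data.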


4. Shell sort

Shell sort compares and exchanges non-adjacent elements. It exploits the best-case behavior of insertion sort: it first brings the sequence into a basically ordered state and then lets insertion sort finish the job

On each pass, Shell sort divides the sequence into disjoint subsequences whose elements are the same distance (the increment) apart in the whole array, and sorts each subsequence with insertion sort. In the basic scheme, the increment of each pass is 1/2 of the previous one, so each subsequence has twice as many elements as in the previous pass

The last pass is a "normal" insertion sort (that is, an insertion sort of the sequence containing all elements)


code

const int INCRGAP = 3;

template<class Elem>
void shellsort(Elem A[],int n)
{
    int incr = n;
    do{
        incr = incr / INCRGAP + 1;//Next increment; this sequence always ends with 1
        for(int i = 0;i < incr;i++){
            /*Insertion sort each subsequence. When the increment is 1, this is the final insertion sort over all elements*/
            for(int j = i + incr;j < n;j += incr){
                for(int k = j; k > i && A[k] < A[k - incr];k -= incr){
                    swap(A,k,k - incr);
                }
            }
        }
    }while(incr > 1);
}

performance

Choosing an appropriate increment sequence makes Shell sort considerably more efficient than the simple O(n^2) sorts above. Generally speaking, dividing the increment by 2 is not very effective, while "increment divided by 3" works better

When the increment is divided by 3 on each pass, the average running time of Shell sort is O(n^(1.5))


5. Quick sort

First, select a pivot (axis value). Elements smaller than the pivot are placed to its left in the array, and elements larger than the pivot to its right. This is called a partition of the array. Quick sort then applies the same operation recursively to the sub-arrays on the left and right of the pivot

There are several ways to select the pivot. The easiest is to use the first or last element. However, if the input array is already in ascending or descending order, every partition puts all elements on one side of the pivot. A better method is to select the pivot at random

code

template <class Elem>
int partition(Elem A[],int i,int j)
{
    //The tail element is used as the pivot (axis value) here; pivot selection could be factored out into its own function
    //If the chosen pivot is not the tail element, first exchange it with the tail element
    Elem pivot = A[j];
    int l = i - 1;
    for(int r = i;r < j;r++)
        if(A[r] <= pivot)
            swap(A,++l,r);
    swap(A,++l,j);//Swap the pivot from the tail into position ++l
    return l;
}

template <class Elem>
void qsort(Elem A[],int i,int j)
{
    if(j <= i)  return;
    int p = partition<Elem>(A,i,j);
    qsort<Elem>(A,i,p - 1);
    qsort<Elem>(A,p + 1,j);
}

performance

  • Best case: O(nlogn)
  • Average: O(nlogn)
  • Worst case: every partition puts all elements on one side of the pivot, O(n^2)

The average running time of quick sort is close to its best running time, not its worst. In practice, quick sort usually has the best average performance among the common internal (in-memory) sorting algorithms

optimization

  1. The most obvious improvement is pivot selection. If the pivot is chosen well, each partition splits the elements evenly around it:

    Take the median of three randomly chosen values. To avoid the delay caused by the random number generator, the first, middle and last elements can be used as the three candidates instead

  2. When n is very small, quick sort is relatively slow because of its recursion overhead. Therefore, when a sub-array is shorter than a certain length (empirical value: 9), leave it unsorted. Once the recursion finishes, the array is basically ordered, and a single final insertion sort pass over the whole array completes the sort.
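Both ideas can be combined in one sketch. The names CUTOFF, median3, qsort_core and qsort_opt are hypothetical, the cutoff of 9 is the empirical value quoted above, and std::swap replaces the article's swap helper:

```cpp
#include <utility>  // std::swap

const int CUTOFF = 9;  // sub-arrays shorter than this are left for the final pass

// Order A[i], A[mid], A[j] and return the index of their median (mid).
template<class Elem>
int median3(Elem A[], int i, int j)
{
    int m = i + (j - i) / 2;
    if (A[m] < A[i]) std::swap(A[m], A[i]);
    if (A[j] < A[i]) std::swap(A[j], A[i]);
    if (A[j] < A[m]) std::swap(A[j], A[m]);
    return m;  // now A[i] <= A[m] <= A[j]
}

template<class Elem>
void qsort_core(Elem A[], int i, int j)
{
    if (j - i + 1 < CUTOFF) return;      // short run: do nothing for now
    int m = median3(A, i, j);
    std::swap(A[m], A[j]);               // move the pivot to the tail
    Elem pivot = A[j];
    int l = i - 1;
    for (int r = i; r < j; r++)
        if (!(pivot < A[r]))             // A[r] <= pivot
            std::swap(A[++l], A[r]);
    std::swap(A[++l], A[j]);             // pivot into its final position
    qsort_core(A, i, l - 1);
    qsort_core(A, l + 1, j);
}

template<class Elem>
void qsort_opt(Elem A[], int n)
{
    qsort_core(A, 0, n - 1);
    // Final pass: one insertion sort over the basically ordered array.
    for (int i = 1; i < n; i++) {
        Elem tp = A[i];
        int j = i;
        for (; j >= 1 && tp < A[j-1]; j--)
            A[j] = A[j-1];
        A[j] = tp;
    }
}
```

Because every element is already within a short distance of its final position when the recursion ends, the final insertion sort does only a small amount of work per element.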


6. Merge sort

The sequence is divided into two subsequences of (nearly) equal length, each subsequence is sorted, and the two are then combined into one sorted sequence. Combining two sorted subsequences is called merging

code

template<class Elem>
void mergesortcore(Elem A[],Elem temp[],int i,int j)
{
    if(i >= j)  return;//Zero or one element: nothing to do (also covers an empty array)
    int mid = (i + j)/2;

    mergesortcore(A,temp,i,mid);
    mergesortcore(A,temp,mid + 1,j);

    /*Merge*/
    int i1 = i,i2 = mid + 1,curr = i;
    while(i1 <= mid && i2 <= j){
        if(A[i1] <= A[i2])//<= keeps equal elements in their original order (stable)
            temp[curr++] = A[i1++];
        else
            temp[curr++] = A[i2++];
    }
    while(i1 <= mid)
        temp[curr++] = A[i1++];
    while(i2 <= j)
        temp[curr++] = A[i2++];
    for(curr = i;curr <= j;curr++)
        A[curr] = temp[curr];
}

template<class Elem>
void mergesort(Elem A[],int sz)
{
    Elem *temp = new Elem[sz]();
    int i = 0,j = sz - 1;
    mergesortcore(A,temp,i,j);
    delete [] temp;
}

performance

The recursion is logn levels deep and each level costs O(n) time, so the total time complexity is O(nlogn). This does not depend on the relative order of the values in the array, so it is the best, average and worst-case running time

Since an auxiliary array of the same size as the array being sorted is required, the space cost is O(n)

optimization

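In-place merge sort merges without an auxiliary array. The version below moves blocks with three reversals (a rotation) instead of copying through a temporary buffer, at the cost of extra element moves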


void reverse(int *arr,int n)
{
    int i = 0,j = n - 1;
    while(i < j)
        swap(arr[i++],arr[j--]);
}

void exchange(int *arr,int sz,int left)
{
    reverse(arr,left);//Flip left part
    reverse(arr + left,sz - left);//Flip the right part
    reverse(arr,sz);//Flip all
}

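//Merge the sorted ranges arr[begin..mid-1] and arr[mid..end] in place (end is inclusive)
void merge(int *arr,int begin,int mid,int end)
{
    int i = begin,j = mid,k = end;
    while(i < j && j <= k){
        int right = 0;
        //Skip left elements that are already in place
        while(i < j && arr[i] <= arr[j])
            ++i;
        //Count the right elements that belong before arr[i]; strict < keeps the merge stable
        while(j <= k && arr[j] < arr[i]){
            ++j;
            ++right;
        }
        //Rotate so that those right elements move in front of the remaining left elements
        exchange(arr + i,j - i,j - i - right);
        i += right;
    }
}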

7. Heap sort

Heap sort first builds a max-heap from the array, and then repeatedly "deletes" the top element (moving it to the end of the heap's range). The final sequence is sorted from small to large


code

Here, the heap build and delete functions of the C++ STL (make_heap and pop_heap from <algorithm>) are used directly

template <class Elem>
void heapsort(Elem A[],int n)
{
    int end = n;
    make_heap(A,A + end);
    for(int i = 0;i < n;i++){
        pop_heap(A,A + end);
        end--;
    }
}

If you cannot use an existing library function:

/********************************************
 * Insert element into heap (sift up)
 *  hole: The location of the new element
 *  top:  Root of the (sub)heap; the element is never moved above it
 ********************************************/
template <class value>
void _push_heap(vector<value> &arr,int hole,int top = 0){
    value v = arr[hole];//Take out the new element and create a hole
    int parent = (hole - 1) / 2;
    //Builds a max-heap. For a min-heap, use v < arr[parent] instead
    while(hole > top && arr[parent] < v){
        arr[hole] = arr[parent];
        hole = parent;
        parent = (hole - 1) / 2;
    }
    arr[hole] = v;
}

/********************************************
 * Delete heap top element
 *  sz: Heap size before the deletion; the old top ends up in arr[sz - 1]
 ********************************************/
template <class value>
void _pop_heap(vector<value> &arr,int sz)
{
    value v = arr[sz - 1];//Original value of the slot that the old top will occupy
    arr[sz - 1] = arr[0];
    --sz;//Size after deleting the heap top element
    int hole = 0;
    int child = 2 * (hole + 1); //Right child
    while(child < sz){
        if(arr[child] < arr[child - 1])
            --child;//Left child is larger
        arr[hole] = arr[child];
        hole = child;
        child = 2 * (hole + 1);
    }
    if(child == sz){//Only a left child exists
        arr[hole] = arr[child - 1];
        hole = child - 1;
    }
    arr[hole] = v;
    _push_heap(arr,hole);
}

/********************************************
 * Build heap
 *  Sifts down every non-leaf node, from the last one back to the root
 ********************************************/
template <class value>
void _make_heap(vector<value> &arr)
{
    int sz = arr.size();
    int parent = (sz - 2) / 2;
    while(parent >= 0){
        int hole = parent;
        int child = 2 * (hole + 1); //Right child
        value v = arr[hole];
        while(child < sz){
            if(arr[child] < arr[child - 1])
                --child;
            arr[hole] = arr[child];
            hole = child;
            child = 2 * (hole + 1);
        }
        if(child == sz){
            arr[hole] = arr[child - 1];
            hole = child - 1;
        }
        arr[hole] = v;
        //Sift up only within the subtree rooted at parent; going higher could
        //leave this subtree violating the heap property
        _push_heap(arr,hole,parent);
        --parent;
    }
}

template <class value>
void heap_sort(vector<value> &arr)
{
    _make_heap(arr);
    for(int sz = arr.size();sz > 1;sz--)
        _pop_heap(arr,sz);
}

performance

Building the heap from an existing array takes O(n) time, and each deletion of the top element takes O(logn). The total time cost is therefore O(n+nlogn), that is, O(nlogn)

Note that building a heap from existing elements is very fast. To find the k-th largest element of the array, O(n+klogn) time is enough. If k is very small, the time cost is close to O(n)
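The O(n+klogn) bound can be sketched with the standard heap functions. The name kth_largest is hypothetical, and the sketch assumes 1 <= k <= n:

```cpp
#include <algorithm>  // std::make_heap, std::pop_heap
#include <vector>

// Return the k-th largest element (k = 1 is the maximum).
// The vector is taken by value so the caller's data is left untouched.
template<class Elem>
Elem kth_largest(std::vector<Elem> v, int k)
{
    std::make_heap(v.begin(), v.end());      // build a max-heap: O(n)
    for (int i = 1; i < k; i++) {
        std::pop_heap(v.begin(), v.end());   // move the maximum to the back: O(log n)
        v.pop_back();                        // and discard it
    }
    return v.front();                        // top of the remaining heap
}
```

For the values 3 1 4 1 5 9 2 6, kth_largest returns 9 for k = 1 and 5 for k = 3. Only k - 1 deletions are performed, so the rest of the array is never fully sorted.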


1. Multiway merge (external sorting)

Multiway merging is the most commonly used algorithm for external sorting: split the original file into parts that each fit in memory, load each part into memory and sort it, then merge the sorted sub-files


Selection of k

Assume there are m sub-files in total and k of them are merged at a time. Then ⌈log_k(m)⌉ merge passes are needed (each pass scans the disk once). Finding the minimum (or maximum) of k elements takes k-1 comparisons. If the total number of records is N, the time complexity is O((k-1) * N * log_k(m)) = O(((k-1) / log_2(k)) * N * log_2(m)). Because (k-1) / log_2(k) grows as k grows, the extra comparisons gradually offset the performance gained from fewer scans. The choice of k therefore involves two considerations:

  1. Each merge pass writes its result back to disk. The smaller k is, the more passes are needed and the more data is transferred between disk and memory. Increasing k reduces the number of scans
  2. Finding the smallest of k elements takes k-1 comparisons. The larger k is, the more comparisons are needed
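The trade-off can be worked through numerically. The sketch below uses a deliberately simple cost model, stated here as an assumption: every pass outputs all N records, and each output record is chosen with k-1 comparisons. The function names are hypothetical:

```cpp
// Smallest p such that k^p >= m, i.e. the number of merge passes
// needed to reduce m sorted sub-files to one (assumes m >= 1, k >= 2).
int merge_passes(long long m, int k)
{
    int passes = 0;
    long long reach = 1;  // how many sub-files 'passes' passes can combine
    while (reach < m) {
        reach *= k;
        passes++;
    }
    return passes;
}

// Total comparisons under the simple model described above.
long long merge_comparisons(long long N, long long m, int k)
{
    return (long long)(k - 1) * N * merge_passes(m, k);
}
```

With m = 64 sub-files and N = 1,000,000 records, k = 2 needs 6 passes and 6,000,000 comparisons, k = 4 needs 3 passes and 9,000,000 comparisons, and k = 8 needs 2 passes and 14,000,000 comparisons: fewer disk scans, but more comparison work.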

optimization

The following methods can be used to reduce the number of comparisons:

  1. Loser tree
  2. Heap: use an array of k elements. First read the smallest (first) element of each of the k files into the array, recording which file each element came from, and build a min-heap. Then repeatedly delete the top element, read the next element from the file the top element came from, and insert it into the heap. Although all k files must be visited once to obtain their first elements and building the heap takes some time, each subsequent step needs only O(logk) comparisons instead of k-1
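A minimal sketch of the heap approach. As an assumption for the illustration, each sorted sub-file is represented by an in-memory vector of ints rather than a real file, and kway_merge is a hypothetical name:

```cpp
#include <cstddef>
#include <functional>  // std::greater
#include <queue>       // std::priority_queue
#include <utility>     // std::pair
#include <vector>

// Merge k sorted runs into one sorted sequence using a min-heap.
std::vector<int> kway_merge(const std::vector<std::vector<int> >& runs)
{
    typedef std::pair<int, std::size_t> Item;  // (value, index of source run)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;
    std::vector<std::size_t> next(runs.size(), 0);  // next unread position per run

    // Seed the heap with the first (smallest) element of every non-empty run.
    for (std::size_t r = 0; r < runs.size(); r++) {
        if (!runs[r].empty()) {
            heap.push(Item(runs[r][0], r));
            next[r] = 1;
        }
    }

    std::vector<int> out;
    while (!heap.empty()) {
        Item top = heap.top();             // smallest remaining element
        heap.pop();
        out.push_back(top.first);
        std::size_t r = top.second;
        if (next[r] < runs[r].size())      // refill from the same source run
            heap.push(Item(runs[r][next[r]++], r));
    }
    return out;
}
```

The heap never holds more than k items, so each output record costs O(logk) comparisons. In a real external sort the vectors would be replaced by buffered readers over the sub-files.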

Tags: Algorithm data structure

Posted on Mon, 13 Sep 2021 17:35:24 -0400 by kingdm