Detailed explanation of sorting algorithm and optimization

1. Opening remarks

Suppose I want to buy an iPhone 13 Pro Max, so I search for it on Baidu.

The search returns a lot of related listings. I want to pay as little as possible, but I am afraid of running into a scammer. What should I do?
This is where sorting comes in. We can sort the listings by seller credit and by price, bring the goods that best meet our expectations to the top, and finally pick the merchant we are willing to buy from.

2. Basic concept and classification of sorting

Sorting is a problem we face all the time. When doing exercises, students line up from shortest to tallest; when taking attendance in class, the teacher calls the roll in student-number order; in college entrance examination admissions, candidates are admitted in descending order of total score.
Suppose a sequence of n records is {r1, r2, r3, ..., rn}, and the corresponding keywords are {k1, k2, k3, ..., kn}. We need to determine a permutation p1, p2, p3, ..., pn of 1, 2, ..., n such that the corresponding keywords satisfy kp1 ≤ kp2 ≤ kp3 ≤ ... ≤ kpn (non-decreasing; or the non-increasing counterpart), so that the sequence becomes ordered by keyword: {rp1, rp2, rp3, ..., rpn}. Such an operation is called sorting: arranging all the data in ascending or descending order according to a given keyword.
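
As a rough illustration of the definition above, the sketch below computes the permutation p for a small key array by sorting the indices rather than the records. The class name PermutationSketch, the method sortedOrder and the sample keys are assumptions made for this example, and the indices are 0-based.

```java
import java.util.Arrays;
import java.util.Comparator;

class PermutationSketch {
    // Returns p such that keys[p[0]] <= keys[p[1]] <= ... <= keys[p[n-1]] (0-based indices).
    static int[] sortedOrder(int[] keys) {
        Integer[] idx = new Integer[keys.length];
        for (int i = 0; i < idx.length; ++i) {
            idx[i] = i;
        }
        // Arrays.sort on object arrays is stable, so equal keys keep their original order.
        Arrays.sort(idx, Comparator.comparingInt(i -> keys[i]));
        int[] p = new int[idx.length];
        for (int i = 0; i < p.length; ++i) {
            p[i] = idx[i];
        }
        return p;
    }

    public static void main(String[] args) {
        int[] keys = {30, 10, 20};
        System.out.println(Arrays.toString(sortedOrder(keys))); // [1, 2, 0]
    }
}
```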

For example, what should we do when we have to pick the best students and several of them have the same total score?

mysql> select student.name, course.name,score.score from student, course, score where student.id = score.student_id and score.course_id = course.id;

+--------------------------+-----------------------------+-------+
| name                     | name                        | score |
+--------------------------+-----------------------------+-------+
| Black Whirlwind Li Kui   | Java                        |  70.5 |
| Black Whirlwind Li Kui   | Computer principle          |  98.5 |
| Black Whirlwind Li Kui   | Higher order mathematics    |  33.0 |
| Black Whirlwind Li Kui   | English                     |  98.0 |
| The Grapes               | Java                        |  60.0 |
| The Grapes               | Higher order mathematics    |  59.5 |
| Bai Suzhen               | Java                        |  33.0 |
| Bai Suzhen               | Computer principle          |  68.0 |
| Bai Suzhen               | Higher order mathematics    |  99.0 |
| Xu Xian                  | Java                        |  67.0 |
| Xu Xian                  | Computer principle          |  23.0 |
| Xu Xian                  | Higher order mathematics    |  56.0 |
| Xu Xian                  | English                     |  72.0 |
| I don't want to graduate | Java                        |  81.0 |
| I don't want to graduate | Higher order mathematics    |  37.0 |
| Speak in a normal way    | Chinese traditional culture |  56.0 |
| Speak in a normal way    | language                    |  43.0 |
| Speak in a normal way    | English                     |  79.0 |
| tellme                   | Chinese traditional culture |  80.0 |
| tellme                   | English                     |  92.0 |
+--------------------------+-----------------------------+-------+

Once the records are ordered by total score, ties can be broken by sorting on a secondary keyword
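
A minimal sketch of sorting on a primary and a secondary keyword, using the standard Comparator API. The Student class, its fields and the choice of the math score as the tie-breaker are hypothetical; they are not taken from the table above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MultiKeySketch {
    // Hypothetical record: a name, a total score, and one course score used to break ties.
    static class Student {
        final String name;
        final double total;
        final double math;

        Student(String name, double total, double math) {
            this.name = name;
            this.total = total;
            this.math = math;
        }
    }

    // Primary keyword: total score, descending. Secondary keyword: math score, descending.
    static List<String> rank(List<Student> students) {
        List<Student> copy = new ArrayList<>(students);
        copy.sort(Comparator.comparingDouble((Student s) -> s.total).reversed()
                .thenComparing(Comparator.comparingDouble((Student s) -> s.math).reversed()));
        List<String> names = new ArrayList<>();
        for (Student s : copy) {
            names.add(s.name);
        }
        return names;
    }
}
```

The secondary comparator is only consulted when two totals compare as equal, which is exactly the "sort by secondary keywords" step.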

2.1 sorting stability

Sorting does not always involve only a primary keyword; it may involve several keywords. Multi-keyword sorting can ultimately be reduced to sorting on a single keyword, so we mainly discuss single-keyword sorting
However, the sequence to be sorted may contain two or more records with the same keyword, in which case the sorting result may not be unique

Suppose ki = kj (1 ≤ i ≤ n, 1 ≤ j ≤ n, i ≠ j), and in the sequence before sorting ri is ahead of rj (i.e. i < j). After sorting:
If ri is still ahead of rj, the sorting method is stable
Conversely, if rj may end up ahead of ri, the sorting method is unstable
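
The definition can be turned into a small check. In this sketch each record is represented as {key, originalPosition}; this representation and the name isStableResult are assumptions made for the example, and the input is assumed to be already sorted by key.

```java
class StabilityCheckSketch {
    // Each record is {key, originalPosition}. The result is stable if records
    // with equal keys still appear in increasing order of originalPosition.
    static boolean isStableResult(int[][] sorted) {
        for (int i = 1; i < sorted.length; ++i) {
            boolean sameKey = sorted[i - 1][0] == sorted[i][0];
            if (sameKey && sorted[i - 1][1] > sorted[i][1]) {
                return false;
            }
        }
        return true;
    }
}
```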

2.2 internal sorting and external sorting

According to whether all the records to be sorted are held in memory during sorting, sorting methods are divided into internal sorting and external sorting

Internal sorting: during the whole sorting process, all records to be sorted are held in memory
External sorting: there are too many records to hold in memory at once, so the sorting process has to exchange data between internal and external storage many times

The performance of an internal sorting algorithm is mainly affected by three aspects:

  1. Time performance: internal sorting mainly performs two operations, comparison and movement, so an efficient algorithm should need as few comparisons and as few moves as possible
  2. Auxiliary space: the extra storage the algorithm needs in addition to the space occupied by the records being sorted
  3. Algorithm complexity: this refers to how complicated the algorithm itself is, not to its time complexity

2.3 classification

Simple algorithms: bubble sort, selection sort, insertion sort
Improved algorithms: Shell sort, heap sort, quick sort, merge sort

3. Implementation of sorting algorithm

All sorts are ascending

3.1. Bubble sorting

Bubble sort is an exchange sort. Its basic idea is to compare the keywords of adjacent records in pairs. If they are in reverse order, they will be exchanged until there are no records in reverse order.

/*
Time complexity:
	Worst case: O(N^2) [1+2+3+...+(n-1)=n*(n-1)/2]
	Best: O(N) [the data is in positive order, N-1 comparisons are made, and there is no exchange]
	Average: O(N^2)

Space complexity: O(1)

Stability: stable

Data objects: arrays

[Unordered area, ordered area] --> find the largest element of the unordered area by pairwise comparison of neighbors and put it at the front of the ordered area

Application scenario: generally not used
*/
class bubbleSort {
    void bubbleSort(int[] arr) {
        long before = System.currentTimeMillis();
        // Repeat the compare-and-exchange pass over the elements [n elements need at most n-1 passes]
        for (int i = 0; i < arr.length - 1; ++i) {
            // Set a flag. If a pass makes no exchange, the sequence is already ordered and we can stop early. This only helps on nearly ordered input; the worst and average cases stay O(N^2)
            boolean flag = true;
            // Do the same for each pair of adjacent elements, from the first pair to the last pair of the unordered area. After this pass, the largest element of the unordered area has "bubbled" to its end
            for (int j = 0; j < arr.length - i - 1; ++j) {
              	// Compare adjacent elements. If the first one is bigger than the second, exchange them
                if (arr[j] > arr[j + 1]) {
                    int tmp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = tmp;
                    flag = false;
                }
            }
            if (flag) {
                break;
            }
        }
        long after = System.currentTimeMillis();
        System.out.println("BubbleSort time: " + (after - before));
    }
}
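
To see what the flag buys, the following sketch runs the same loops as the bubble sort code but returns the number of outer passes instead of timing the sort. The class and method names are assumptions made for the example.

```java
class BubblePassCountSketch {
    // Bubble sort with the early-exit flag; returns how many outer passes were executed.
    static int sortAndCountPasses(int[] arr) {
        int passes = 0;
        for (int i = 0; i < arr.length - 1; ++i) {
            ++passes;
            boolean sorted = true;
            for (int j = 0; j < arr.length - i - 1; ++j) {
                if (arr[j] > arr[j + 1]) {
                    int tmp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = tmp;
                    sorted = false;
                }
            }
            if (sorted) {
                break; // no exchange in this pass: the array is already ordered
            }
        }
        return passes;
    }
}
```

On an already ordered array of five elements this makes a single pass, while a reversed array of five elements needs all four passes, which matches the O(N) best case and the O(N^2) worst case listed in the comment block.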

Single step diagram:

A schematic diagram after flag optimization is added:

3.2. Selection sort

Bubble sort is like a short-term stock trader who is always buying and selling to earn the price difference. Even with few mistakes, the frequent trades mean that handling fees and stamp duty eat up most of the profit. Selection sort is like an investor who rarely trades: he watches, waits for the right moment, then buys or sells decisively, so there are fewer transactions and a better final return. Even so, its running time is O(N^2) no matter what the input looks like, so the smaller the data size, the better. Its only real advantage may be that it does not occupy additional memory space

/*
Time complexity: 
	Worst case: O(N^2) [comparisons: (n-1)+(n-2)+...+1 = n(n-1)/2; at most n-1 exchanges, so slightly better than bubble sort]
	Best: O(N^2) [even when the data is already in order, n(n-1)/2 comparisons are still made; only the number of exchanges drops to 0]
	Average: O(N^2)

Space complexity: O(1)

Stability: unstable

Data object: linked list, array

[Ordered area, unordered area] --> find the smallest element in the unordered area and put it at the end of the ordered area. For arrays: many comparisons, few exchanges

Application scenario: generally not used
*/
class selectSort {
    void selectSort(int[] arr) {
        long before = System.currentTimeMillis();
        // A total of N-1 rounds are required
        for (int i = 0; i < arr.length - 1; ++i) {
            // Round i compares N-1-i times to find the subscript of the minimum value in the unordered area
            int minIndex = i;
            for (int j = i + 1; j < arr.length; ++j) {
                if (arr[minIndex] > arr[j]) {
                    minIndex = j;
                }
            }
            // Once the minimum is found, exchange it into position i, so small values end up in front (ascending order)
            if (minIndex != i) {
                int tmp = arr[minIndex];
                arr[minIndex] = arr[i];
                arr[i] = tmp;
            }
        }
        long after = System.currentTimeMillis();
        System.out.println("SelectSort time: " + (after - before));
    }
}
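
The "Stability: unstable" entry can be checked with a tiny experiment. The sketch below runs the same selection loop on records made of a key and a one-character tag; the tags, the class name and the method name are assumptions made for the example.

```java
class SelectionStabilitySketch {
    // Sorts records (key, tag) by key with simple selection sort and returns the tags in final order.
    static String sortTags(int[] keys, char[] tags) {
        int[] k = keys.clone();
        char[] t = tags.clone();
        for (int i = 0; i < k.length - 1; ++i) {
            int minIndex = i;
            for (int j = i + 1; j < k.length; ++j) {
                if (k[minIndex] > k[j]) {
                    minIndex = j;
                }
            }
            if (minIndex != i) {
                // Exchange both the key and its tag so each record stays together.
                int tmpKey = k[i];
                k[i] = k[minIndex];
                k[minIndex] = tmpKey;
                char tmpTag = t[i];
                t[i] = t[minIndex];
                t[minIndex] = tmpTag;
            }
        }
        return new String(t);
    }
}
```

With keys {5, 5, 2} and tags {a, b, c}, the first round exchanges the first 5 with the 2, so the record tagged a lands behind the record tagged b even though their keys are equal. A stable sort would have produced the tag order c, a, b.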

Single step exchange diagram:


3.3. Insertion sort

Although the code for insertion sort is not as simple and crude as bubble sort and selection sort, its principle is probably the easiest to understand, because anyone who has played poker can grasp it in seconds. Insertion sort is a simple and intuitive sorting algorithm. It works by building up an ordered sequence: for each unsorted element, it scans the sorted sequence from back to front, finds the right position, and inserts the element there


Suppose you pick up the hand 5, 3, 4, 6, 2. Even if you are playing poker for the first time, as long as you know the numbers, nobody needs to teach you how to arrange the cards. Move 3 and 4 to the left of 5, then move 2 to the far left, and the hand is in order. This way of arranging cards is exactly direct insertion sort.

Like bubble sort, insertion sort also has an optimized variant, called binary insertion sort (half-split insertion)
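
A minimal sketch of binary insertion sort, assuming the usual formulation: the insertion point in the ordered prefix is found by binary search, and the elements after it are then shifted right. The class name is an assumption made for the example.

```java
class BinaryInsertionSketch {
    static void sort(int[] arr) {
        for (int i = 1; i < arr.length; ++i) {
            int tmp = arr[i];
            // Binary search in the ordered prefix arr[0..i-1] for the first element greater than tmp.
            int lo = 0;
            int hi = i;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (arr[mid] <= tmp) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            // Shift arr[lo..i-1] one slot to the right and drop tmp into the gap.
            for (int j = i - 1; j >= lo; --j) {
                arr[j + 1] = arr[j];
            }
            arr[lo] = tmp;
        }
    }
}
```

The binary search cuts the number of comparisons to about log i per element, but the number of moves is unchanged, so the overall worst case is still O(N^2). Because an equal element is inserted after the ones already in place, the sort stays stable.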

/*
Time complexity: 
	Worst case: O(N^2) [data in reverse order; comparisons: 1+2+3+...+(n-1) = n*(n-1)/2, moves: (n-1)+(n-2)+...+1 = n*(n-1)/2]
	Best: O(N) [data already in order: each arr[i] is compared once with arr[i-1], and since arr[i-1] <= arr[i] the condition arr[j] > tmp fails immediately, so nothing is moved]
	Average: O(N^2) [if the records are in random order, the average numbers of comparisons and moves are each about n^2/4]

Space complexity: O(1)

Stability: stable

Description: although the time complexity is also O(N^2), direct insertion sort performs better than bubble sort and simple selection sort, and the closer the data is to sorted, the faster it runs

Data object: array, linked list

[Ordered area, unordered area] --> insert the first element of the unordered area into the appropriate position of the ordered area. For arrays: few comparisons, many moves


Application scenario: when the amount of data is small or the data is almost ordered, and as a building block for optimizing other sorts; in those cases it is a fast and stable algorithm
*/
class insertSort {
    void insertSort(int[] arr) {
        long before = System.currentTimeMillis();
        // Treat the first element as an ordered sequence, and the second through last elements as the unordered sequence
        for (int i = 1; i < arr.length; ++i) {
            // Scan the unordered sequence from beginning to end, and insert each scanned element into its position in the ordered sequence [an element equal to one already in the ordered sequence is inserted after it, which keeps the sort stable]
            int tmp = arr[i];
            int j = i - 1;
            // Shift every element larger than tmp one slot to the right
            for (; j >= 0 && arr[j] > tmp; --j) {
                arr[j + 1] = arr[j];
            }
            arr[j + 1] = tmp;
        }
        long after = System.currentTimeMillis();
        System.out.println("InsertSort time: " + (after - before));
    }
}

Single step diagram, for arr = {5, 3, 4, 6, 2}:

  1. i=1, tmp=3, arr[0]=5, arr[0]>tmp;
  2. Assign arr[0] to arr[1]
  3. Then assign tmp to arr[0]
  4. 5 and 3 have traded places: {3, 5, 4, 6, 2}

Enter the next cycle

  1. i=2, tmp=4, arr[1]=5, arr[1]>tmp;
  2. Assign arr[1] to arr[2]
  3. Then assign tmp to arr[1], since arr[0]=3 is not larger than tmp
  4. 5 and 4 have traded places: {3, 4, 5, 6, 2}


Enter the next cycle

  1. i=3, tmp=6, there is no element larger than 6 in the ordered area, so nothing moves

Enter the next cycle

  1. i=4, tmp=2, every element of the ordered area is larger than tmp, so arr[j+1]=arr[j] moves each of them back one position
  2. The loop ends with j=-1, and arr[j+1]=tmp puts 2 into arr[0], that is, in front of 3: {2, 3, 4, 5, 6}

3.4. Shell sort

Shell sort, also known as diminishing increment sort, is a more efficient improved version of insertion sort. Unlike insertion sort, however, Shell sort is an unstable sorting algorithm

Shell sort is an improvement based on the following two properties of insertion sort:

  • Insertion sort is efficient on almost ordered data, where it can reach linear-time performance
  • But insertion sort is generally inefficient, because it can only move data one position at a time

The basic idea of Shell sort is to divide the whole record sequence into several subsequences and apply direct insertion sort to each of them. Once the records of the whole sequence are "basically ordered", one final direct insertion sort is applied to all the records

3.4.1 Shell sorting principle

The introduction above roughly describes the characteristics of Shell sort; the principles behind them are described in detail below

Insertion sort needs about n^2/4 comparisons and moves on average, but in some cases it is very efficient: when the data is already basically ordered, only a few insertion operations are needed to sort everything. In addition, when the number of records is small, the advantage of direct insertion is particularly obvious. Shell sort therefore sets up these two favorable conditions (few records, then basic order) before running insertion sort

How to reduce the number of records?
It is natural to think of splitting a large amount of data into groups. This is a bit like packet forwarding in a network: a message that cannot be sent in one piece is sent in several pieces and reassembled into the complete data at the end

Shell sort divides the data into several subsequences, so each subsequence has fewer records to sort, and insertion sort is carried out within each subsequence. When the whole sequence is basically ordered, a final direct insertion sort is run over all the records together

Note:
Take {9,1,5,8,3,7,4,6,2} and split it into three contiguous groups {9,1,5}, {8,3,7}, {4,6,2}. Even after each group is sorted into {1,5,9}, {3,7,8}, {2,4,6}, the merged sequence {1,5,9,3,7,8,2,4,6} is still out of order and not basically ordered [9 is near the front and 2 is near the back]. Basic order means: small keywords are basically at the front, large keywords are basically at the back, and medium keywords are basically in the middle. A sequence like {2,1,3,6,4,7,5,8,9} can be called basically ordered

3.4.2 implementation of Shell sorting algorithm

/*
Time complexity (depends on the increment sequence):
	Worst case: O(N^2) for simple increment sequences such as N/2, N/4, ..., 1; O(N^(3/2)) for the sequence 2^k - 1
	Best: about O(N log N) [data already in order: each increment needs only one linear pass, and there are about log N increments]
	Average: no exact formula is known; it lies between O(N log N) and O(N^2) and is often quoted as roughly O(N^1.3)

Space complexity: O(1)

Stability: unstable

Description: insertion sort is efficient on almost ordered data, where it can reach linear-time performance. However, insertion sort is generally inefficient, because it can only move data one position at a time

Data objects: arrays

Each round is an insertion sort over elements that are a fixed gap apart; the gap shrinks from round to round, and the last gap must be 1
*/
class shellSort {
    void shellSort(int[] arr) {
        long before = System.currentTimeMillis();
        // Choose an increment sequence t1, t2, ..., tk with ti > tj for i < j and tk = 1
        int increment = arr.length;
        do {
            increment = increment / 3 + 1;
            for (int i = increment; i < arr.length; ++i) {
                int tmp = arr[i];
                int j = i - increment;
                for (; j >= 0 && arr[j] > tmp; j -= increment) {
                    arr[j + increment] = arr[j];
                }
                arr[j + increment] = tmp;
            }
        } while (increment > 1);
        long after = System.currentTimeMillis();
        System.out.println("ShellSort time: " + (after - before));
    }
}


Suppose the array {9,1,5,8,3,7,4,6,2} is passed to shellSort(arr)

increment starts as the number of records to be sorted, arr.length = 9, and the first pass through the do loop sets it to 9/3+1 = 4

  1. First cycle
subscript: 0 1 2 3 4 5 6 7 8
element:   9 1 5 8 3 7 4 6 2

increment value: 4
i value: 4
tmp value: 3
j value: 0

arr[j] is compared with tmp. arr[0]=9 > tmp, so arr[0] is moved to arr[4], and j drops to -4, which ends the loop
Compare once, move once
Then fill tmp into arr[j+increment], that is, arr[0] = 3

subscript: 0 1 2 3 4 5 6 7 8
element:   3 1 5 8 9 7 4 6 2

  2. Second cycle
subscript: 0 1 2 3 4 5 6 7 8
element:   3 1 5 8 9 7 4 6 2

increment value: 4
i value: 5
tmp value: 7
j value: 1
arr[j] is compared with tmp. arr[1]=1 is not larger than tmp, so the loop body is not executed
Compare once and move 0 times
j is still i-increment = 1
Therefore, arr[j+increment] = tmp, that is, arr[1+4] = 7

subscript: 0 1 2 3 4 5 6 7 8
element:   3 1 5 8 9 7 4 6 2

  3. Third cycle
subscript: 0 1 2 3 4 5 6 7 8
element:   3 1 5 8 9 7 4 6 2

increment value: 4
i value: 6
tmp value: 4
j value: 2
arr[j] is compared with tmp. arr[2]=5 > tmp, so arr[2] is moved to arr[6]
Compare once, move once
Then assign tmp to arr[j+increment], that is, arr[2] = 4

subscript: 0 1 2 3 4 5 6 7 8
element:   3 1 4 8 9 7 5 6 2

  4. Fourth cycle
subscript: 0 1 2 3 4 5 6 7 8
element:   3 1 4 8 9 7 5 6 2

increment value: 4
i value: 7
tmp value: 6
j value: 3
arr[3]=8 > tmp, so arr[3] is moved to arr[7]
Compare once, move once
Then fill tmp into arr[j+increment], that is, arr[3] = 6

subscript: 0 1 2 3 4 5 6 7 8
element:   3 1 4 6 9 7 5 8 2

  5. Fifth cycle
subscript: 0 1 2 3 4 5 6 7 8
element:   3 1 4 6 9 7 5 8 2

increment value: 4
i value: 8
tmp value: 2
j value: 4, then 0
j starts at 4; the loop condition is j >= 0 && arr[j] > tmp, and the update is j -= increment
arr[4]=9 > tmp is moved to arr[8], then arr[0]=3 > tmp is moved to arr[4]. Every value greater than tmp in this subsequence is moved back: compare twice, move twice. Finally tmp is filled into arr[j+increment], that is, arr[0] = 2

subscript: 0 1 2 3 4 5 6 7 8
element:   2 1 4 6 3 7 5 8 9

  6. Sixth cycle
subscript: 0 1 2 3 4 5 6 7 8
element:   2 1 4 6 3 7 5 8 9

increment value: 4 / 3 + 1 = 2
i value: 2
tmp value: 4
j value: 0
arr[0]=2 < tmp
So compare once and move 0 times
Finally, arr[j+increment]=tmp, that is, arr[0+2]=4
The subsequent cycle steps are omitted. We can see that reaching basic order depends on the increment. It divides the elements of the array into several subsequences, and each subsequence is sorted by insertion. Once the elements are basically ordered and the increment has shrunk to 1, the last round is a direct insertion sort over all the data
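
The cycles above only ever compare elements that are increment positions apart. The following sketch lists those subsequences for a given gap; the class and method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

class GapGroupsSketch {
    // Returns the subsequences arr[g], arr[g + gap], arr[g + 2*gap], ... for g = 0 .. gap-1.
    static List<List<Integer>> groups(int[] arr, int gap) {
        List<List<Integer>> result = new ArrayList<>();
        for (int g = 0; g < gap && g < arr.length; ++g) {
            List<Integer> group = new ArrayList<>();
            for (int i = g; i < arr.length; i += gap) {
                group.add(arr[i]);
            }
            result.add(group);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] arr = {9, 1, 5, 8, 3, 7, 4, 6, 2};
        System.out.println(groups(arr, 4)); // [[9, 3, 2], [1, 7], [5, 4], [8, 6]]
    }
}
```

Sorting each of these four groups in place is what the first five cycles do, and it is why the large value 9 can travel from subscript 0 to subscript 8 in only two moves.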

3.4.3 Shell sorting complexity analysis

From the analysis of this code, it should be clear that Shell sort does not sort after grouping at random. Instead, the records that are an "increment" apart form a subsequence, so records move in jumps, which improves sorting efficiency
The choice of increment is critical. This article uses increment = increment/3+1, but which increments are best is still an open mathematical problem, and no optimal increment sequence has been found so far. However, many studies show that good efficiency is obtained when the increment sequence is dlta[k] = 2^(t-k+1) - 1 (0 ≤ k ≤ t ≤ ⌊log2(n+1)⌋), for which the time complexity is O(N^(3/2)), better than the O(N^2) of direct insertion sort. Note that the last increment of the sequence must be 1. In addition, because records move in jumps, Shell sort is not a stable sort
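
For comparison with the increment/3+1 rule, here is a sketch of Shell sort driven by increments of the form 2^k - 1 (..., 15, 7, 3, 1). How the starting increment is chosen, and the class name, are assumptions made for the example rather than something the article prescribes.

```java
class HibbardShellSketch {
    static void sort(int[] arr) {
        // Largest gap of the form 2^k - 1 that is smaller than the array length (at least 1).
        int gap = 1;
        while (2 * gap + 1 < arr.length) {
            gap = 2 * gap + 1;
        }
        // (gap - 1) / 2 maps 2^k - 1 to 2^(k-1) - 1, so the gaps run ..., 15, 7, 3, 1.
        for (; gap >= 1; gap = (gap - 1) / 2) {
            // Insertion sort over elements that are gap positions apart.
            for (int i = gap; i < arr.length; ++i) {
                int tmp = arr[i];
                int j = i - gap;
                for (; j >= 0 && arr[j] > tmp; j -= gap) {
                    arr[j + gap] = arr[j];
                }
                arr[j + gap] = tmp;
            }
        }
    }
}
```

The last gap is always 1, so the final round is an ordinary insertion sort and the result is fully ordered whatever the earlier gaps were.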

3.5. Heap sorting

3.5.1 Heap sorting principle

The simple selection sort described earlier needs n-1 comparisons to select the smallest of the n records to be sorted
Unfortunately, it does not save the comparison results of each pass. Many of the comparisons in later passes were already made in earlier passes, but because their results were not saved, they are repeated, so the total number of comparisons is large
If, each time the smallest record is selected, the comparison results could be used to adjust the other records accordingly, the overall efficiency would be much higher. Heap sort is exactly such an improvement on selection sort

So what is heap sorting?

Heap sort is a sorting algorithm designed around the heap data structure. A heap is a complete binary tree that satisfies the heap property: the key of every child node is never greater than (or never less than) the key of its parent. Heap sort can be described as a selection sort that uses a heap to do the selecting
There are two kinds of heap:

  1. Max-heap (large top heap): the value of each node is greater than or equal to the values of its child nodes; used for ascending order in heap sort
  2. Min-heap (small top heap): the value of each node is less than or equal to the values of its child nodes; used for descending order in heap sort
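
The heap property is easy to check on an array. This sketch assumes the usual array layout of a complete binary tree, where the children of subscript i are at 2i+1 and 2i+2; the class and method names are made up for the example.

```java
class HeapCheckSketch {
    // True if every parent is >= each of its children (children of i sit at 2i+1 and 2i+2).
    static boolean isMaxHeap(int[] arr) {
        for (int child = 1; child < arr.length; ++child) {
            int parent = (child - 1) / 2;
            if (arr[parent] < arr[child]) {
                return false;
            }
        }
        return true;
    }
}
```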

Schematic diagram before and after heap adjustment

Schematic diagram of a max-heap and a min-heap


3.5.2 implementation of heap sorting algorithm

/*
Time complexity: 
	Worst case: O(N log N)
	Best: O(N log N)
	Average: O(N log N)

Space complexity: O(1)

Stability: unstable

Data objects: arrays

[Max-heap (or min-heap), ordered area] --> remove the root from the top of the heap, put it at the front of the ordered area, and then restore the heap structure
*/
class heapSort {
    void heapSort(int[] arr) {
        long before = System.currentTimeMillis();
        // 1. For ascending order, build a max-heap
        createBigHeap(arr);
        int end = arr.length - 1;
        // 2. Exchange the top (maximum) with the tail of the heap
        while (end > 0) {
            int tmp = arr[0];
            arr[0] = arr[end];
            arr[end] = tmp;
            // 3. Restore the heap on arr[0..end-1] by sinking the new top to its position, then shrink the heap
            shiftDown(arr, 0, end--);
        }
        long after = System.currentTimeMillis();
        System.out.println("HeapSort time: " + (after - before));
    }

    private void createBigHeap(int[] arr) {
        // 1. arr.length - 1 is the subscript of the last element; its parent is (arr.length - 1 - 1) / 2, the last node that has children
        for (int parent = (arr.length - 2) >> 1; parent >= 0; --parent) {
            // 2. Sink each parent node, from the last parent back to the root
            shiftDown(arr, parent, arr.length);
        }
    }

    private void shiftDown(int[] arr, int parent, int sz) {
        // 1. Calculate the left child from the parent
        int child = (parent << 1) + 1;
        // 2. Keep sinking while the parent still has a child inside the heap
        while (child < sz) {
            // 3. If the right child exists and is larger than the left child, let child point to the right child
            if (child + 1 < sz && arr[child] < arr[child + 1]) {
            // if (child + 1 < sz && arr[child] > arr[child + 1]) { // flip the comparison for a min-heap (descending order)
                ++child;
            }
            // 4. If the larger child is greater than the parent, exchange them
            if (arr[child] > arr[parent]) {
            // if (arr[child] < arr[parent]) { // flip the comparison for a min-heap (descending order)
                int tmp = arr[child];
                arr[child] = arr[parent];
                arr[parent] = tmp;
                // 5. Continue from the child's position so the sinking value keeps moving down
                parent = child;
                child = (parent << 1) + 1;
            } else {
                // 6. The parent is not smaller than either child, so no further adjustment is required
                break;
            }
        }
    }
}

Detailed step analysis

Figure ① is a max-heap, and 90 is the maximum value. Exchange 90 with 20 (the tail element), as shown in Figure ②. Now 90 is the last element of the whole heap sequence. Adjust 20 so that the nodes other than 90 again satisfy the definition of a max-heap; see Figure ③

Then consider the exchange of 30 and 80

By now you should understand the basic idea of heap sort, but two problems need to be solved to implement it

  1. How to build a heap from an unordered sequence
  2. How to adjust the remaining elements into a new heap after the top element is removed

To explain them clearly, let's go through the code in detail
This is the overall framework

// 1. For ascending order, build a max-heap
createBigHeap(arr);
int end = arr.length - 1;
// 2. Exchange the top (maximum) with the tail of the heap
while (end > 0) {
    int tmp = arr[0];
    arr[0] = arr[end];
    arr[end] = tmp;
    // 3. Restore the heap on arr[0..end-1] by sinking the new top to its position, then shrink the heap
    shiftDown(arr, 0, end--);
}

Suppose we want to sort the sequence {50,10,90,30,70,40,80,60,20}. Then end = 8, and each pass of the while loop exchanges the top element with the "tail" element and then re-adjusts the heap
First look at the createBigHeap function

private void createBigHeap(int[] arr) {
    // 1. arr.length - 1 is the subscript of the last element; its parent is (arr.length - 1 - 1) / 2, the last node that has children
    for (int parent = (arr.length - 2) >> 1; parent >= 0; --parent) {
        // 2. Sink each parent node, from the last parent back to the root
        shiftDown(arr, parent, arr.length);
    }
}


The array passed into the function has length 9. Numbering the nodes from 1, the loop starts at node 4 and ends at node 1 [inclusive; node 1 is the element at subscript 0 of the array, and in general node k is subscript k-1]
These are exactly the nodes that have children. Pay attention to the numbers of the gray nodes

How do we get these numbers?
Remember for (int parent = (arr.length - 2) >> 1; parent >= 0; --parent)?
9-2=7, 7>>1=3, and subscript 3 is node 3+1=4
The left child of subscript 3 is subscript 2*3+1=7, and the right child is subscript 7+1=8
Then parent takes the subscripts 3, 2, 1, 0 (nodes 4, 3, 2, 1), and each parent is adjusted against its left and right children until the whole tree is a max-heap
Now that we know which nodes to adjust, let's look at how the key shiftDown function is implemented

private void shiftDown(int[] arr, int parent, int sz) {
    // 1. Calculate the left child from the parent
    int child = (parent << 1) + 1;
    // 2. Keep sinking while the parent still has a child inside the heap
    while (child < sz) {
        // 3. If the right child exists and is larger than the left child, let child point to the right child
        if (child + 1 < sz && arr[child] < arr[child + 1]) {
        // if (child + 1 < sz && arr[child] > arr[child + 1]) { // flip the comparison for a min-heap (descending order)
            ++child;
        }
        // 4. If the larger child is greater than the parent, exchange them
        if (arr[child] > arr[parent]) {
        // if (arr[child] < arr[parent]) { // flip the comparison for a min-heap (descending order)
            int tmp = arr[child];
            arr[child] = arr[parent];
            arr[parent] = tmp;
            // 5. Continue from the child's position so the sinking value keeps moving down
            parent = child;
            child = (parent << 1) + 1;
        } else {
            // 6. The parent is not smaller than either child, so no further adjustment is required
            break;
        }
    }
}

  1. When the function is called for the first time, what is passed in is
    arr={50,10,90,30,70,40,80,60,20}
    parent=3
    sz=9

child=2*parent+1 gives 7
while (7 < 9) holds
The right child, child+1=8, is inside the heap, but arr[7]=60 > arr[8]=20, so child stays at 7
arr[7]=60 > arr[3]=30, so exchange the values of arr[3] and arr[7]
The new parent is 7, and its left child 2*7+1=15 is out of bounds, so while (15 < 9) fails and the loop ends

After adjustment: arr={50,10,90,60,70,40,80,30,20}

  2. When the function is called the second time, what is passed in is
    arr={50,10,90,60,70,40,80,30,20}
    parent=2
    sz=9

child=2*2+1 gives 5
while (5 < 9) holds
The right child, child+1=6, is inside the heap, and arr[5]=40 < arr[6]=80, so child is incremented to 6
arr[6]=80 < arr[2]=90, so no exchange is needed and break leaves the loop

  3. When the function is called the third time, what is passed in is
    arr={50,10,90,60,70,40,80,30,20}
    parent=1
    sz=9

child=2*1+1 gives 3
while (3 < 9) holds
The right child, child+1=4, is inside the heap, and arr[3]=60 < arr[4]=70, so child is incremented to 4
arr[4]=70 > arr[1]=10, so exchange the values of arr[1] and arr[4]
The new parent is 4, and its left child 2*4+1=9 is out of bounds, so the loop ends

After adjustment: arr={50,70,90,60,10,40,80,30,20}

  4. When the function is called the fourth time, what is passed in is
    arr={50,70,90,60,10,40,80,30,20}
    parent=0
    sz=9

child=2*0+1 gives 1
while (1 < 9) holds
(1 + 1) < 9 and arr[1]=70 < arr[2]=90, so child is incremented to 2
arr[2]=90 > arr[0]=50, so exchange them: arr={90,70,50,60,10,40,80,30,20}
The new parent is 2 and the new child is 5
(5 + 1) < 9 and arr[5]=40 < arr[6]=80, so child is incremented to 6
arr[6]=80 > arr[2]=50, so exchange them: arr={90,70,80,60,10,40,50,30,20}
The new parent is 6, and its left child 2*6+1=13 is out of bounds, so the loop ends

  5. After the fourth call, parent becomes -1, so the loop in createBigHeap ends and the heap is built

The array is now {90,70,80,60,10,40,50,30,20}, which is a max-heap
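
The build phase can be replayed with a short trace program. This is a sketch that repeats the same sift-down logic in a standalone class; the class name BuildHeapTraceSketch and the printing are assumptions made for the example.

```java
import java.util.Arrays;

class BuildHeapTraceSketch {
    static void shiftDown(int[] arr, int parent, int sz) {
        int child = 2 * parent + 1;
        while (child < sz) {
            if (child + 1 < sz && arr[child] < arr[child + 1]) {
                ++child; // the right child is the larger one
            }
            if (arr[child] <= arr[parent]) {
                break; // the heap property already holds at this node
            }
            int tmp = arr[child];
            arr[child] = arr[parent];
            arr[parent] = tmp;
            parent = child;
            child = 2 * parent + 1;
        }
    }

    // Builds a max-heap in place, printing the array after each parent is adjusted.
    static int[] buildMaxHeap(int[] arr) {
        for (int parent = (arr.length - 2) >> 1; parent >= 0; --parent) {
            shiftDown(arr, parent, arr.length);
            System.out.println("parent=" + parent + " -> " + Arrays.toString(arr));
        }
        return arr;
    }

    public static void main(String[] args) {
        buildMaxHeap(new int[]{50, 10, 90, 30, 70, 40, 80, 60, 20});
        // last line printed: parent=0 -> [90, 70, 80, 60, 10, 40, 50, 30, 20]
    }
}
```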

Finally, look at the while loop in heapSort
while (end > 0) {
    int tmp = arr[0];
    arr[0] = arr[end];
    arr[end] = tmp;
    // 3. Restore the heap on arr[0..end-1] by sinking the new top to its position, then shrink the heap
    shiftDown(arr, 0, end--);
}

Swap the top and tail elements, which puts the maximum 90 at the end of the array

Then shiftDown re-adjusts the heap, whose size is now one smaller [that is, the heap without 90]
... and the loop repeats until the size of the heap is 1. At that point the adjustment is complete and the array is in ascending order

3.5.3 Heap sorting complexity analysis

/**
 * Consider a full binary tree of height h with N = 2^h - 1 nodes
 * Number of nodes in each layer: 2^0, 2^1, 2^2, ..., penultimate layer: 2^(h-2)
 * Layers a node of that layer can sink: h-1, h-2, h-3, ..., penultimate layer: 1
 * <p>
 * Cost of building the heap = sum of (number of nodes in the layer) * (layers it can sink)
 * T(N)   = 2^0*(h-1) + 2^1*(h-2) + 2^2*(h-3) + ... + 2^(h-3)*2 + 2^(h-2)*1
 * 2*T(N) = 2^1*(h-1) + 2^2*(h-2) + 2^3*(h-3) + ... + 2^(h-2)*2 + 2^(h-1)*1
 * <p>
 * Subtract the first line from the second:
 * T(N) = 1 - h + 2^1 + 2^2 + 2^3 + ... + 2^(h-2) + 2^(h-1)
 * Sum of the geometric series: 2^1 + 2^2 + ... + 2^(h-1) = 2^h - 2
 * T(N) = 2^h - 1 - h
 * <p>
 * Total number of nodes: N = 2^h - 1, so h = log2(N + 1)
 * T(N) = N - log2(N + 1) = O(N)
 */

The derivation above explains why building the heap takes O(N) time
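
As a quick numerical check of the series, the sketch below adds up (nodes in the layer) * (layers it can sink) for a full binary tree of height h and compares the total with the closed form 2^h - 1 - h that the series sums to. The class and method names are assumptions made for the example.

```java
class HeapBuildCostSketch {
    // Sum over the non-leaf layers of (nodes in the layer) * (layers a node there can sink).
    static long sinkSteps(int h) {
        long total = 0;
        for (int layer = 0; layer <= h - 2; ++layer) {
            total += (1L << layer) * (h - 1 - layer);
        }
        return total;
    }

    // Closed form of the same sum: 2^h - 1 - h, that is, N - h for N = 2^h - 1 nodes.
    static long closedForm(int h) {
        return (1L << h) - 1 - h;
    }
}
```

For h = 4 (15 nodes) both give 11, so the total sinking work is smaller than the number of nodes.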

Its running time is mainly lost in building and rebuilding the heap. In the process of building the heap, because we start to build the non terminal node at the bottom and the right of the complete binary tree cluster, compare it with other children and exchange it if there is comparison. For each non terminal node, we can actually compare and exchange it twice at most. Therefore, the time complexity of building the whole heap is O(n)

In the formal sorting, the reconstruction heap time complexity is O(nlongN), because each node dog bites the construction

Therefore, the overall time complexity of heap sort is O(n log n). Since heap sort is not sensitive to the initial order of the records, its best, worst and average time complexity are all O(n log n), which is clearly much better than the O(n^2) of bubble sort, simple selection sort and direct insertion sort

3.6. Merge sort

Merge sort is an efficient sorting algorithm built on the merge operation, and it is a very typical application of divide and conquer.
It can be implemented in two ways:

  • Top down recursion (all recursive methods can be rewritten iteratively, so there is the second method);
  • Bottom up iteration;

To make the idea clearer, {16,7,13,10,9,15,3,2,5,8,12,1,11,4,6,14} is sorted by merging adjacent elements pairwise, then merging the resulting runs again and again until one ordered array remains. Note that the shape of this process looks very much like an inverted complete binary tree. In general, sorting algorithms built on a complete binary tree structure are not inefficient
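The pairwise merging of the sixteen sample values can be followed pass by pass with a small sketch. The class name PairwiseMergeDemo and the helper pass are hypothetical and exist only for this illustration; each pass merges adjacent sorted runs of the given width.

```java
import java.util.Arrays;

// Hypothetical sketch: shows the array after every pairwise merging pass.
class PairwiseMergeDemo {
    // Merges the sorted runs [left, mid) and [mid, right) of arr.
    static void merge(int[] arr, int left, int mid, int right) {
        int[] tmp = new int[right - left];
        int t = 0, a = left, b = mid;
        while (a < mid && b < right) {
            tmp[t++] = arr[a] <= arr[b] ? arr[a++] : arr[b++];
        }
        while (a < mid) {
            tmp[t++] = arr[a++];
        }
        while (b < right) {
            tmp[t++] = arr[b++];
        }
        System.arraycopy(tmp, 0, arr, left, tmp.length);
    }

    // One pass: merge every pair of adjacent runs of length `width`.
    static void pass(int[] arr, int width) {
        for (int i = 0; i < arr.length; i += 2 * width) {
            int mid = Math.min(i + width, arr.length);
            int right = Math.min(i + 2 * width, arr.length);
            merge(arr, i, mid, right);
        }
    }

    public static void main(String[] args) {
        int[] data = {16, 7, 13, 10, 9, 15, 3, 2, 5, 8, 12, 1, 11, 4, 6, 14};
        for (int width = 1; width < data.length; width *= 2) {
            pass(data, width);
            System.out.println("width " + width + ": " + Arrays.toString(data));
        }
    }
}
```

With sixteen elements the loop runs four passes (widths 1, 2, 4 and 8), which matches the four levels of the inverted binary tree.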

3.6.1 recursive merge sort

/*
Time complexity:
	Worst case: O(N log N)
	Best: O(N log N)
	Average: O(N log N)

Space complexity: O(N)

Stability: stable

Data object: array, linked list

Divide the data into two segments, then repeatedly take the smaller front element of the two segments and append it to the end of a new data segment. This can be carried out from top to bottom or from bottom to top

For comparison, the worst case of quick sort is O(N^2), for example when it is given an already ordered sequence. However, its expected running time is O(N log N), and the constant factor hidden in the O(N log N) notation is very small [between 1.3 and 1.5], smaller than that of merge sort, whose complexity is always O(N log N). Therefore, for most random sequences with little existing order, quick sort usually outperforms merge sort.
*/
class MergeSort {
    void mergeSort(int[] arr) {
        long start = System.currentTimeMillis();
        _mergeSort(arr, 0, arr.length);
        long end = System.currentTimeMillis();
        System.out.println("mergeSort time:" + (end - start));
    }

    // Auxiliary recursion
    private void _mergeSort(int[] arr, int left, int right) {
        if (right - left <= 1) {
            /*
            Determine whether the current interval has only one element or no element
            Sorting is not required at this time
             */
            return;
        } else {
            int mid = (left + right) >> 1;
            /*
            Let the [left, mid) interval become ordered first
            Then let the [mid, right) interval become ordered
            Merge two ordered intervals

            Binary Tree Postorder Traversal 
             */
            _mergeSort(arr, left, mid);
            _mergeSort(arr, mid, right);
            merge(arr, left, mid, right);
        }
    }

    /*
    The core operation of merge sort is to merge two ordered arrays and use the merge method to complete the process of array merging
    Here, the two arrays are described by the left, mid and right parameters
    [left, mid): Left array
    [mid, right): Right array
     */
    private void merge(int[] arr, int left, int mid, int right) {
        /*
        1. First create a temporary space to hold the merged result
        2. The temporary space must be able to hold both arrays to be merged, so its length is right - left
         */
        if (left >= right) {// Empty interval
            return;
        } else {
            int[] tmp = new int[right - left];
            int tmpIndex = 0;// Indicates where the current element should be placed in the temporary space
            int cur1 = left;
            int cur2 = mid;
            while (cur1 < mid && cur2 < right) {// Both cursors must stay inside their intervals
                if (arr[cur1] <= arr[cur2]) {// To ensure stability
                    tmp[tmpIndex++] = arr[cur1++];// Insert the element corresponding to cur1 into the temporary space
                } else {
                    tmp[tmpIndex++] = arr[cur2++];
                }
            }
            // After the loop ends, you need to copy the remaining elements into the final result
            while (cur1 < mid) {
                tmp[tmpIndex++] = arr[cur1++];
            }
            while (cur2 < right) {
                tmp[tmpIndex++] = arr[cur2++];
            }
        /*
         You also need to put the results of tmp back into the arr array. (sort in place)
         Replace the sorted result with the [left, right) interval of the original array
         */
            for (int i = 0; i < tmp.length; i++) {
                arr[left + i] = tmp[i];
            }
        }
    }

    /*
    Recursive process: it is gradually slicing the array
    Non recursive version: just adjust the subscript [faster]
        Uniformly merge arrays with length 1
        1.[0], [1] Are two arrays to be merged
        2.[2], [3] Are two arrays to be merged
        3.[4], [5] Are two arrays to be merged
        4.[6], [7] Are two arrays to be merged
        5.[8], [9] Are two arrays to be merged

        Merge arrays of length 2 uniformly
        [0,1], [2,3]
        [4,5], [6,7]
        [8,9], [10,11]

        [0,1,2,3], [4,5,6,7]
        [8,9,10,11], [12,13,14,15]
     */
    void mergeSortByLoop(int[] arr) {
        long start = System.currentTimeMillis();
        int gap = 1;// gap is used to limit the length of each array to be merged
        for (; gap < arr.length; gap *= 2) {
            // Current two arrays to be merged
            for (int i = 0; i < arr.length; i += 2 * gap) {
                /*
                 Control the merging of two adjacent arrays in this array
                 [left, mid) And [mid, right) will be merged
                 gap:1
                     i:0    0,1    1,2
                     i:2    2,3    3,4
                     i:4    4,5    5,6
                     ...
                 gap:2
                    i:0     0,2     2,4
                    i:4     4,6     6,8
                    i:8     8,10    10,12
                    ...
                 gap:4
                    ...
                 gap:8
                    i:0     0,8     8,16[The array length tested is 10] -- > 8,10
                    i:16    Cross the border
                 */
                int left = i;
                int mid = i + gap > arr.length ? arr.length : i + gap;
                int right = i + 2 * gap > arr.length ? arr.length : i + 2 * gap;
                merge(arr, left, mid, right);
            }
        }
        long end = System.currentTimeMillis();
        System.out.println("mergeSortByLoop time:" + (end - start));
    }
}

Detailed code analysis steps:

void mergeSort(int[] arr) {
    long before = System.currentTimeMillis();
    mergeSortInternal(arr, 0, arr.length);
    long after = System.currentTimeMillis();
    System.out.println("MergeSort time: " + (after - before));
    }

Entry point of the sort: it passes in the left-closed, right-open interval [0, arr.length)

private void mergeSortInternal(int[] arr, int left, int right) {
    if (right - left <= 1) {
        return;
    } else {
        int mid = (left + right) >> 1;
        mergeSortInternal(arr, left, mid);
        mergeSortInternal(arr, mid, right);
        merge(arr, left, mid, right);
    }
}

Split the interval and recurse into the left and right halves
Finally, hand the two intervals [left, mid) and [mid, right) to merge

private void merge(int[] arr, int left, int mid, int right) {
      int[] tmp = new int[right - left];
      int tmpIndex = 0;
      int cur1 = left, cur2 = mid;
      while (cur1 < mid && cur2 < right) {
          if (arr[cur1] <= arr[cur2]) {// To ensure stability
              tmp[tmpIndex++] = arr[cur1++];// Insert the element corresponding to cur1 into the temporary space
          } else {
              tmp[tmpIndex++] = arr[cur2++];
          }
      }
      	// Process remaining data
        while (cur1 < mid) {
            tmp[tmpIndex++] = arr[cur1++];
        }
        while (cur2 < right) {
            tmp[tmpIndex++] = arr[cur2++];
        }
      	// The data returns the original array position, so add arr[left+i] instead of arr[i]
        for (int i = 0; i < tmpIndex; i++) {
            arr[left + i] = tmp[i];
        }
    }

Specific algorithm implementation of interval merging

Assuming that there is {50,10,90,30,70,40,80,60,20} data, how does the recursive code execute?

The left bound passed in is 0 and the right bound is 9, that is [0, 9)
Then the interval is divided into the left interval [0, 4) and the right interval [4, 9)
These are divided again into [0, 2), [2, 4), [4, 6), [6, 9), and so on
Because the termination condition of the recursion is right - left <= 1, the splitting stops once an interval holds at most one element. The recursion then returns, and the merge function merges the sorted halves on the way back
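To see the exact order in which the intervals are visited for nine elements, the recursion can be traced without touching any data. The names MergeTrace and trace are hypothetical; the sketch only mirrors the splitting rule mid = (left + right) >> 1 and the termination condition right - left <= 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: records every half-open interval the recursion visits.
class MergeTrace {
    static void trace(int left, int right, int depth, List<String> out) {
        StringBuilder line = new StringBuilder();
        for (int d = 0; d < depth; d++) {
            line.append("  ");                 // indent by recursion depth
        }
        line.append('[').append(left).append(", ").append(right).append(')');
        out.add(line.toString());
        if (right - left <= 1) {
            return;                            // zero or one element: stop splitting
        }
        int mid = (left + right) >> 1;
        trace(left, mid, depth + 1, out);      // left half first
        trace(mid, right, depth + 1, out);     // then right half
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        trace(0, 9, 0, lines);
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

For nine elements the trace contains 17 intervals: 9 single-element leaves and 8 intervals that are split and later merged.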

3.6.2 non recursive merge sort

Non recursive code

void mergeSortTraversalNo(int[] arr) {
        long before = System.currentTimeMillis();
      	// gap: limit the length of each array to be merged
        int gap = 1;
        for (; gap < arr.length; gap *= 2) {
          	// Current two arrays to be merged
            for (int i = 0; i < arr.length; i += 2 * gap) {
                int left = i;
                int mid = i+gap> arr.length? arr.length : i+gap;
                int right = i+2*gap> arr.length? arr.length : i+2*gap;
                merge(arr, left, mid, right);
            }
        }
        long after = System.currentTimeMillis();
        System.out.println("MergeSortTraversalNo time: " + (after - before));
    }

Just remember the variables gap and i
The left bound is i, the middle bound is i + gap, and the right bound is i + 2 * gap
The middle and right bounds must be clamped so that they do not go past the end of the array
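The clamping of the bounds is easiest to see by listing the {left, mid, right} triples that the two loops produce. The sketch below only computes the bounds and does not merge anything; GapBounds and bounds are hypothetical names, and Math.min is used as an equivalent of the conditional expressions in the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: lists the intervals merged by the bottom-up loops.
class GapBounds {
    // Returns every {left, mid, right} triple for an array of length n.
    static List<int[]> bounds(int n) {
        List<int[]> result = new ArrayList<>();
        for (int gap = 1; gap < n; gap *= 2) {
            for (int i = 0; i < n; i += 2 * gap) {
                int left = i;
                int mid = Math.min(i + gap, n);        // clamp to the array end
                int right = Math.min(i + 2 * gap, n);  // clamp to the array end
                result.add(new int[]{left, mid, right});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int[] triple : bounds(10)) {
            System.out.println(Arrays.toString(triple));
        }
    }
}
```

For a length of 10 the last triple is {0, 8, 10}: the right bound 16 is clamped to 10. A triple such as {8, 10, 10} has an empty right part, so merging it leaves the data unchanged.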

3.6.3 merge sort complexity analysis

Let's analyze the time complexity of merge sort. One merging pass merges all adjacent ordered runs of length h in arr[0] ~ arr[n-1] pairwise and puts the results into tmp, which requires scanning every record in the sequence to be sorted, so it takes O(n) time. According to the depth of a complete binary tree, the whole merge sort needs ⌈log2 n⌉ passes. Therefore, the total time complexity is O(n log n), and this is the best, worst and average time performance of the merge sort algorithm.

Since merge sort needs as much extra storage as the original record sequence to hold the merge results, plus stack space of depth log2 n during recursion, the space complexity is O(n + log n), that is, O(n).

In addition, a careful look at the code shows that the merge function contains the statement if (arr[cur1] <= arr[cur2]). Elements are only compared pairwise between adjacent runs and never jump over one another, and on a tie the element from the left run is taken first. Therefore, merge sort is a stable sorting algorithm.
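The effect of the <= comparison on stability can be made visible by attaching a tag to each key. The sketch below is an assumption-laden illustration: Item, StableMergeDemo and run are hypothetical names, and the merge logic is a direct adaptation of the int version to objects.

```java
// Hypothetical sketch: merge sort on tagged records to show that ties keep their order.
class StableMergeDemo {
    static final class Item {
        final int key;      // value used for sorting
        final String tag;   // remembers the original order among equal keys
        Item(int key, String tag) {
            this.key = key;
            this.tag = tag;
        }
    }

    static void sort(Item[] arr, int left, int right) {
        if (right - left <= 1) {
            return;
        }
        int mid = (left + right) >> 1;
        sort(arr, left, mid);
        sort(arr, mid, right);
        merge(arr, left, mid, right);
    }

    static void merge(Item[] arr, int left, int mid, int right) {
        Item[] tmp = new Item[right - left];
        int t = 0, cur1 = left, cur2 = mid;
        while (cur1 < mid && cur2 < right) {
            // "<=" takes the element from the left run on a tie, which keeps equal keys in order.
            if (arr[cur1].key <= arr[cur2].key) {
                tmp[t++] = arr[cur1++];
            } else {
                tmp[t++] = arr[cur2++];
            }
        }
        while (cur1 < mid) {
            tmp[t++] = arr[cur1++];
        }
        while (cur2 < right) {
            tmp[t++] = arr[cur2++];
        }
        System.arraycopy(tmp, 0, arr, left, tmp.length);
    }

    // Sorts the sample records 3a 1b 3c 2d 1e and returns them as one string.
    static String run() {
        Item[] items = {
            new Item(3, "a"), new Item(1, "b"), new Item(3, "c"),
            new Item(2, "d"), new Item(1, "e")
        };
        sort(items, 0, items.length);
        StringBuilder sb = new StringBuilder();
        for (Item item : items) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(item.key).append(item.tag);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // 1b 1e 2d 3a 3c
    }
}
```

If the comparison were changed to a strict <, equal keys from the right run would be taken first and the order of the tags could be reversed.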

In other words, merge sort is an efficient and stable algorithm that occupies more memory.

3.7. Quick sort

Finally, here comes the star of the show. If your boss asks you to write a sorting algorithm one day and quick sort is not in your toolbox, you had better keep quiet, look up quick sort in private and cram it at the last minute, so that at least you will not be laughed at

Shell sort is an upgrade of direct insertion sort; both belong to the insertion sort class
Heap sort is an upgrade of simple selection sort; both belong to the selection sort class
Quick sort is an upgrade of bubble sort, the slowest sort seen earlier; both belong to the exchange sort class

Quick sort also sorts through repeated comparisons and swaps, but its implementation increases the distance over which records are compared and moved: records with larger keywords move directly from the front to the back, and records with smaller keywords move directly from the back to the front, which reduces the total number of comparisons and swaps

3.7.1 basic idea of quick sort algorithm

One partitioning pass divides the records to be sorted into two independent parts such that every keyword in one part is no larger than every keyword in the other part. The two parts are then sorted separately in the same way, so that the whole sequence becomes ordered
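A single partitioning pass can be observed in isolation. The sketch below uses the hole-filling scheme with the first element as the pivot, which is the scheme the code later in this article uses; the class name PartitionDemo and the sample data are assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch: one partitioning pass with the first element as the pivot.
class PartitionDemo {
    // Partitions arr[left..right] (closed interval) and returns the pivot's final index.
    static int partition(int[] arr, int left, int right) {
        int pivotValue = arr[left];
        while (left < right) {
            // Scan from the right for an element smaller than the pivot.
            while (left < right && arr[right] >= pivotValue) {
                --right;
            }
            arr[left] = arr[right];
            // Scan from the left for an element larger than the pivot.
            while (left < right && arr[left] <= pivotValue) {
                ++left;
            }
            arr[right] = arr[left];
        }
        arr[left] = pivotValue;   // the two cursors met: this is the pivot's place
        return left;
    }

    public static void main(String[] args) {
        int[] data = {50, 10, 90, 30, 70, 40, 80, 60, 20};
        int index = partition(data, 0, data.length - 1);
        System.out.println(index);                 // 4
        System.out.println(Arrays.toString(data)); // [20, 10, 40, 30, 50, 70, 80, 60, 90]
    }
}
```

After the pass, everything to the left of index 4 is smaller than 50 and everything to the right is larger, but neither side is sorted yet; that is the job of the two recursive calls.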

3.7.2 implementation and optimization steps of the recursive quick sort algorithm

/*
Time complexity:
	Worst case: O(N^2), for example when the input is already ordered and the first element is used as the pivot
	Best: if the N elements split evenly at every level, the recursion forms a full binary tree. Each level does O(N) work in total and the height of the tree is log2(N+1) [rounded up], so the total is O(N log N)
	Average: O(N log N)

Space complexity:
	Best: O(log N). The recursion handles the left subtree and then the right subtree, and the stack space of the left subtree is released before the right one is processed, so the space used equals the height of the tree
	Worst case: O(N), when the recursion tree degenerates into a chain

Stability: unstable

Data objects: arrays

[smaller values, pivot, larger values] --> select an element in the interval as the pivot (for example at random), put the elements smaller than the pivot before it and the larger elements after it, then sort the smaller area and the larger area separately
*/
class QuickSort {
    void quickSort(int[] arr) {
        long start = System.currentTimeMillis();
        quick(arr, 0, arr.length-1);
        long after = System.currentTimeMillis();
        System.out.println("quickSort time:" + (after - start));
    }

    private void quick(int[] arr, int left, int right) {
        if (left >= right) {
            return;
        } else {
            int pivot = partition(arr, left, right);
            quick(arr, left, pivot - 1);
            quick(arr, pivot + 1, right);
        }
    }

    private int partition(int[] arr, int left, int right) {
        int tmp = arr[left];
        while (left < right) {
            while (left < right && arr[right] >= tmp) {
                --right;
            }
            arr[left] = arr[right];
            while (left < right && arr[left] <= tmp) {
                ++left;
            }
            arr[right] = arr[left];
        }
        arr[left] = tmp;
        return left;
    }
}

code analysis

  1. First, execute quick(arr, 0, arr.length-1)
    This passes in the closed interval [0, arr.length-1] [note the difference from merge sort, which passes a half-open interval]
  2. Next, execute
private void quick(int[] arr, int left, int right) {
    if (left >= right) {
        return;
    } else {
        int pivot = partition(arr, left, right);
        quick(arr, left, pivot - 1);
        quick(arr, pivot + 1, right);
    }
}

This is recursive divide and conquer

  3. Then execute
private int partition(int[] arr, int left, int right) {
        int tmp = arr[left];
        while (left < right) {
            while (left < right && arr[right] >= tmp) {
                --right;
            }
            // If you find a small one on the right, move it directly to the left
            arr[left] = arr[right];
            while (left < right && arr[left] <= tmp) {
                ++left;
            }
            // If you find the big one on the left, move it directly to the right
            arr[right] = arr[left];
        }
        // Then put the pivot into its final position and return that index
        arr[left] = tmp;
        return left;
}

This moves elements at both ends directly across the pivot and then returns the final index of the pivot

The quick sort above still leaves plenty of room for improvement
Optimize pivot selection

If the pivot we select is near the middle value of the whole sequence, it divides the sequence into a set of smaller values and a set of larger values of similar size. But that is only an "if": with bad luck we pick the minimum or maximum value as the pivot, the split becomes lopsided and efficiency drops
Some people suggest picking a random position between left and right. This removes the performance bottleneck on nearly ordered sequences, but it still depends on luck: if the random choice happens to hit an extreme value, nothing has been gained
Below is the random selection method

private void quick(int[] arr, int left, int right) {
    if (left >= right) {
        return;
    } else {
    	Random random = new Random();
    	int rand = random.nextInt(right - left) + left + 1;
    	int tmp = arr[left];
    	arr[left] = arr[rand];
    	arr[rand] = tmp;
        int pivot = partition(arr, left, right);
        quick(arr, left, pivot - 1);
        quick(arr, pivot + 1, right);
    }
}

A further improvement is the median-of-three method

private void quick(int[] arr, int left, int right) {
    if (left >= right) {
        return;
    } else {
    	medianOfThree(arr, left, right);
        int pivot = partition(arr, left, right);
        quick(arr, left, pivot - 1);
        quick(arr, pivot + 1, right);
    }
}

private void medianOfThree(int[] arr, int left, int right) {
	 // After the call: arr[mid] <= arr[left] <= arr[right], so the median sits at arr[left]
     int mid = (left + right) >> 1, tmp = 0;
     if (arr[mid] > arr[left]) {
         tmp = arr[mid];
         arr[mid] = arr[left];
         arr[left] = tmp;
     }
     if (arr[mid] > arr[right]) {
         tmp = arr[mid];
         arr[mid] = arr[right];
         arr[right] = tmp;
     }
     if (arr[left] > arr[right]) {
         tmp = arr[left];
         arr[left] = arr[right];
         arr[right] = tmp;
     }
}

Select three keywords, sort them, and take the middle one as the pivot. Usually the left, middle and right elements are used, though the three could also be picked at random. This way the pivot is at least not the minimum or maximum of the three, and in terms of probability it is very unlikely that all three are close to the minimum or the maximum, so the chance that the pivot is near the middle value is greatly improved
Since the whole sequence is assumed to be unordered, picking three elements at random is in effect the same as taking the left, middle and right ones, and a random number generator adds time overhead of its own, so random selection is not considered here
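The benefit of median-of-three is most visible on an already ordered input, the bad case for a first-element pivot. The sketch below measures the maximum recursion depth with and without medianOfThree; the class PivotDepthDemo, the depth counter and the helper depthOnSorted are hypothetical additions for measurement only.

```java
// Hypothetical sketch: compares recursion depth on sorted input for two pivot choices.
class PivotDepthDemo {
    static int maxDepth;

    static int partition(int[] arr, int left, int right) {
        int tmp = arr[left];
        while (left < right) {
            while (left < right && arr[right] >= tmp) {
                --right;
            }
            arr[left] = arr[right];
            while (left < right && arr[left] <= tmp) {
                ++left;
            }
            arr[right] = arr[left];
        }
        arr[left] = tmp;
        return left;
    }

    // Arranges arr[mid] <= arr[left] <= arr[right], so the median ends up at arr[left].
    static void medianOfThree(int[] arr, int left, int right) {
        int mid = (left + right) >> 1;
        if (arr[mid] > arr[left]) { swap(arr, mid, left); }
        if (arr[mid] > arr[right]) { swap(arr, mid, right); }
        if (arr[left] > arr[right]) { swap(arr, left, right); }
    }

    static void swap(int[] arr, int i, int j) {
        int tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }

    static void quick(int[] arr, int left, int right, boolean useMedian, int depth) {
        if (left >= right) {
            return;
        }
        maxDepth = Math.max(maxDepth, depth);   // record the deepest non-trivial call
        if (useMedian) {
            medianOfThree(arr, left, right);
        }
        int pivot = partition(arr, left, right);
        quick(arr, left, pivot - 1, useMedian, depth + 1);
        quick(arr, pivot + 1, right, useMedian, depth + 1);
    }

    // Sorts the already ordered array 0..n-1 and returns the deepest recursion level reached.
    static int depthOnSorted(int n, boolean useMedian) {
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            arr[i] = i;
        }
        maxDepth = 0;
        quick(arr, 0, n - 1, useMedian, 1);
        return maxDepth;
    }

    public static void main(String[] args) {
        System.out.println("first element pivot: " + depthOnSorted(1000, false));
        System.out.println("median of three:     " + depthOnSorted(1000, true));
    }
}
```

On ordered input the first-element pivot peels off one element per call, so the depth grows linearly with n, while median-of-three splits each range close to the middle and keeps the depth logarithmic.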

Optimize recursive operations

private void quick(int[] arr, int left, int right) {
    // The loop replaces the second recursive call, which reduces the recursion depth
    while (left < right) {
        medianOfThree(arr, left, right);
        int pivot = partition(arr, left, right);
        quick(arr, left, pivot - 1);
        // Continue with the right part in the next loop iteration
        left = pivot + 1;
    }
}
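The idea behind this optimization is that the second recursive call is the last thing the method does, so it can be replaced by updating left and looping. A minimal self-contained sketch follows, assuming a first-element pivot for brevity; TailLoopQuickSort and sort are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch: quick sort where the right part is handled by a loop instead of recursion.
class TailLoopQuickSort {
    static int partition(int[] arr, int left, int right) {
        int tmp = arr[left];
        while (left < right) {
            while (left < right && arr[right] >= tmp) {
                --right;
            }
            arr[left] = arr[right];
            while (left < right && arr[left] <= tmp) {
                ++left;
            }
            arr[right] = arr[left];
        }
        arr[left] = tmp;
        return left;
    }

    static void quick(int[] arr, int left, int right) {
        while (left < right) {
            int pivot = partition(arr, left, right);
            quick(arr, left, pivot - 1);   // recurse only into the left part
            left = pivot + 1;              // the right part is handled by the next iteration
        }
    }

    static void sort(int[] arr) {
        quick(arr, 0, arr.length - 1);
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 8, 1, 9, 2, 7};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 5, 7, 8, 9]
    }
}
```

The result is the same as with two recursive calls; only the stack usage changes, because each loop iteration reuses the current frame for the right part.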

Optimize the sorting scheme for small arrays
When the array is very small, direct insertion sort performs best among the simple sorts. The reason is that quick sort uses recursion: when sorting a large amount of data, this overhead is negligible compared with the overall advantage of the algorithm, but if only a few records need sorting, it is like using a sledgehammer to crack a nut

    private void quick(int[] arr, int left, int right) {
        if (left >= right) {
            return;
        } else {
        	// For small ranges, switch to a simple sort. Shell sort is used here; if the range is small enough, direct insertion sort also works
            if (right - left <= 150) {
                // Sort only the [left, right] range, not the whole array
                int gap = (right - left + 1) >> 1;
                while (gap > 0) {
                    for (int i = left + gap; i <= right; i++) {
                        int tmp = arr[i];
                        int j = i - gap;
                        for (; j >= left && arr[j] > tmp; j -= gap) {
                            arr[j + gap] = arr[j];
                        }
                        arr[j + gap] = tmp;
                    }
                    gap >>= 1;
                }
            } else {
                /*
                3.Triple median method
                arr[mid]<arr[left]<arr[right]
                 */
//                medianOfThree(arr, left, right);

                /*
                2. Random selection
                Optimize by luck
                 */
//                Random random = new Random();
//                int rand = random.nextInt(right - left) + left + 1;
//                int tmp = arr[left];
//                arr[left] = arr[rand];
//                arr[rand] = tmp;

                /*
                1. Fixed value selection
                 */
                int pivot = partition(arr, left, right);
                quick(arr, left, pivot - 1);
                quick(arr, pivot + 1, right);
            }
        }
    }
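The same cutoff idea can be sketched with direct insertion sort restricted to the closed range [left, right]. The threshold CUTOFF = 16 is an arbitrary assumption chosen for the example, not a value from the article, and HybridQuickSort is a hypothetical class name.

```java
// Hypothetical sketch: quick sort that hands small ranges to insertion sort.
class HybridQuickSort {
    static final int CUTOFF = 16; // assumed threshold, tune by measurement

    // Direct insertion sort on the closed range arr[left..right].
    static void insertionSort(int[] arr, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int tmp = arr[i];
            int j = i - 1;
            while (j >= left && arr[j] > tmp) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = tmp;
        }
    }

    static int partition(int[] arr, int left, int right) {
        int tmp = arr[left];
        while (left < right) {
            while (left < right && arr[right] >= tmp) {
                --right;
            }
            arr[left] = arr[right];
            while (left < right && arr[left] <= tmp) {
                ++left;
            }
            arr[right] = arr[left];
        }
        arr[left] = tmp;
        return left;
    }

    static void quick(int[] arr, int left, int right) {
        if (left >= right) {
            return;
        }
        if (right - left + 1 <= CUTOFF) {
            insertionSort(arr, left, right);   // small range: skip further recursion
            return;
        }
        int pivot = partition(arr, left, right);
        quick(arr, left, pivot - 1);
        quick(arr, pivot + 1, right);
    }

    static void sort(int[] arr) {
        quick(arr, 0, arr.length - 1);
    }
}
```

The important detail is that the simple sort touches only arr[left..right]; sorting the whole array inside every small-range call would still give a correct result but would waste the work already done.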

3.7.4 non recursive implementation of quick sort

// Non-recursive quick sort
void quickSortTraversalNo(int[] arr) {
     long before = System.currentTimeMillis();
     Stack<Integer> stack = new Stack<>();
     stack.push(0);
     stack.push(arr.length - 1);
     while (!stack.isEmpty()) {
         // Pay attention to the access order
         int right = stack.pop();
         int left = stack.pop();
         if (left >= right) {
             continue;
         } else {
             int pivot = partition(arr, left, right);
             // The order in which the left and right sections are pushed does not affect the result
             stack.push(left);
             stack.push(pivot - 1);
             stack.push(pivot + 1);
             stack.push(right);
         }
     }
     long after = System.currentTimeMillis();
     System.out.println("QuickSortTraversalNo time: " + (after - before));
 }
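
Comparison of the sorting algorithms:

+-------------------+-----------------+------------+--------------+------------------+-----------+
| Sorting algorithm | Average time    | Best case  | Worst case   | Space complexity | Stability |
+-------------------+-----------------+------------+--------------+------------------+-----------+
| Bubble sort       | O(N^2)          | O(N)       | O(N^2)       | O(1)             | stable    |
| Selection sort    | O(N^2)          | O(N^2)     | O(N^2)       | O(1)             | unstable  |
| Insertion sort    | O(N^2)          | O(N)       | O(N^2)       | O(1)             | stable    |
| Shell sort        | depends on gaps | O(N log N) | O(N log^2 N) | O(1)             | unstable  |
| Heap sort         | O(N log N)      | O(N log N) | O(N log N)   | O(1)             | unstable  |
| Merge sort        | O(N log N)      | O(N log N) | O(N log N)   | O(N)             | stable    |
| Quick sort        | O(N log N)      | O(N log N) | O(N^2)       | O(log N) to O(N) | unstable  |
+-------------------+-----------------+------------+--------------+------------------+-----------+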

Tags: Java Algorithm data structure list

Posted on Fri, 26 Nov 2021 14:40:03 -0500 by JWitness