# 1, Course design topics and requirements

Application and comparison of sorting algorithms

[basic requirements]

- 1. Generate three groups of 10 million numbers: random numbers, a basically ascending sequence (an ascending sequence in which every element is shifted left by 2 positions), and a descending sequence. (Which data structure should store them? What if the amount of data reaches 100 million or 1 billion?)
- 2. Implement recursive and non-recursive versions of quick sort (improved version), merge sort and heap sort.
- 3. Find the d largest numbers (d is an input parameter) in each of the three groups of 10 million numbers. Sort all the data with quick sort, heap sort, merge sort and insertion/bubble sort, and then take the d largest numbers. Compare the running times of the different sorting algorithms and of their recursive and non-recursive versions.
- 4. Without sorting the 10 million numbers as a whole, find the d largest and d smallest numbers in each of the three groups. Which data structure should be used?

# 2, Demand analysis

A given data sequence can be sorted in many ways. A well-chosen algorithm saves a lot of time and space overhead, so it is important to know which method is most efficient for which kind of input.

The program sorts random and ordered sequences with recursive and non-recursive versions of quick sort (improved version), merge sort and heap sort. The running times of the different algorithms are analyzed to determine the best sorting method for each input size and ordering.
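The running-time comparison could be driven by a small timing helper. The sketch below is only an illustration: the name `TimeSortMs` and the choice of `std::chrono::steady_clock` are assumptions, not part of the original program.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: runs `sorter` on a private copy of `input`
// and returns the elapsed wall-clock time in milliseconds.
double TimeSortMs(const std::vector<int>& input,
                  const std::function<void(std::vector<int>&)>& sorter) {
    std::vector<int> work = input;  // copy, so every algorithm sees the same data
    const auto start = std::chrono::steady_clock::now();
    sorter(work);
    const auto stop = std::chrono::steady_clock::now();
    // Sanity check outside the timed region (compiled out when NDEBUG is defined).
    assert(std::is_sorted(work.begin(), work.end()));
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Each sorting routine can be wrapped in a lambda and passed as `sorter`. Because the helper copies the input, the same array can be reused for every algorithm.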

For the problem of finding the d largest and d smallest numbers in each of the three groups of 10 million numbers without sorting the data as a whole, **solution:** build a min heap of size d from the first d numbers, then traverse the remaining data and compare each number with the top of the heap. A number that is not larger than the heap top is discarded directly. A number that is larger replaces the heap top, and the heap is adjusted. After all 10 million numbers have been traversed, the min heap holds the d largest numbers. The d smallest numbers are found symmetrically with a max heap of size d.
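A minimal sketch of this approach, assuming `std::priority_queue` as the heap. The function name `LargestD` is hypothetical and does not come from the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: returns the d largest values of `data` in ascending order.
// The min heap never holds more than d elements; its top is the smallest candidate.
std::vector<int> LargestD(const std::vector<int>& data, std::size_t d) {
    assert(d > 0);
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int value : data) {
        if (heap.size() < d) {
            heap.push(value);          // still filling the first d slots
        } else if (value > heap.top()) {
            heap.pop();                // drop the smallest candidate
            heap.push(value);          // keep the larger value instead
        }
    }
    std::vector<int> result;
    while (!heap.empty()) {            // popping a min heap yields ascending order
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

Each of the n elements costs at most one heap operation on d entries, so the running time is O(n log d) with O(d) extra memory. Replacing `std::greater<int>` with the default comparator and flipping the comparison gives the d smallest values.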

# 3, Design

## 3.1 design idea

### (1) Data structure design

Three dynamic arrays A, B and C, whose size is entered by the user, store the random sequence, the basically ascending sequence (an ascending sequence in which every element is shifted left by 2 positions) and the descending sequence respectively.

Six arrays t_x (x = a → f) each hold a copy of one of A, B and C and serve as the working input for the sorting algorithms.
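The three input sequences could be generated as in the sketch below. It assumes that "shifted left by 2 positions" means a left rotation of the ascending sequence by two elements. The function names, the value range and the use of `std::vector<int>` are likewise assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Uniform random integers; a fixed `seed` makes the run reproducible.
std::vector<int> MakeRandom(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, 1000000000);
    std::vector<int> v(n);
    for (int& x : v) x = dist(gen);
    return v;
}

// Ascending 0..n-1, rotated left by two positions
// (assumed meaning of the "basically ascending" sequence).
std::vector<int> MakeNearlySorted(std::size_t n) {
    std::vector<int> v(n);
    std::iota(v.begin(), v.end(), 0);
    if (n > 2) std::rotate(v.begin(), v.begin() + 2, v.end());
    return v;
}

// Strictly descending n-1..0: fill from the back with 0, 1, 2, ...
std::vector<int> MakeReversed(std::size_t n) {
    std::vector<int> v(n);
    std::iota(v.rbegin(), v.rend(), 0);
    return v;
}
```

With a 4-byte `int`, 10 million elements occupy about 40 MB per array. At 1 billion elements a single array would need about 4 GB, so a contiguous in-memory array may no longer be practical at that size.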

### (2) Algorithm design

The sorting functions involved in the program are shown in the figure above.

## 3.2 detailed design

Quick sort (improved version) (a kind of exchange sort)

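Recursive quick sort:

Improvement scheme: improve how the pivot is selected, that is, use the median as the pivot each time. (The exact median of the data can be selected in O(n) time; the function below uses the cheaper median of three, taken over the first, middle and last elements.)

```cpp
// Return the index of the median of A[low], A[mid] and A[high]
int GetMiddleValue(int A[], int low, int high) {
    // int mid = low + (high - low) >> 1;
    int mid = (high + low) / 2;
    int y1 = A[low] > A[mid] ? low : mid;    // index of the larger of low, mid
    int y2 = A[low] > A[high] ? low : high;  // index of the larger of low, high
    int y3 = A[mid] > A[high] ? mid : high;  // index of the larger of mid, high
    if (y1 == y2)                            // A[low] is the maximum
        return y3;
    else
        return A[y1] > A[y2] ? y2 : y1;      // otherwise the smaller of the two winners
}
```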

The first step is to move the pivot out of the segment to be partitioned by exchanging it with the last element. i starts at the first element and j starts at the second-to-last element. While i is to the left of j, i moves right past elements smaller than the pivot, and j moves left past elements larger than the pivot. When both stop, i points to a large element and j points to a small element. If i is still to the left of j, the two elements are exchanged and the scan continues. Once i and j have crossed, that is, i > j, no exchange is made. At this point, the pivot is exchanged with the element that i points to, which puts the pivot in its final position.
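The `partition` routine called by the non-recursive version is not listed in this section. The following is a minimal sketch of the scheme just described, not the author's code: `MedianOfThree`, `PartitionSketch` and `QuickSortSketch` are hypothetical names, and `std::vector<int>` replaces the raw arrays.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Index of the median of a[low], a[mid] and a[high].
int MedianOfThree(const std::vector<int>& a, int low, int high) {
    const int mid = low + (high - low) / 2;
    if ((a[low] <= a[mid] && a[mid] <= a[high]) ||
        (a[high] <= a[mid] && a[mid] <= a[low])) return mid;
    if ((a[mid] <= a[low] && a[low] <= a[high]) ||
        (a[high] <= a[low] && a[low] <= a[mid])) return low;
    return high;
}

// Partitions a[low..high] and returns the final index of the pivot.
int PartitionSketch(std::vector<int>& a, int low, int high) {
    assert(low <= high);
    std::swap(a[MedianOfThree(a, low, high)], a[high]);  // park the pivot at the end
    const int pivot = a[high];
    int i = low;
    int j = high - 1;
    while (true) {
        while (i <= j && a[i] < pivot) ++i;  // skip elements smaller than the pivot
        while (i <= j && a[j] > pivot) --j;  // skip elements larger than the pivot
        if (i >= j) break;                   // pointers met or crossed
        std::swap(a[i++], a[j--]);           // both are on the wrong side: exchange
    }
    std::swap(a[i], a[high]);                // move the pivot to its final position
    return i;
}

// Recursive quick sort built on the partition above.
void QuickSortSketch(std::vector<int>& a, int low, int high) {
    if (low >= high) return;
    const int p = PartitionSketch(a, low, high);
    QuickSortSketch(a, low, p - 1);
    QuickSortSketch(a, p + 1, high);
}
```

Both scans stop on elements equal to the pivot. This costs a few extra exchanges but keeps the two halves balanced when the data contains many duplicates.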

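Stack-based non-recursive quick sort:

```cpp
#include <stack>
using std::stack;

void FD_QuickSort(int AA[], int low, int high) {
    if (low >= high) return;  // nothing to sort
    stack<int> x;             // start positions of the pending ranges
    stack<int> y;             // end positions of the pending ranges
    x.push(low);
    y.push(high);
    int Beg, End, k;
    while (x.size() != 0) {
        Beg = x.top(); x.pop();
        End = y.top(); y.pop();
        k = partition(AA, Beg, End);  // final position of the pivot
        // PrintA(AA, N);
        if (Beg < k - 1) { x.push(Beg); y.push(k - 1); }    // left part has at least 2 elements
        if (End > k + 1) { x.push(k + 1); y.push(End); }    // right part has at least 2 elements
    }
}
```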

Step 1: create a stack to store the start and end positions of the ranges that still have to be sorted.

Step 2: push the start position s and the end position e of the whole array.

Step 3: pop a range from the stack, partition it, and obtain the final position p of the pivot.

Step 4: check whether the start position s is less than p-1. If so, push s as the start position and p-1 as the end position.

Step 5: check whether p+1 is less than the end position e. If so, push p+1 as the start position and e as the end position.

Step 6: check whether the stack is empty. If not, repeat from step 3; otherwise the sort is finished.

Merge sort:

The basic idea is to divide the array into two groups A and B. If the data within each group is already ordered, the two groups can easily be merged into one ordered sequence. To get A and B ordered, each of them is divided into two groups again, and so on. When a group contains only one element, it can be considered ordered, and adjacent groups are then merged pairwise. Merge sort is thus completed by first recursively decomposing the sequence and then merging it back.

```cpp
// Merge the two ordered sequences a[first...mid] and a[mid+1...last].
// temp must provide room for at least last - first + 1 elements.
void mergearray(int a[], int first, int mid, int last, int temp[]) {
    int i = first, j = mid + 1;
    int m = mid, n = last;
    int k = 0;
    while (i <= m && j <= n) {       // take the smaller head of the two runs
        if (a[i] < a[j])
            temp[k++] = a[i++];
        else
            temp[k++] = a[j++];
    }
    while (i <= m)                   // copy what is left of the first run
        temp[k++] = a[i++];
    while (j <= n)                   // copy what is left of the second run
        temp[k++] = a[j++];
    for (i = 0; i < k; i++)          // write the merged result back
        a[first + i] = temp[i];
}
```
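The drivers that call the merge step are not listed here. As a sketch under assumptions, the recursive and non-recursive versions might look like the following. `MergeRuns`, `MergeSortRecursive` and `MergeSortIterative` are hypothetical names, and `std::vector<int>` is used so that the example is self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Merges the sorted runs a[first..mid] and a[mid+1..last] through `temp`.
void MergeRuns(std::vector<int>& a, int first, int mid, int last, std::vector<int>& temp) {
    assert(temp.size() >= a.size());
    int i = first, j = mid + 1, k = 0;
    while (i <= mid && j <= last)
        temp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];  // ties take the left run
    while (i <= mid) temp[k++] = a[i++];
    while (j <= last) temp[k++] = a[j++];
    std::copy(temp.begin(), temp.begin() + k, a.begin() + first);
}

// Recursive (top-down) version: split, sort both halves, merge.
void MergeSortRecursive(std::vector<int>& a, int first, int last, std::vector<int>& temp) {
    if (first >= last) return;  // zero or one element is already ordered
    const int mid = first + (last - first) / 2;
    MergeSortRecursive(a, first, mid, temp);
    MergeSortRecursive(a, mid + 1, last, temp);
    MergeRuns(a, first, mid, last, temp);
}

// Non-recursive (bottom-up) version: merge runs of width 1, 2, 4, ...
void MergeSortIterative(std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    std::vector<int> temp(a.size());
    for (int width = 1; width < n; width *= 2) {
        // A run with no right-hand partner is already in place and is skipped.
        for (int first = 0; first + width < n; first += 2 * width) {
            const int mid = first + width - 1;
            const int last = std::min(first + 2 * width - 1, n - 1);
            MergeRuns(a, first, mid, last, temp);
        }
    }
}
```

Both versions share one temporary buffer of size n, which avoids allocating memory inside every merge.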

Heap sort (a kind of selection sort):

Storage:

Generally, the heap is stored in an array. The subscript of the parent of node i is (i - 1) / 2, and the subscripts of its left and right children are 2 * i + 1 and 2 * i + 2 respectively.

Heapifying an array:

By the heap property, the array only needs to be partially ordered: every parent node is greater than or equal to its left and right children. Treat the array as a complete binary tree and traverse the nodes backward, starting from the last non-leaf node. If the current node is not smaller than both of its children, the subtree rooted there is already a max heap. Otherwise, exchange the current node with the larger of its two children. After the exchange, check whether the subtree rooted at that child still satisfies the heap property and, if not, keep adjusting downward. This completes the heapification of the array.

The following is the non-recursive heap adjustment code:

```cpp
// Sift down from node ii. n is the total number of nodes, indexed from 0.
// The children of node i are 2 * i + 1 and 2 * i + 2.
void FD_AdjustHeap(int a[], int ii, int n) {
    int temp = a[ii];
    int L = 2 * ii + 1;                    // left child of the current node
    while (L < n) {
        if (L + 1 < n && a[L + 1] > a[L])  // pick the larger of the two children
            L++;
        if (a[L] <= temp)
            break;
        a[ii] = a[L];                      // move the larger child up into its parent's slot
        ii = L;
        L = 2 * ii + 1;
    }
    a[ii] = temp;
}
```
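The heap sort below calls `FD_BuildHeap`, which is not listed in this section. A minimal sketch of a bottom-up heap construction, assuming a max heap, could look like this. `SiftDown` and `BuildMaxHeap` are hypothetical names and not the author's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sinks a[i] within the first n elements until neither child is larger.
void SiftDown(std::vector<int>& a, int i, int n) {
    while (true) {
        int largest = i;
        const int left = 2 * i + 1;
        const int right = 2 * i + 2;
        if (left < n && a[left] > a[largest]) largest = left;
        if (right < n && a[right] > a[largest]) largest = right;
        if (largest == i) return;  // heap property holds at this node
        std::swap(a[i], a[largest]);
        i = largest;               // continue with the child that received the value
    }
}

// Turns the whole array into a max heap, starting at the last non-leaf node.
void BuildMaxHeap(std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    for (int i = n / 2 - 1; i >= 0; --i) SiftDown(a, i, n);
    // Postcondition check (compiled out when NDEBUG is defined).
    assert(std::is_heap(a.begin(), a.end()));
}
```

Nodes from n / 2 onward are leaves and already satisfy the heap property, so the loop starts at n / 2 - 1. Building the heap this way takes O(n) time overall.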

Sorting (deleting the heap top):

After the heap top has been removed, the heap must be rebuilt. In practice, the value of the last element is assigned to the root node, and the heap is then adjusted top-down from the root. At each step, first find the larger of the left and right children. If the parent is not smaller than that child, no further adjustment is needed. Otherwise, exchange the parent with it and continue with the nodes below. This is equivalent to "sinking" a value from the root node.
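The sinking step also has a natural recursive form, which gives a recursive counterpart to the non-recursive heap sort. The sketch below assumes a max heap and an ascending result. `SinkRecursive` and `HeapSortRecursive` are hypothetical names.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Recursive adjustment: sink a[i] within a[0..n-1].
void SinkRecursive(std::vector<int>& a, int i, int n) {
    assert(0 <= i && i < n);
    int largest = i;
    const int left = 2 * i + 1;
    const int right = 2 * i + 2;
    if (left < n && a[left] > a[largest]) largest = left;
    if (right < n && a[right] > a[largest]) largest = right;
    if (largest == i) return;      // parent is not smaller than its children
    std::swap(a[i], a[largest]);
    SinkRecursive(a, largest, n);  // the exchanged child may now violate the property
}

void HeapSortRecursive(std::vector<int>& a) {
    const int n = static_cast<int>(a.size());
    for (int i = n / 2 - 1; i >= 0; --i) SinkRecursive(a, i, n);  // build the max heap
    for (int end = n - 1; end > 0; --end) {
        std::swap(a[0], a[end]);   // largest remaining value goes to its final slot
        SinkRecursive(a, 0, end);  // restore the heap on the shortened prefix
    }
}
```

The recursion depth is bounded by the height of the heap, about log2(n) levels, so stack usage stays small even for 10 million elements.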

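```cpp
// Non-recursive heap sort
void FD_HeapSort(int a[], int n) {
    FD_BuildHeap(a, n);            // build the max heap once
    for (int i = n - 1; i > 0; i--) {
        Swap(a[i], a[0]);          // move the current maximum to its final slot
        FD_AdjustHeap(a, 0, i);    // only the root is out of place: sift it down in a[0..i-1]
    }
}
```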

# 4, Test data and results:

Length 10000, number of values to find (d): 10

Length 100000, number of values to find (d): 10