From the Dutch Flag Problem to Optimizing Quicksort

If there is one skill in computing that will not be obsolete ten or twenty years from now, I think it must be algorithms and data structures.

1. The Dutch Flag Problem

The Dutch flag problem: given an array of numbers and a value num, move the numbers less than num to the left, the numbers equal to num to the middle, and the numbers greater than num to the right. Without any restrictions this is easy, but how do you do it in O(n) time with only O(1) extra space?

Take three seconds to think: 3, 2, 1... For anyone who has not been practicing algorithm problems regularly, this question is not that simple. Here is the answer:

The idea is simple. We first give a solution on an array, and later discuss how to do it on a linked list. Define two pointers, index1 and index2, that divide the array into three regions: elements at subscripts less than or equal to index1 are less than num, elements between index1 and the traversal pointer are equal to num, and elements at subscripts greater than or equal to index2 are greater than num. Initially index1 is -1 and index2 is the array length, so both outer regions are empty. Then traverse the array from left to right and adjust according to the following rules.

1. If the current element is less than num, swap it with the element right after index1 and increment index1 (this brings the current element into the less-than region), then move the traversal pointer one step to the right.

2. If the current element equals num, just move the traversal pointer one step to the right.

3. If the current element is greater than num, swap it with the element right before index2 and decrement index2 (this brings the current element into the greater-than region). The traversal pointer does not move, because the element swapped into the current position has not been compared yet and must be examined in the next iteration.

Following these ideas, we get the code below:

public static void splitArrByNum(int[] arr, int num){
    if(arr == null || arr.length <= 1){
        return;
    }
    int index1 = -1;
    int index2 = arr.length;
    int i = 0;
    while(i < index2){
        if(arr[i] < num){
            swap(arr,i++,++index1);
        }else if(arr[i] == num){
            i++;
        }else{
            swap(arr,i,--index2);
        }
    }
}

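public static void swap(int[] arr, int i, int j){
    // XOR swapping a slot with itself would zero it, so skip that case.
    if(i == j){
        return;
    }
    arr[i] = arr[i] ^ arr[j];
    arr[j] = arr[i] ^ arr[j];
    arr[i] = arr[i] ^ arr[j];
}

This is a fancy XOR-based way to swap elements, which we won't explain in detail here. Note, however, that it only works when the two elements live at different memory addresses, that is, i != j; otherwise arr[i] ^ arr[i] sets the element to 0. The partition code above can call swap with i == j (for example swap(arr,i++,++index1) when the very first element is less than num), which is why the guard at the top of swap is required.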
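To make the pitfall concrete, here is a minimal sketch comparing an unguarded XOR swap with a guarded one. The class and method names (XorSwapDemo, xorSwapUnsafe, xorSwapSafe) are hypothetical and only serve the illustration.

```java
class XorSwapDemo {
    // Unguarded three-line XOR swap: destroys the value when i == j.
    static void xorSwapUnsafe(int[] arr, int i, int j) {
        arr[i] = arr[i] ^ arr[j];
        arr[j] = arr[i] ^ arr[j];
        arr[i] = arr[i] ^ arr[j];
    }

    // Guarded variant: skip the swap when both indices name the same slot.
    static void xorSwapSafe(int[] arr, int i, int j) {
        if (i == j) {
            return;
        }
        arr[i] ^= arr[j];
        arr[j] ^= arr[i];
        arr[i] ^= arr[j];
    }

    public static void main(String[] args) {
        int[] a = {7, 9};
        xorSwapUnsafe(a, 0, 0);   // 7 ^ 7 == 0, so the value is lost
        System.out.println(a[0]); // prints 0

        int[] b = {7, 9};
        xorSwapSafe(b, 0, 0);     // no-op
        xorSwapSafe(b, 0, 1);     // ordinary swap
        System.out.println(b[0] + " " + b[1]); // prints 9 7
    }
}
```

A plain temporary-variable swap avoids the issue altogether and is usually just as fast, so the XOR form is mostly a curiosity.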

2. Optimizing Quicksort Step by Step

Readers familiar with quicksort will quickly notice that the Dutch flag solution does exactly what quicksort does: it divides the region to be sorted around a pivot value, and sorting is achieved by applying this recursively. With that in mind, let's walk through a step-by-step optimization of quicksort.

1. Quicksort 1.0

Version 1.0 is the form of quicksort we most commonly write:

    public void quickSort(int[] arr, int left, int right) {
        if (left < right) {
            int partitionIndex = partition(arr, left, right);
            quickSort(arr, left, partitionIndex - 1);
            quickSort(arr, partitionIndex + 1, right);
        }
    }

    private int partition(int[] arr, int left, int right) {
        // pivot
        int pivot = left;
        int index = pivot + 1;
        for (int i = index; i <= right; i++) {
            if (arr[i] < arr[pivot]) {
                swap(arr, i, index);
                index++;
            }
        }
        swap(arr, pivot, index - 1);
        return index - 1;
    }

The time complexity of quicksort is usually quoted as O(N*logN). However, the code above always picks the leftmost value as the pivot. If the original array is already ordered, every partition returns the leftmost index, so each round only shrinks the range by one element and quicksort degenerates to O(N^2), no better than bubble sort and contrary to what we expect. So how can we optimize further?
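The degeneration on ordered input can be checked by counting comparisons. The sketch below reuses the leftmost-pivot partition scheme of version 1.0 with a counter added; the class name LeftPivotCost, the counter, and the helper countOnSortedInput are assumptions made for this illustration.

```java
class LeftPivotCost {
    static long comparisons = 0;

    static void quickSort(int[] arr, int left, int right) {
        if (left < right) {
            int p = partition(arr, left, right);
            quickSort(arr, left, p - 1);
            quickSort(arr, p + 1, right);
        }
    }

    // Same scheme as version 1.0: the leftmost element is the pivot.
    static int partition(int[] arr, int left, int right) {
        int pivot = left;
        int index = pivot + 1;
        for (int i = index; i <= right; i++) {
            comparisons++; // one element-vs-pivot comparison
            if (arr[i] < arr[pivot]) {
                swap(arr, i, index);
                index++;
            }
        }
        swap(arr, pivot, index - 1);
        return index - 1;
    }

    static void swap(int[] arr, int i, int j) {
        int tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }

    // Sorts the already ordered array 0..n-1 and returns the comparison count.
    static long countOnSortedInput(int n) {
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            arr[i] = i;
        }
        comparisons = 0;
        quickSort(arr, 0, n - 1);
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(countOnSortedInput(100)); // prints 4950, i.e. 100*99/2
    }
}
```

On ordered input each partition compares every remaining element with the pivot and peels off only one element, so the total is (n-1) + (n-2) + ... + 1 = n(n-1)/2 comparisons.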

2. Quicksort 2.0

To avoid always choosing the leftmost value as the pivot, which value should we choose? The simplest idea is to pick one at random. Following this line of thought, we adjust the code:

    public void quickSort(int[] arr, int left, int right) {
        if (left < right) {
            swap(arr, left, left+(int)(Math.random() *(right-left+1)));
            int partitionIndex = partition(arr, left, right);
            quickSort(arr, left, partitionIndex - 1);
            quickSort(arr, partitionIndex + 1, right);
        }
    }

Before partitioning, swap the leftmost element with an element at a random position in the range [left, right]. The pivot is then chosen at random each time, which avoids the blow-up in time complexity caused by an already ordered array. But what is the time complexity now? More generally, what is the time complexity of quicksort? Answering that involves the master formula.
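The index expression used in the swap deserves a second look. Math.random() returns a double in [0.0, 1.0), so left + (int)(Math.random() * (right - left + 1)) always lands in [left, right], both ends included. A minimal sketch, with the helper name randomIndex being hypothetical:

```java
class RandomPivotIndex {
    // Returns an index in [left, right], both ends inclusive.
    // The product lies in [0, right-left+1) and the cast truncates it
    // to an integer in 0..right-left.
    static int randomIndex(int left, int right) {
        return left + (int) (Math.random() * (right - left + 1));
    }

    public static void main(String[] args) {
        int left = 3;
        int right = 7;
        int[] hits = new int[right - left + 1];
        for (int k = 0; k < 100_000; k++) {
            hits[randomIndex(left, right) - left]++;
        }
        // Each of the five positions should be chosen roughly 20,000 times.
        for (int k = 0; k < hits.length; k++) {
            System.out.println((left + k) + ": " + hits[k]);
        }
    }
}
```

Forgetting the + 1 would silently exclude the rightmost index from ever being picked.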

3. The Master Formula

The master formula (also known as the master theorem) is used to calculate the time complexity of typical recursive algorithms:

Master formula: T(N) = a*T(N/b) + O(N^d)

Here a is the number of recursive calls, that is, the number of sub-problems generated; b means that each sub-problem has 1/b of the original data size; and O(N^d) is the time spent outside the recursive calls, that is, on splitting the problem and merging the results.

Then:

logb(a) < d: the time complexity is O(N^d)

logb(a) > d: the time complexity is O(N^logb(a))

logb(a) == d: the time complexity is O(N^d * logN)

Now analyze quicksort. In the best case, every partition divides the data into two equal halves, and the recurrence is:

        T(N) = 2T(N/2) + O(N)

Here a=2, b=2, d=1 (one partition pass is linear), so logb(a) == d and the time complexity is O(N*logN).
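The three cases can be turned into a small classifier. This is only a sketch under stated assumptions: the class MasterFormula and the method solve are hypothetical names, and a, b, d are restricted to integers so that logb(a) can be compared with d exactly by comparing a with b^d.

```java
class MasterFormula {
    // Classifies T(N) = a*T(N/b) + O(N^d) for integers a >= 1, b >= 2, d >= 0.
    // Comparing a with b^d avoids floating-point logarithms:
    // logb(a) < d exactly when a < b^d, and likewise for == and >.
    static String solve(int a, int b, int d) {
        long bPowD = 1;
        for (int k = 0; k < d; k++) {
            bPowD *= b;
        }
        if (a < bPowD) {
            return "O(N^" + d + ")";
        } else if (a > bPowD) {
            return "O(N^log_" + b + "(" + a + "))";
        } else {
            return "O(N^" + d + " * logN)";
        }
    }

    public static void main(String[] args) {
        System.out.println(solve(2, 2, 1)); // quicksort best case: O(N^1 * logN)
        System.out.println(solve(2, 2, 2)); // O(N^2)
        System.out.println(solve(4, 2, 1)); // O(N^log_2(4)), which equals O(N^2)
    }
}
```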

But that is only the best case, where every partition divides the data into two equal parts. What is the time complexity when the pivot is chosen at random? It is still O(N*logN) in expectation. Why? Showing this requires computing a mathematical expectation over the random choices. Frankly, I don't know how to do it! There is a detailed proof in Introduction to Algorithms; interested readers can look it up there.

4. Quicksort 3.0

The quicksort versions above still look a little different from the Dutch flag problem. Here we use the Dutch flag solution for a further optimization. First, consider what happens in version 2.0: we divide the array around the pivot into a part smaller than the pivot and a part greater than or equal to it, and each part goes on to the next round of partitioning. But what about the elements equal to the pivot? They are carried along into one of these two parts. In fact, elements equal to the pivot do not need to take part in the next round of partitioning at all. So we can use the Dutch flag solution directly as the partition step, which gives quicksort version 3.0:

public static void quickSort(int[] arr, int L, int R){
    if(L == R){
        return;
    }
    if(L < R){
        // Move a randomly chosen pivot to R, because partition uses arr[R] as the pivot.
        swap(arr,L+(int)(Math.random() *(R-L+1)), R);
        int[] p = partition(arr, L, R);
        quickSort(arr, L ,p[0]-1);
        quickSort(arr, p[1]+1, R);
    }
}
public static int[] partition(int[] arr,int L,int R){
    int less = L-1;
    int more = R;
    while(L < more){
        if(arr[L] < arr[R]){
            swap(arr, ++less, L++);
        } else if(arr[L] > arr[R]){
            swap(arr, --more, L);
        } else {
            L++;
        }
    }
    swap(arr, more, R);
    return new int[]{less+1,more};
}

Now the elements equal to the pivot no longer take part in the next round of sorting, and the algorithm is optimized further.
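A single partition call on a small array shows how the equal elements are fenced off. The sketch below copies the 3.0 partition scheme (pivot taken from arr[R]) into a hypothetical class ThreeWayPartitionDemo and uses a temporary-variable swap, which is an assumption made to keep the example independent of the XOR version.

```java
class ThreeWayPartitionDemo {
    // arr[R] is the pivot. Returns {first, last}: the index range that
    // holds the values equal to the pivot after partitioning.
    static int[] partition(int[] arr, int L, int R) {
        int less = L - 1; // last index of the "< pivot" region
        int more = R;     // "> pivot" region starts here; the pivot is parked at R
        while (L < more) {
            if (arr[L] < arr[R]) {
                swap(arr, ++less, L++);
            } else if (arr[L] > arr[R]) {
                swap(arr, --more, L); // L stays: the new arr[L] is unexamined
            } else {
                L++;
            }
        }
        swap(arr, more, R); // put the pivot next to the other equal values
        return new int[]{less + 1, more};
    }

    static void swap(int[] arr, int i, int j) {
        int tmp = arr[i];
        arr[i] = arr[j];
        arr[j] = tmp;
    }

    public static void main(String[] args) {
        int[] arr = {3, 5, 2, 5, 7, 5};
        int[] p = partition(arr, 0, arr.length - 1);
        System.out.println(java.util.Arrays.toString(arr)); // prints [3, 2, 5, 5, 5, 7]
        System.out.println(p[0] + ".." + p[1]);             // prints 2..4
    }
}
```

The recursive calls then only cover indices 0..1 and 5..5, so the three 5s are never touched again.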

Can we optimize even further? In theory it seems difficult, but from an engineering point of view we can use quicksort for the overall scheduling and switch to insertion sort for small local ranges, combining several sorting methods to speed things up further.

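public static void quickSort(int[] arr, int L, int R){
    if(L >= R){
        return;
    }
    /**
     *  Some optimizations
     */
    if(L > R - 60){
        // Small range: insertion sort is O(N^2) but has a small constant factor.
        insertionSort(arr, L, R);
        return;
    }
    // Move a randomly chosen pivot to R, because partition uses arr[R] as the pivot.
    swap(arr,L+(int)(Math.random() *(R-L+1)), R);
    int[] p = partition(arr, L, R);
    quickSort(arr, L ,p[0]-1);
    quickSort(arr, p[1]+1, R);
}

// Helper added to fill in the placeholder; sorts arr[L..R] in place.
public static void insertionSort(int[] arr, int L, int R){
    for(int i = L + 1; i <= R; i++){
        int cur = arr[i];
        int j = i - 1;
        while(j >= L && arr[j] > cur){
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = cur;
    }
}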

3. The Dutch Flag Problem on a Linked List

Back to the Dutch flag problem. So far the data was stored in an array; what if it is a linked list instead?

Six variables are used:

Node sH = null; // head of the part less than num

Node sT = null; // tail of the part less than num

Node eH = null; // head of the part equal to num

Node eT = null; // tail of the part equal to num

Node bH = null; // head of the part greater than num

Node bT = null; // tail of the part greater than num

With these six pointers you can split the linked list into three parts, then connect the three parts head to tail to reach the goal. Note, however, that there may be no elements smaller than num, no elements equal to num, or no elements larger than num, so the connection step must check for empty parts.

public static Node listPartition(Node head, int pivot){
    Node sH = null;
    Node sT = null;
    Node eH = null;
    Node eT = null;
    Node bH = null;
    Node bT = null;
    Node next = null;
    while(head != null){
        next = head.next;
        head.next = null;
        if(head.value < pivot){
            if(sH ==null){
                sH = head;
                sT = head;
            } else {
                sT.next = head;
                sT = head;
            }
        }else if(head.value == pivot){
            if(eH ==null){
                eH = head;
                eT = head;
            } else {
                eT.next = head;
                eT = head;
            }
        }else{
            if(bH ==null){
                bH = head;
                bT = head;
            } else {
                bT.next = head;
                bT = head;
            }
        }
        head = next;
    }
    if(sT != null){
        sT.next = eH;
        eT = eT == null ? sT : eT;
    }
    if(eT != null){
        eT.next = bH;
    }
    return sH != null ? sH : (eH != null ? eH : bH);
}
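The code above relies on a Node type that the article does not show. A minimal sketch of what it presumably looks like, based only on the two fields the code uses (value and next), together with two hypothetical helpers (NodeLists.fromArray and NodeLists.toArray) for building and inspecting lists:

```java
// Assumed node type: listPartition only touches `value` and `next`.
class Node {
    int value;
    Node next;

    Node(int value) {
        this.value = value;
    }
}

class NodeLists {
    // Builds a singly linked list from an array; returns null for an empty array.
    static Node fromArray(int[] values) {
        Node head = null;
        Node tail = null;
        for (int v : values) {
            Node n = new Node(v);
            if (head == null) {
                head = n;
            } else {
                tail.next = n;
            }
            tail = n;
        }
        return head;
    }

    // Copies the list values into an array, for printing or checking.
    static int[] toArray(Node head) {
        int size = 0;
        for (Node cur = head; cur != null; cur = cur.next) {
            size++;
        }
        int[] out = new int[size];
        int i = 0;
        for (Node cur = head; cur != null; cur = cur.next) {
            out[i++] = cur.value;
        }
        return out;
    }

    public static void main(String[] args) {
        Node head = fromArray(new int[]{7, 3, 5, 9, 5, 1});
        System.out.println(java.util.Arrays.toString(toArray(head)));
        // prints [7, 3, 5, 9, 5, 1]
    }
}
```

With listPartition in scope, toArray(listPartition(head, 5)) on this list should yield [3, 1, 5, 5, 7, 9]: nodes are appended to the tail of their part, so each part keeps the original relative order.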

Finally, a question for you: do you know what optimizations Arrays.sort has included since JDK 1.7?

Author: Chen Yican

Tags: Algorithm data structure Machine Learning

Posted on Tue, 21 Sep 2021 12:06:16 -0400 by mudasir