KMP algorithm in detail (a step-by-step guide)


KMP is not that hard to understand; other people's articles are just a little wordy.

Being able to do something is not the same as truly knowing it. You only really know it once you can explain it to others.

Background

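  • You are given two strings and asked whether one contains the other as a contiguous substring. Most people first think of the brute-force solution: try every starting position and compare character by character until a match is found. But what if the input looks like this: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa! After every mismatch, brute force throws away what it has matched and starts over, which costs on the order of n·m comparisons for a text of length n and a pattern of length m.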
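For comparison, here is a minimal sketch of that brute-force approach in Java (the class and method names are illustrative, not from the original post). On each mismatch it restarts one position further along in str1, re-comparing characters it has already seen.

```java
public class NaiveSearch {
    // Brute-force substring search: try every start position in str1 and
    // compare str2 character by character, restarting from scratch on a mismatch.
    // Returns the first index where str2 occurs in str1, or -1.
    static int naiveSearch(String str1, String str2) {
        int n = str1.length(), m = str2.length();
        for (int start = 0; start + m <= n; start++) {
            int j = 0;
            while (j < m && str1.charAt(start + j) == str2.charAt(j)) {
                j++;
            }
            if (j == m) {
                return start; // every character of str2 matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Bad case: many 'a's in the text and a pattern "aaaaab", so almost
        // every start position compares nearly the whole pattern before failing.
        System.out.println(naiveSearch("a".repeat(20), "aaaaab")); // prints -1
        System.out.println(naiveSearch("hello world", "world"));   // prints 6
    }
}
```

`String.repeat` requires Java 11 or later.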

This is where KMP comes in (a speed buff)! However, to fully understand KMP, you first need to understand the maximum prefix array, called next.

Maximum prefix array (longest prefix that is also a suffix)

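For each character k of the pattern, we record one piece of information about the substring in front of it and store it in an array called next.

So what goes into the next array? The length of the longest prefix that is also a suffix. For example: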

  • abbcabbk: the substring in front of k is abbcabb. Which prefixes equal suffixes (the first n characters are the same as the last n)? Clearly abb. Note that the whole string abbcabb itself does not count. If several prefixes qualify, pick the longest. In fact we only need the length: k is at index 7, so next[7] = 3.

As another example, the next array of the string "ababc" is {-1, 0, 0, 1, 2}. Nothing precedes the first character a, so its entry is -1 by convention.
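To make the definition concrete, here is a small brute-force sketch (the class and method names are illustrative) that computes each entry straight from the definition: for position i, take the substring in front of it and try prefix lengths from the longest down. It follows the -1 convention for the first position. It is slow, but handy for checking examples by hand.

```java
import java.util.Arrays;

public class NextArrayBruteForce {
    // Length of the longest proper prefix of s that is also a suffix of s
    // (the whole string itself does not count).
    static int longestPrefixSuffix(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            // Does the prefix of length len equal the suffix of length len?
            if (s.startsWith(s.substring(s.length() - len))) {
                return len; // longest candidate is tried first, so this is the maximum
            }
        }
        return 0;
    }

    // next[i] describes the substring in front of position i; next[0] = -1.
    static int[] nextArray(String pattern) {
        int[] next = new int[pattern.length()];
        for (int i = 0; i < pattern.length(); i++) {
            next[i] = (i == 0) ? -1 : longestPrefixSuffix(pattern.substring(0, i));
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(longestPrefixSuffix("abbcabb"));         // 3
        System.out.println(Arrays.toString(nextArray("ababc")));    // [-1, 0, 0, 1, 2]
        System.out.println(Arrays.toString(nextArray("abbcabbk"))); // [-1, 0, 0, 0, 0, 1, 2, 3]
    }
}
```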

How is this computed in code? It's quite simple: we design a small algorithm.

But before implementing it, treat it as a ready-made API: use it to solve KMP first, then come back and study how to build it.

Given two strings, determine whether str1 contains str2.

str1 = "abbtcfabbtkadd······"

str2 = "abbtcfabbtu·····", and every character in str2 has its own next value.

Step 1:
str1: a b b t c f a b b t k a ···
str2: a b b t c f a b b t u v ···

Step 2:
str1: × × × × × × a b b t k a
str2:             a b b t c

Step 3:
str1: × × × × × × × × × × × a
str2:                       a b b t c f a b b t u v ···
  • Step 1: compare as usual, str1[i] against str2[i], until reaching k != u. The comparison fails, so we move on to the next comparison.

  • Step 2: brute force would now restart step 1 from str1[1], which is obviously too slow. This is where KMP shines! Look at the element where step 1 failed, u, together with its longest prefix, abbt. This time str1 does not go back to str1[1]; it stays put at k. So where in str2 should the comparison with k resume?

    Take a closer look at the characters right in front of k: abbt. What are they? They are exactly the longest suffix in front of u in str2, and that suffix equals the longest prefix abbt, so these four characters are already known to match. Therefore str2 jumps to the character right after its longest prefix abbt, which is c (u is at index 10 and next[10] = 4), and the comparison continues there.

  • Step 3: clearly k != c, so the comparison fails again and step 2 repeats. The longest prefix in front of c is empty (length 0), so str2 falls back to its first character, and when that also mismatches, str2 has nowhere left to jump. Only str1 can move: it advances to the character after k, and the comparison continues with steps 1 to 3.

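The walkthrough can be replayed with a small tracing sketch. It uses only the visible parts of the example strings (abbtcfabbtka and abbtcfabbtuv; the elided tails are left out), builds next by brute force since next is still being treated as a ready-made API at this point, and prints each mismatch and the jump it triggers. The names buildNext and kmpTrace are illustrative.

```java
public class KmpTrace {
    // Brute-force next array (illustrative): next[0] = -1, and next[i] is the
    // longest proper prefix of p[0..i) that is also a suffix of it.
    static int[] buildNext(String p) {
        int[] next = new int[p.length()];
        for (int i = 0; i < p.length(); i++) {
            if (i == 0) {
                next[i] = -1;
                continue;
            }
            String before = p.substring(0, i);
            // len = 0 always matches (empty string), so the loop always sets next[i]
            for (int len = i - 1; len >= 0; len--) {
                if (before.startsWith(before.substring(i - len))) {
                    next[i] = len;
                    break;
                }
            }
        }
        return next;
    }

    // Same three branches as the kmp method, but prints each mismatch.
    static int kmpTrace(String s1, String s2) {
        int[] next = buildNext(s2);
        int i1 = 0, i2 = 0;
        while (i1 < s1.length() && i2 < s2.length()) {
            if (s1.charAt(i1) == s2.charAt(i2)) {
                i1++;
                i2++;
            } else if (i2 == 0) {
                System.out.println("mismatch at str1[" + i1 + "] with i2 = 0: i1++");
                i1++;
            } else {
                System.out.println("mismatch " + s1.charAt(i1) + " != " + s2.charAt(i2)
                        + ": i2 jumps from " + i2 + " to next[" + i2 + "] = " + next[i2]);
                i2 = next[i2];
            }
        }
        return i2 == s2.length() ? i1 - i2 : -1;
    }

    public static void main(String[] args) {
        System.out.println(kmpTrace("abbtcfabbtka", "abbtcfabbtuv"));
    }
}
```

It reports k != u (i2 jumps from 10 to 4), then k != c (i2 jumps from 4 to 0), then advances i1 past k, matching the three steps; it finally prints -1 because the visible parts alone do not contain a full match.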

The theory holds up, so let's write the code, in Java.

  • Comparing arrays requires indexes: first define i1 and i2 for str1 and str2.
  • Compute the longest-prefix length of every element of str2: getNextArray(str2).
  • Loop as long as neither index goes out of bounds: i1 < str1.length() && i2 < str2.length().
  • The three cases correspond to the three steps above:
    • 1. The characters match: i1++; i2++;
    • 2. They don't match: i2 jumps to the character right after the longest prefix: i2 = next[i2];
    • 3. Whether case 2 can jump depends on whether there is a prefix. If i2 == 0 there is nothing to jump to, so i1++ instead.
 
public static int kmp(String str1, String str2) {
    // Invalid input, or the pattern is longer than the text: no match
    if (str1 == null || str2 == null || str1.length() < str2.length()) {
        return -1;
    }
    char[] str11 = str1.toCharArray();
    char[] str22 = str2.toCharArray();
    int i1 = 0; // index into str1 (never moves backward)
    int i2 = 0; // index into str2
    int[] next = getNextArray(str22);
    // Loop condition: neither i1 nor i2 may go out of bounds
    while (i1 < str11.length && i2 < str22.length) {
        if (str11[i1] == str22[i2]) {
            // Case 1: the characters match, advance both
            i1++;
            i2++;
        } else if (i2 == 0) {
            // Case 3: no prefix to jump to, so only str1 can move forward
            i1++;
        } else {
            // Case 2: move i2 to the character right after the longest prefix, i.e. next[i2]
            i2 = next[i2];
        }
    }
    // If i2 reached the end of str2, the whole pattern matched, starting at i1 - i2
    return i2 == str22.length ? i1 - i2 : -1;
}
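The kmp method depends on getNextArray, which is only built later. To try kmp on its own right away, a minimal harness can pair it with a stand-in next builder written straight from the definition and compare the results with the JDK's String.indexOf. The class name KmpDemo and the regionMatches-based builder are assumptions for this demo, not the post's code.

```java
public class KmpDemo {
    // Copy of the kmp method above (comments condensed).
    public static int kmp(String str1, String str2) {
        if (str1 == null || str2 == null || str1.length() < str2.length()) {
            return -1;
        }
        char[] str11 = str1.toCharArray();
        char[] str22 = str2.toCharArray();
        int i1 = 0;
        int i2 = 0;
        int[] next = getNextArray(str22);
        while (i1 < str11.length && i2 < str22.length) {
            if (str11[i1] == str22[i2]) {
                i1++;          // match: advance both
                i2++;
            } else if (i2 == 0) {
                i1++;          // no prefix to jump to: only str1 moves
            } else {
                i2 = next[i2]; // jump to the character right after the longest prefix
            }
        }
        return i2 == str22.length ? i1 - i2 : -1;
    }

    // Stand-in next builder (assumption for this demo): same -1 / 0 convention,
    // computed straight from the definition.
    private static int[] getNextArray(char[] p) {
        if (p.length <= 1) {
            return new int[]{-1};
        }
        String s = new String(p);
        int[] next = new int[p.length];
        next[0] = -1;
        for (int i = 1; i < p.length; i++) {
            // Longest len < i such that s[0..len) equals s[i-len..i); stays 0 if none
            for (int len = i - 1; len > 0; len--) {
                if (s.regionMatches(0, s, i - len, len)) {
                    next[i] = len;
                    break;
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"aaaaaaaaab", "aaab"},
            {"hello world", "world"},
            {"abc", "abcd"},
            {"abc", ""},
        };
        for (String[] c : cases) {
            // kmp and String.indexOf should agree on every case
            System.out.println(kmp(c[0], c[1]) + " " + c[0].indexOf(c[1]));
        }
    }
}
```

Any correct next builder can replace the stand-in; the kmp loop itself does not care how next was computed.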

Longest prefix array algorithm

After thinking about it for a long time, I wrote a first version:

It uses brute force. Every character needs its own prefix length, so there is an outer for loop over the characters. For each element, try the possible prefix lengths starting from the longest: the maximum is i - 1 (i is the current array index) and the minimum is 1. That is a second loop: a while loop in which a variable x increases from 1 to i - 1, so the candidate length is i - x.

Finally, check whether the prefix and suffix of that length are equal, which is a third loop: j runs from 0 to i - x - 1, comparing str22[j] with str22[j + x]. If any pair differs, this round is discarded and the next round starts with x += 1. If every pair is equal, there is no need for further rounds, because this round's length is the longest possible: set arr[i] = i - x, then set x = i to leave the while loop and move on to the next element.

// Find the longest equal prefix/suffix length for each position (brute force)
private static int[] getNextArray(char[] str22) {
    if (str22.length <= 1) {
        return new int[]{-1};
    }
    int[] arr = new int[str22.length];
    // The first two entries are always -1 and 0; they never change
    arr[0] = -1;
    arr[1] = 0;
    // Traverse the remaining elements
    for (int i = 2; i < str22.length; i++) {
        int x = 1; // candidate length is i - x; the first round tries the longest, i - 1
        while (x <= i - 1) {
            // Compare the prefix str22[0..i-x-1] with the suffix str22[x..i-1]
            boolean allEqual = true;
            for (int j = 0; j <= i - x - 1; j++) {
                if (str22[j] != str22[j + x]) {
                    allEqual = false;
                    break;
                }
            }
            if (allEqual) {
                arr[i] = i - x; // every pair matched: this is the longest length
                x = i;          // jump out of the while loop
            } else {
                x += 1;         // try a shorter candidate
            }
        }
        // If no candidate matched, arr[i] keeps its default value 0
    }
    return arr;
}

It's easy to understand, but a first attempt is rarely the best solution: the three nested loops redo a lot of comparisons. Here is a more advanced solution that runs in linear time.

First, the first two entries of the next array are always -1 and 0, no doubt about that. When we reach the third element, can we reuse the information of the second? More generally, can next[i] be derived from next[i-1]?

Yes, we can use the previous element's information directly. str[next[i-1]] is the character right after the longest prefix for position i-1, and str[i-1] is the character right after the matching longest suffix. If str[i-1] == str[next[i-1]], both can be extended by one character, so next[i] = next[i-1] + 1.

What if they differ? Fall back to a shorter candidate: the next-longest prefix that is also a suffix has length next[next[i-1]], so compare str[i-1] with str[next[next[i-1]]]; if they match, next[i] = next[next[i-1]] + 1, otherwise fall back again the same way. If the candidate length reaches 0 and the characters still differ, next[i] = 0.
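The recurrence can be isolated into a single step as a minimal sketch (nextAt is a hypothetical helper, not from the post): given the entries already computed, it derives next[i] by first trying to extend next[i-1] and then falling back through next[next[i-1]] and so on. The pattern aabaaab is an illustrative choice where the first comparison fails and one fallback is needed.

```java
public class NextStep {
    // Given next[0..i-1] already filled in (and i >= 2), compute next[i]
    // from next[i-1] using the recurrence.
    static int nextAt(char[] p, int[] next, int i) {
        int ct = next[i - 1]; // length of the longest prefix/suffix in front of i-1
        while (true) {
            if (p[i - 1] == p[ct]) {
                return ct + 1;  // prefix and suffix both grow by one character
            }
            if (ct == 0) {
                return 0;       // no shorter candidate left
            }
            ct = next[ct];      // fall back to the next-longest candidate
        }
    }

    public static void main(String[] args) {
        char[] p = "aabaaab".toCharArray();
        // next[0..5] already known; next[6] is a placeholder to be computed
        int[] next = {-1, 0, 1, 0, 1, 2, 0};
        // p[5] = 'a' != p[2] = 'b', so fall back to next[2] = 1; p[5] == p[1]
        System.out.println(nextAt(p, next, 6)); // prints 2
    }
}
```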

Code:

private static int[] getNextArray2(char[] str22) {
    if (str22.length <= 1) {
        return new int[]{-1};
    }
    int[] arr = new int[str22.length];
    arr[0] = -1;
    arr[1] = 0;
    // Position in the next array currently being filled
    int i = 2;
    // ct: length of the current candidate prefix, which is also the index of
    // the character right after it; it always starts out equal to arr[i - 1]
    int ct = 0;
    while (i < arr.length) {
        if (str22[i - 1] == str22[ct]) {
            // The character before i extends the candidate prefix by one
            arr[i++] = ++ct;
        } else if (ct > 0) {
            // Mismatch: fall back to the next-longest candidate prefix
            ct = arr[ct];
        } else {
            // No candidate left: the longest prefix length is 0
            arr[i++] = 0;
        }
    }
    return arr;
}

Notice that these three conditional branches look a lot like the three branches of kmp: computing the next array is essentially the pattern running KMP against itself.

Tags: Java Algorithm leetcode

Posted on Sun, 28 Nov 2021 17:40:48 -0500 by hussain