A small optimization of the Sunday algorithm

The idea behind the optimization:

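1. The key idea of the Sunday algorithm
Analyzing the traditional Sunday algorithm, I found that the key to its jumps is the second step: when the current window does not match, look at the character of the main string just past the window and find where it occurs in the pattern. Let's analyze the principle of this step in depth, with work as the pattern and we should working hard as the main string: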

we should working hard
work

With work aligned under "we s", the character just past the window is h. Why compare h with the letters of work one by one? Each comparison settles one possible alignment of work:
1) Comparing h with w decides whether the window "houl" could match work.
2) Comparing h with o decides whether the window "shou" could match work.
3) Comparing h with r decides whether the window " sho" (starting at the space) could match work.
4) Comparing h with k decides whether the window "e sh" could match work.
In other words, this single round of comparisons settles every window that contains h, that is, every placement of work inside "e shoul".
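Since the traditional algorithm is not listed in this article, here is a minimal Java sketch of the textbook Sunday search for reference. It assumes the usual formulation: compare the window left to right and, on a mismatch, look up the character just past the window in a precomputed shift table. The class name TraditionalSunday and the full-size char table are choices of this sketch. For a character that occurs in the pattern, the table stores m - j for its rightmost position j, which is exactly the outcome of comparing h with each letter of work and keeping the rightmost hit; a character that never occurs gets m + 1, so the window jumps past it.

```java
import java.util.Arrays;

class TraditionalSunday {

    // Textbook Sunday search: returns the first index of pattern in text, or -1.
    static int search(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) {
            return 0;
        }
        // shift[c]: how far to move the window when c is the character just past it.
        // A character absent from pattern moves the window past itself (m + 1);
        // otherwise its rightmost occurrence j in pattern is aligned with it (m - j).
        int[] shift = new int[Character.MAX_VALUE + 1];
        Arrays.fill(shift, m + 1);
        for (int j = 0; j < m; j++) {
            shift[pattern.charAt(j)] = m - j;
        }
        int s = 0;                       // start of the current window
        while (s + m <= n) {
            // Step 1: compare the window with pattern from left to right.
            int j = 0;
            while (j < m && text.charAt(s + j) == pattern.charAt(j)) {
                j++;
            }
            if (j == m) {
                return s;
            }
            // Step 2: look at the character just past the window and jump.
            if (s + m == n) {
                break;                   // no character past the window
            }
            s += shift[text.charAt(s + m)];
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search("we should working hard", "work")); // prints 10
    }
}
```

On the example above, the search looks at h, jumps 5 to the window "ould", looks at the space, jumps 5 again, and finds work at index 10.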

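Our optimization strategy follows from this. When h is compared with work and turns out not to occur in it, no window containing h can match, so the next candidate window is "ould", which ends at d. We can therefore skip o, u, and l without examining them and repeat the check with d, and so on, until we reach a character that does occur in work.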
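The skip rule can be isolated in a few lines. The helper below, firstUsefulProbe (a hypothetical name used only for this sketch), starts at a given probe position and steps m characters at a time until it reaches a character that occurs in the pattern. It only locates the next useful probe; it does not verify any alignment.

```java
class ProbeSkip {

    // Starting at index start, step pattern.length() characters at a time and return the
    // first index whose character occurs in pattern, or -1 if the probe runs off the end.
    // Every skipped probe character is absent from pattern, so no window containing it can match.
    static int firstUsefulProbe(String text, String pattern, int start) {
        int m = pattern.length();
        if (m == 0) {
            throw new IllegalArgumentException("pattern must be non-empty");
        }
        for (int p = start; p < text.length(); p += m) {
            if (pattern.indexOf(text.charAt(p)) >= 0) {
                return p;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstUsefulProbe("we should working hard", "work", 4)); // prints 12
        System.out.println(firstUsefulProbe("we should working hard", "work", 3)); // prints 11
    }
}
```

Starting at index 4 (the h above), the probes are h, d, and r, so it returns 12; starting at index 3, the last character of the first window, the probes are s, l, and o, and it returns 11. Either starting point works, because each skip only discards windows that contain a character absent from the pattern.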

Improved Sunday algorithm:

Assume a string A of length n and a string B of length m.
We use the improved Sunday algorithm to find out whether B occurs in A.

We assume that each character comparison costs 1 unit of time. The complexity analysis in this article focuses on the worst case.

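1) Align the first characters of A and B.

2) Take the character of A aligned with the last character of B (initially the m-th character of A) and check whether it occurs in B. This takes at most m comparisons.

3) If it occurs, align that occurrence in B with the character in A and compare the two strings in reverse order. Since the probed character is already known to match, this takes at most m - 1 further comparisons. If all characters are equal, the target is found. If not, and the character also occurs elsewhere in B, try those alignments as well; otherwise go to step 4.

4) If the character does not occur in B, or no alignment matched, move m positions to the right (to the 2m-th character the first time) and repeat from step 2, again with at most m comparisons to check the new character.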

Worked example:

w	e		s	h	o	u	l	d		w	o	r	k	i	n	g		h	a	r	d
w	o	r	k							

1) First, check whether s occurs in work. It does not, so skip 4 positions to l.
2) Check whether l occurs in work. It does not, so skip 4 positions to o.
3) Check whether o occurs in work. It does, so align the o of work with this o to get the following:

w	e		s	h	o	u	l	d		w	o	r	k	i	n	g		h	a	r	d
										w	o	r	k								

4) Compare the aligned characters in reverse order. They all match, so the search ends with work found at index 10 (counting from 0).
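To reproduce the walk-through programmatically, the sketch below runs the four steps and records one line per probe or alignment. The class ImprovedSundayTrace and its message format are illustrative assumptions. It tries every occurrence of the probed character in B, rightmost first, so that a character appearing more than once in B does not cause a match to be skipped.

```java
import java.util.ArrayList;
import java.util.List;

class ImprovedSundayTrace {

    // Reverse-order comparison of b with the window of a starting at start.
    static boolean matchesAt(String a, String b, int start) {
        for (int k = b.length() - 1; k >= 0; k--) {
            if (a.charAt(start + k) != b.charAt(k)) {
                return false;
            }
        }
        return true;
    }

    // Runs the improved search and records each step.
    static List<String> trace(String a, String b) {
        List<String> log = new ArrayList<>();
        int n = a.length();
        int m = b.length();
        if (m == 0) {
            log.add("empty b matches at 0");
            return log;
        }
        for (int p = m - 1; p < n; p += m) {    // steps 1, 2 and 4: probe every m-th character
            char c = a.charAt(p);
            boolean found = false;
            for (int j = m - 1; j >= 0; j--) {  // step 3: try each occurrence of c in b
                if (b.charAt(j) != c) {
                    continue;
                }
                found = true;
                int start = p - j;              // align b[j] with a[p]
                if (start + m > n) {
                    log.add("align at " + start + ": runs past the end");
                    return log;
                }
                boolean ok = matchesAt(a, b, start);
                log.add("'" + c + "' at " + p + " is b[" + j + "], align at " + start
                        + (ok ? ": match" : ": mismatch"));
                if (ok) {
                    return log;
                }
            }
            if (!found) {
                log.add("'" + c + "' at " + p + " not in b, skip " + m);
            }
        }
        log.add("no match");
        return log;
    }

    public static void main(String[] args) {
        trace("we should working hard", "work").forEach(System.out::println);
    }
}
```

For the example above it prints three lines: s at 3 and l at 7 are not in work, and o at 11 is b[1], which aligns work at index 10 and matches.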

Complexity analysis
We now calculate the worst-case cost of the improved algorithm:

The most expensive searches look like this one:
wordkkkkkkkkkkkkkkkkkkk
work

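The cost is calculated as follows. The probe advances m positions at a time, so there are about n/m probes, and each probe costs at most m comparisons to look the character up in B plus m - 1 comparisons to verify the alignment:
T(n, m) = (2m - 1) · n / m
After simplification: T(n, m) = 2n - n/m
The cost is therefore linear in n, O(n), and stays below 2n comparisons for every m. This bound assumes that each character occurs at most once in B, as in work; a character that occurs several times in B may need one verification per occurrence.

Under this cost model the improved algorithm is about twice as fast as the traditional Sunday algorithm on this input, where the traditional algorithm can only shift one position at a time because the character just past each window is k, the last letter of work.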
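To check the count on a concrete input, the sketch below instruments the improved search with a comparison counter (the class ComparisonCounter and its static field are assumptions of this sketch). It counts every membership test and every verification step, including the probed character again, and stops as soon as a match is returned, so it does not reproduce the (2m - 1) · n / m formula term for term.

```java
class ComparisonCounter {

    // Number of character comparisons made by the most recent call to search.
    static long comparisons;

    // Improved Sunday search with every character comparison counted.
    // Assumes b is non-empty; returns the start index of the first match, or -1.
    static int search(String a, String b) {
        comparisons = 0;
        int n = a.length();
        int m = b.length();
        for (int p = m - 1; p < n; p += m) {
            char c = a.charAt(p);
            for (int j = m - 1; j >= 0; j--) {
                comparisons++;                  // membership test: is b[j] the probed character?
                if (b.charAt(j) != c) {
                    continue;
                }
                int start = p - j;              // align b[j] with a[p]
                if (start + m > n) {
                    return -1;
                }
                int k = m - 1;
                while (k >= 0) {
                    comparisons++;              // verification, in reverse order
                    if (b.charAt(k) != a.charAt(start + k)) {
                        break;
                    }
                    k--;
                }
                if (k < 0) {
                    return start;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String a = "word" + "k".repeat(8) + "work";   // String.repeat needs Java 11+
        int pos = search(a, "work");
        System.out.println(pos + " after " + comparisons + " comparisons"); // prints "12 after 21 comparisons"
    }
}
```

For "word" followed by eight k's and "work" (n = 16), it returns 12 after 21 comparisons, below 2n - n/m = 28; with 400 k's (n = 408) it makes 609 comparisons against a bound of 714.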

Code implementation

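public class BetterSunday01 {

    // Returns the index of the first occurrence of needle in haystack, or -1 if there is none.
    public int Sunday(String haystack, String needle) {
        int hayLen = haystack.length(); // main string length
        int nLen = needle.length();     // substring length
        if (nLen == 0) {
            return 0;                   // an empty needle matches at index 0
        }
        if (nLen > hayLen) {
            return -1;                  // the needle is longer than the haystack
        }

        int i = nLen - 1;               // probe: the haystack character under the needle's last character
        while (i < hayLen) {
            char c = haystack.charAt(i);
            // Try every occurrence of c in needle, rightmost first (leftmost window first).
            for (int j = nLen - 1; j >= 0; j--) {
                if (needle.charAt(j) != c) {
                    continue;
                }
                int end = i + (nLen - 1 - j); // haystack index under the needle's last character
                if (end >= hayLen) {
                    return -1;          // this and every later alignment runs past the end
                }
                // Compare the aligned strings in reverse order.
                int n = nLen - 1;
                int k = end;
                while (n >= 0 && needle.charAt(n) == haystack.charAt(k)) {
                    n--;
                    k--;
                }
                if (n < 0) {
                    return k + 1;       // every character matched
                }
            }
            // No window containing position i matches, so the next candidate window
            // ends at i + nLen: probe there.
            i += nLen;
        }
        return -1;
    }
}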

The implementation needs no precomputed shift table, so it is shorter and more concise than the traditional version.
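Hand-written search loops are easy to get subtly wrong, so a cheap safeguard is to compare them with String.indexOf on many random inputs. The sketch below re-implements the improved search compactly (verifying each alignment with String.startsWith rather than a reverse-order loop) and counts disagreements; the class CrossCheck, its helpers, the seed, and the size limits are arbitrary choices of this sketch. A two-letter alphabet makes repeated characters in the needle common, which exercises the case where the probed character occurs more than once in the needle.

```java
import java.util.Random;

class CrossCheck {

    // Improved Sunday search (sketch): probe every m-th character and try each
    // alignment of that character with needle, rightmost occurrence first.
    static int search(String hay, String needle) {
        int n = hay.length();
        int m = needle.length();
        if (m == 0) {
            return 0;
        }
        for (int p = m - 1; p < n; p += m) {
            char c = hay.charAt(p);
            for (int j = m - 1; j >= 0; j--) {
                if (needle.charAt(j) != c) {
                    continue;
                }
                int start = p - j;
                if (start + m > n) {
                    return -1;              // later alignments start even further right
                }
                if (hay.startsWith(needle, start)) {
                    return start;
                }
            }
        }
        return -1;
    }

    // Random string of the given length over a small alphabet.
    static String randomString(Random rnd, String alphabet, int len) {
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    // Returns the number of random inputs on which search disagrees with String.indexOf.
    static int countDisagreements(int trials, long seed) {
        Random rnd = new Random(seed);
        int bad = 0;
        for (int t = 0; t < trials; t++) {
            String hay = randomString(rnd, "ab", rnd.nextInt(30));
            String needle = randomString(rnd, "ab", rnd.nextInt(6));
            if (search(hay, needle) != hay.indexOf(needle)) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("disagreements: " + countDisagreements(100_000, 42L));
    }
}
```

A version that tries only the rightmost occurrence of the probed character and then jumps m positions would report no match for the haystack "xxaab" and needle "aab", although the needle starts at index 2; a random cross-check like this one finds such cases quickly.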

Test comparison
I compared the two algorithms using the following two tests:
1. Worst case test:

public static void main(String[] args) {
        String a="wordkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk" +
                "kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk" +
                "kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk" +
                "kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk" +
                "kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk" +
                "kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk" +
                "kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk" +
               "kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkwork";
        String b="work";
        BetterSunday01 find=new BetterSunday01();
        long startTime = System.currentTimeMillis();    //Get start time
        int i=0;
        for(int j=0;j<1000000;j++) {
            i = find.Sunday(a, b);
        }
        long endTime = System.currentTimeMillis();    //Get end time
        System.out.println(i);
        System.out.println("The running time of the algorithm is:"+(endTime - startTime) + "ms");
}

I ran this test once with the traditional algorithm and once with the optimized algorithm (the screenshots of the two timings are not reproduced here).

The improved algorithm was 67.5% faster than the traditional algorithm.

2. General test:
On an ordinary passage of more than 20,000 words taken from the Internet, the two algorithms searched at essentially the same speed.
Overall, the optimization improves the stability and the average complexity of the Sunday algorithm: it is clearly faster on adversarial inputs such as the worst case above and no slower on ordinary text.


Posted on Fri, 26 Jun 2020 03:49:04 -0400 by gwolff2005