Java regular expression source code analysis

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String[] args) {
        String content = "2000, May: JDK1.3, JDK1.4 and J2SE1.3 were released one after another, and a few weeks later gained industry-standard support on Apple's Mac OS X. 2001, September 24: J2EE1.3 was released. " +
                "2002, February 26: J2SE1.4 was released. Since then, the computing power of Java has improved greatly; compared with J2SE1.3, it has nearly 62% more classes and interfaces. These new features include broad XML support, secure socket support (via the SSL and TLS protocols), a brand-new I/O API, regular expressions, logging, and assertions. " +
                "2004, September 30: J2SE1.5 was released, another milestone in the history of the Java language. To show the importance of this version, J2SE 1.5 was renamed Java SE 5.0 (build number 1.5.0), " +
                "code-named \"Tiger\". Tiger contains the most significant updates since version 1.0 was published in 1996, including generics support, autoboxing of primitive types, the improved for loop, enumeration types, " +
                "formatted I/O and variable arguments.";

        Pattern compile = Pattern.compile("\\d\\d\\d\\d");
        Matcher matcher = compile.matcher(content);
        // How find() works:
        // 1. Scan the input for the next substring that matches the pattern, e.g. "2000".
        // 2. On success, store the start index of that substring in the matcher's int[] groups array: groups[0] = 0.
        // 3. Store the end index of the substring (the index of its last character + 1) in groups[1] = 4 and set oldLast to the same value. The next call to find() starts matching from that index.
        // 4. matcher.group(i) with i outside 0..groupCount() throws IndexOutOfBoundsException (see the explicit range check in the source below); otherwise it returns getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString().
        // 5. Later matches work the same way: the new start and end indexes overwrite groups[0] and groups[1] (65 and 69 in the author's debug session; the exact values depend on the input text), oldLast is set to the new end index, and the next find() resumes from there.
        while (matcher.find()) {
            // Source of Matcher.group(int), for reference:
            // public String group(int group) {
            //         if (first < 0)
            //             throw new IllegalStateException("No match found");
            //         if (group < 0 || group > groupCount())
            //             throw new IndexOutOfBoundsException("No group " + group);
            //         if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
            //             return null;
            //         return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
            //     }
            System.out.println("Found:" + matcher.group(0));
        }
    }
}
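
The three checks in the group(int) source quoted in the comments above can be observed from outside the class. The following is a minimal sketch, not part of the original post: the class name GroupChecksDemo, the helper probe, and the pattern with a second alternative are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupChecksDemo {
    // Hypothetical helper: reports what matcher.group(group) does as a short label.
    public static String probe(Matcher matcher, int group) {
        try {
            String value = matcher.group(group);
            return value == null ? "null" : "value:" + value;
        } catch (IllegalStateException e) {
            return "IllegalStateException";      // no successful match yet
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException";  // group < 0 or group > groupCount()
        }
    }

    public static void main(String[] args) {
        // Two alternatives: group 1 is set only when four digits match,
        // group 2 only when "x" matches. groupCount() is 2.
        Matcher matcher = Pattern.compile("(\\d\\d\\d\\d)|(x)").matcher("2000");

        System.out.println(probe(matcher, 0)); // IllegalStateException (find() not called yet)
        matcher.find();
        System.out.println(probe(matcher, 0)); // value:2000
        System.out.println(probe(matcher, 2)); // null (group 2 did not take part in the match)
        System.out.println(probe(matcher, 3)); // IndexOutOfBoundsException
    }
}
```

In the Demo class above the pattern has no parentheses, so groupCount() is 0 and only group(0) is valid.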

Discovery process:

  1. Scan the input for the next substring that matches the pattern, such as "2000".
  2. On success, store the start index of the substring in the int[] groups array of the matcher object: groups[0] = 0. Stepping through the first successful match in a debugger shows this value.
  3. Store the end index of the substring (the index of its last character + 1) in groups[1] = 4 and set oldLast to the same value. The next call to find() starts matching from that index.
  4. Calling matcher.group(i) with i outside the range 0..groupCount() throws IndexOutOfBoundsException, because group(int) checks the range before it returns getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString().
  5. The second successful match works the same way: the new start and end indexes overwrite groups[0] and groups[1] (65 and 69 in the author's debug session; the exact values depend on the input text), oldLast is set to the new end index, and the next find() resumes from there.
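
The groups array is a private field, but the values described in the steps above are visible through the public methods start() and end(). Here is a small sketch under that assumption; the class name FindPositionsDemo, the helper positions, and the short input strings are hypothetical and chosen so the indexes are easy to check by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindPositionsDemo {
    // Hypothetical helper: collects "start-end:text" for every match in the input.
    public static List<String> positions(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            int start = matcher.start(); // the value the steps call groups[0]
            int end = matcher.end();     // the value the steps call groups[1]: last matched index + 1
            // group(0) returns the same text as input.substring(start, end)
            result.add(start + "-" + end + ":" + input.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        // "2000" occupies indexes 0..3, "2001" occupies indexes 5..8.
        System.out.println(positions("\\d\\d\\d\\d", "2000 2001 JDK1.3"));
        // prints [0-4:2000, 5-9:2001]

        // Each find() resumes where the previous match ended, so matches never overlap.
        System.out.println(positions("\\d\\d\\d\\d", "12345678"));
        // prints [0-4:1234, 4-8:5678]
    }
}
```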

Matching with capturing groups (parentheses)
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String[] args) {
        String content = "2000, May: JDK1.3, JDK1.4 and J2SE1.3 were released one after another, and a few weeks later gained industry-standard support on Apple's Mac OS X. 2001, September 24: J2EE1.3 was released. " +
                "2002, February 26: J2SE1.4 was released. Since then, the computing power of Java has improved greatly; compared with J2SE1.3, it has nearly 62% more classes and interfaces. These new features include broad XML support, secure socket support (via the SSL and TLS protocols), a brand-new I/O API, regular expressions, logging, and assertions. " +
                "2004, September 30: J2SE1.5 was released, another milestone in the history of the Java language. To show the importance of this version, J2SE 1.5 was renamed Java SE 5.0 (build number 1.5.0), " +
                "code-named \"Tiger\". Tiger contains the most significant updates since version 1.0 was published in 1996, including generics support, autoboxing of primitive types, the improved for loop, enumeration types, " +
                "formatted I/O and variable arguments.";

        Pattern compile = Pattern.compile("(\\d)(\\d\\d\\d)");
        Matcher matcher = compile.matcher(content);
        // How find() works with groups:
        // What is a group? In a pattern, each pair of parentheses () is a capturing group: the first ( opens group 1, the second ( opens group 2. Here group 1 is (\d) and group 2 is (\d\d\d).
        // 1. Scan the input for the next substring that matches the pattern, e.g. (2)(000).
        // 2. On success, record the match in the matcher's int[] groups array:
        //    2.1 whole match: groups[0] = 0 (start index), groups[1] = 4 (index of the last character + 1)
        //    2.2 group 1 ("2"): groups[2] = 0, groups[3] = 1
        //    2.3 group 2 ("000"): groups[4] = 1, groups[5] = 4
        //    2.4 any further groups are recorded the same way
        //    (With the pattern (\d\d)(\d\d) the split would be (20)(00): groups[3] = 2 and groups[4] = 2.)
        // 3. oldLast is set to the end index of the whole match, that is, 4 after the first match, so the next find() starts matching from index 4.
        //    The value after later matches depends on the input (69 after the second match in the author's debug session).
        while (matcher.find()) {
            // Source of Matcher.group(int), for reference:
            // public String group(int group) {
            //     if (first < 0)
            //         throw new IllegalStateException("No match found");
            //     if (group < 0 || group > groupCount())
            //         throw new IndexOutOfBoundsException("No group " + group);
            //     if ((groups[group*2] == -1) || (groups[group*2+1] == -1))
            //         return null;
            //     return getSubSequence(groups[group * 2], groups[group * 2 + 1]).toString();
            // }
            System.out.println("Found:" + matcher.group(0)); // first match: 2000
            System.out.println("Found:" + matcher.group(1)); // first match: 2
            System.out.println("Found:" + matcher.group(2)); // first match: 000
            // System.out.println("Found:" + matcher.group(3)); // throws IndexOutOfBoundsException: groupCount() is 2
        }
    }
}
First, what is a group? In a pattern, each pair of parentheses () is a capturing group: the first ( opens group 1 and the second ( opens group 2. For example, (\d\d)(\d\d) has two groups of two digits each, while the pattern in the code above, (\d)(\d\d\d), has a one-digit group 1 and a three-digit group 2.
Discovery process, for an input that begins with "2000":

  1. Scan the input for the next substring that matches the pattern, such as (20)(00) for (\d\d)(\d\d).
  2. On success, record the match in the int[] groups array of the matcher object:
    2.1 Whole match: groups[0] = 0 (start index) and groups[1] = 4 (index of the last character + 1).
    2.2 Group 1: groups[2] = 0 and groups[3] = 2. For (\d)(\d\d\d), groups[3] = 1.
    2.3 Group 2: groups[4] = 2 and groups[5] = 4. For (\d)(\d\d\d), groups[4] = 1.
    2.4 Any further groups are recorded the same way.
  3. oldLast is set to the end index of the whole match, that is, 4 after the first match, so the next call to find() starts matching from index 4. The value after later matches depends on the input (69 after the second match in the author's debug session).
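
The layout in the steps above (slot 2*i for the start of group i, slot 2*i+1 for its end) can be rebuilt from the public methods start(int) and end(int). This is only a sketch: the class name GroupOffsetsDemo, the helper offsets, and the short input are hypothetical, and the returned array is a reconstruction, not the matcher's private groups field.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupOffsetsDemo {
    // Hypothetical helper: builds an array laid out like the one described above,
    // with the start of group i in slot 2*i and its end in slot 2*i + 1.
    // Must be called after a successful find().
    public static int[] offsets(Matcher matcher) {
        int[] offsets = new int[(matcher.groupCount() + 1) * 2];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            offsets[2 * i] = matcher.start(i);   // -1 if group i did not take part in the match
            offsets[2 * i + 1] = matcher.end(i); // index of the group's last character + 1
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Same pattern as the Demo class: group 1 is one digit, group 2 is three digits.
        Matcher matcher = Pattern.compile("(\\d)(\\d\\d\\d)").matcher("2000 and 2001");
        while (matcher.find()) {
            System.out.println(Arrays.toString(offsets(matcher))
                    + " " + matcher.group(1) + "|" + matcher.group(2));
        }
        // prints:
        // [0, 4, 0, 1, 1, 4] 2|000
        // [9, 13, 9, 10, 10, 13] 2|001
    }
}
```

Each successful find() overwrites the previous offsets, which is why the second line starts at 9 rather than appending to the first.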

Tags: Java regex

Posted on Tue, 23 Nov 2021 08:51:04 -0500 by b2k