10. Regular Expression Matching

Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa", "a") → false
isMatch("aa", "aa") → true
isMatch("aaa", "aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true
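
The two rules can also be read directly as a recursive definition, which is handy as a reference when checking the examples by hand. The following is only a minimal sketch of that reading, not the dynamic programming solution described in this post. The name isMatchRecursive and the index parameters i and j are hypothetical, and the pattern is assumed to be well formed (every '*' follows a character or '.').

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical reference matcher: a direct recursive reading of the two rules.
// Assumes a well-formed pattern in which every '*' follows a character or '.'.
// i and j are the current positions in s and p.
bool isMatchRecursive(const std::string& s, const std::string& p,
                      std::size_t i = 0, std::size_t j = 0) {
    assert(i <= s.size() && j <= p.size());

    // An exhausted pattern matches only an exhausted input.
    if (j == p.size()) return i == s.size();

    // Does the current pattern element match the current input character?
    const bool first = i < s.size() && (p[j] == '.' || p[j] == s[i]);

    if (j + 1 < p.size() && p[j + 1] == '*') {
        // "x*": either skip the pair (zero occurrences), or consume one
        // matching character and stay on the same pair.
        return isMatchRecursive(s, p, i, j + 2) ||
               (first && isMatchRecursive(s, p, i + 1, j));
    }

    // Plain element: consume one character from both strings.
    return first && isMatchRecursive(s, p, i + 1, j + 1);
}
```

Without memoization this version can take exponential time on patterns with many '*', so it is best used only on short inputs such as the examples above.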

Solution approach

Time complexity: O(nm)
For string comparison problems, dynamic programming is usually a good first idea. This problem can be solved by filling in a single matrix of roughly n*m elements, so the complexity is O(nm), where n and m are the lengths of p and s, respectively.
1. The matrix records, for every pair (i, j), whether the substring of the pattern p from its beginning through subscript i matches the substring of the target string s from its beginning through subscript j.
2. Elements with larger subscripts can be calculated from elements with smaller subscripts and the characters being compared, so the whole matrix can be filled in order.
3. Once the matrix is filled, return the element that corresponds to the whole of p and the whole of s.
4. How each element is calculated is illustrated by an example, as shown in the following figure:

Figure 1. Example

  1. The horizontal axis is the target string s and the vertical axis is the pattern p. Call the matrix C. Row 0 and column 0 stand for the empty substrings, so C[i + 1][j + 1] holds the result for p up to subscript i and s up to subscript j.
  2. Initialize the first row. When both s and p are empty, the match succeeds, so C[0][0] = 1. When p is empty and s is not, the match must fail, so the other columns of the first row are all set to 0.
  3. Initialize the first column. The first column corresponds to s being an empty string. In this case the result can be true only at rows where p has a '*', shown in red in the first column of the figure. At such a row, the element is true if either of the two elements directly above it in the same column is true (the reason is explained in step 4). In the example, C[2][0] = C[0][0].
  4. Calculate the other elements. There are the following cases:
p[i] == s[j] || p[i] == '.': // The current characters match, so the result is that of the two substrings without them
        C[i + 1][j + 1] = C[i][j]
p[i] == '*': // "x*" means that the character x appears zero times, once, or more than once.
             // C[i + 1][j + 1] is true if any of the three cases below is true.
        C[i + 1][j + 1] = C[i - 1][j + 1] // Zero occurrences of x: the result is the same as if "x*" were not there,
                                          // that is, the result for the p substring two characters shorter and the current s substring.

        C[i + 1][j + 1] = C[i][j + 1]     // Exactly one occurrence of x: the result is the same as if the asterisk were not there,
                                          // that is, the result for the p substring ending at x and the current s substring.

        C[i + 1][j + 1] = C[i + 1][j]     // More than one occurrence of x: this case applies only when the last character of the
                                          // current s substring matches the character before '*' in p (don't forget the '.' case).
                                          // The result is then that of the current p substring and the s substring one character shorter.

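In short, when p[i] == '*', only the inverted "L" region in the figure needs to be considered: the two elements directly above the current position and the element to its left. If a 1 appears in any of them (for the element to the left, only when the character before '*' matches the current character of s), the current position is 1; otherwise it is 0.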
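
Since the figure may not be visible, the matrix for the example s = "aab", p = "c*a*b" can be reproduced with a short program. This is only a sketch of the rules above under the assumption that the pattern is well formed (every '*' has a preceding element). The names Table, buildTable and renderTable are hypothetical helpers introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Table = std::vector<std::vector<bool>>;

// Hypothetical helper: fills C so that C[i][j] is true when the first i
// characters of p match the first j characters of s.
// Assumes a well-formed pattern: every '*' has a preceding element.
Table buildTable(const std::string& s, const std::string& p) {
    const std::size_t m = s.size(), n = p.size();
    Table C(n + 1, std::vector<bool>(m + 1, false));
    C[0][0] = true;  // empty pattern against empty string
    for (std::size_t i = 0; i < n; ++i) {
        if (p[i] == '*') {
            assert(i >= 1 && "'*' needs a preceding element");
            // First column: either of the two elements directly above.
            C[i + 1][0] = C[i - 1][0] || C[i][0];
        }
        for (std::size_t j = 0; j < m; ++j) {
            if (p[i] == '*') {
                const bool prevMatches = p[i - 1] == '.' || p[i - 1] == s[j];
                C[i + 1][j + 1] = C[i - 1][j + 1]                // zero occurrences
                               || C[i][j + 1]                    // exactly one
                               || (prevMatches && C[i + 1][j]);  // one more
            } else if (p[i] == '.' || p[i] == s[j]) {
                C[i + 1][j + 1] = C[i][j];
            }
        }
    }
    return C;
}

// Renders the table as lines of 0/1, one line per row of C.
std::string renderTable(const Table& C) {
    std::string out;
    for (const auto& row : C) {
        for (bool cell : row) out += cell ? '1' : '0';
        out += '\n';
    }
    return out;
}
```

For s = "aab" and p = "c*a*b", renderTable(buildTable(s, p)) gives the rows 1000, 0000, 1000, 0100, 1110 and 0001, from the empty pattern at the top to the full pattern at the bottom. The bottom-right element is 1, which agrees with isMatch("aab", "c*a*b") → true.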

Solution source code

#include <string>
#include <vector>

using namespace std;

class Solution {
public:
    bool isMatch(string s, string p) {
        int m = s.length(), n = p.length();
        // check[i][j]: do the first i characters of p match the first j characters of s?
        vector<vector<bool>> check(n + 1, vector<bool>(m + 1, false)); // Initialize all elements to false
        // Initialize the first element. The rest of the first row stays false
        check[0][0] = true;
        // Initialize the first column (s is empty)
        for (int i = 0; i < n; ++i) {
            if (p[i] == '*') {
                check[i + 1][0] = (i >= 1 && check[i - 1][0]) || check[i][0];
            }
        }
        // Calculate the other elements
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < m; ++j) {
                if (p[i] == '.' || p[i] == s[j]) {
                    check[i + 1][j + 1] = check[i][j];
                } else if (p[i] == '*' && i >= 1) { // a leading '*' has no preceding element, so it is skipped
                    check[i + 1][j + 1] = check[i - 1][j + 1] || check[i][j + 1]
                        || ((p[i - 1] == s[j] || p[i - 1] == '.') && check[i + 1][j]);
                }
            }
        }
        // Return the result
        return check[n][m];
    }
};

Tags: Programming Asterisk

Posted on Sun, 03 May 2020 15:33:05 -0400 by help_lucky