Regular expression note 2: Fundamentals of regular expression syntax (JavaScript)

Character description

Metacharacters

Metacharacter Description
. Matches any single character except line terminators (such as \n and \r)
\w Matches a word character: a letter, digit, or underscore ([A-Za-z0-9_])
\W Matches a non-word character
\d Matches a digit ([0-9])
\D Matches a non-digit character
\s Matches a whitespace character
\S Matches a non-whitespace character
\b Matches a word boundary
\B Matches a non-word boundary
\0 Matches the NUL character
\n Matches a line feed
\f Matches a form feed
\r Matches a carriage return
\t Matches a tab
\v Matches a vertical tab
\xxx Matches the character given by the octal number xxx (legacy syntax, not allowed with the u flag)
\xdd Matches the character given by the hexadecimal number dd
\uxxxx Matches the Unicode character given by the hexadecimal number xxxx
var r=/\x61/;
var s="javascript";
var a=s.match(r);
alert(a);//a
var r=/\s/;
var s="javascript Java";
var a=r.test(s);
alert(a);//true
var r=/\141/;//octal escape for "a" (legacy syntax)
var s="javascript Java";
var a=s.match(r);
alert(a);//a
var r=/\u0061/;
var s="javascript Java";
var a=s.match(r);
alert(a);//a
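A few rows of the table can be checked with a short sketch. It runs in Node.js or a browser console and uses console.log in place of alert; the sample string is made up for illustration.

```javascript
// Made-up sample: a tab, a line feed, and some digits
const text = "id\t42\nname\tAda";

// \t matches a tab character (\f is a form feed, not a tab)
const hasTab = /\t/.test(text);

// \d+ matches a run of digits
const digits = text.match(/\d+/)[0];

// . does not match line terminators such as \n
const dotCrossesLine = /42.name/.test(text);

// \b requires a word boundary on each side of "name"
const wholeWord = /\bname\b/.test(text);
// \B requires a non-boundary: "am" sits inside "name"
const insideWord = /\Bam\B/.test(text);

console.log(hasTab, digits, dotCrossesLine, wholeWord, insideWord);
// true 42 false true true
```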

Character ranges

In regular expression syntax, square brackets define a character class, which matches any one of the characters listed inside. A range is written by giving only its first and last characters, joined by a hyphen (-). A caret (^) placed immediately after the opening bracket negates the class, so it matches any character outside the listed set.

  • [abc]: matches any one of the characters inside the brackets.
  • [^abc]: matches any character that is not inside the brackets.
  • [0-9]: matches any character from 0 to 9, that is, any digit.
  • [a-z]: matches any character from lowercase a to lowercase z, that is, any lowercase letter.
  • [A-Z]: matches any character from uppercase A to uppercase Z, that is, any uppercase letter.
  • [A-z]: matches any character from uppercase A to lowercase z. Besides the letters, this range also contains the six characters [ \ ] ^ _ ` that lie between Z and a in the character table, so use [A-Za-z] to match letters only.
  • [adgk]: matches any character in the given set.
  • [^adgk]: matches any character outside the given set.
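The range pitfalls in the list above can be seen directly in a small sketch (Node.js or a browser console, console.log instead of alert). The test strings are arbitrary examples.

```javascript
// [A-z] runs from U+0041 to U+007A, which also covers [ \ ] ^ _ ` (U+005B..U+0060)
const looseOk = /^[A-z]+$/.test("snake_case");    // "_" is U+005F, so this passes
const strictOk = /^[A-Za-z]+$/.test("snake_case"); // letters only, so this fails

// Inside brackets a space is an ordinary member: [^ abc] also rejects spaces
const withSpace = /[^ abc]/.test(" ");
const withoutSpace = /[^abc]/.test(" ");

// ^ negates only right after "["; elsewhere it is a literal caret
const literalCaret = /[a^]/.test("^");

console.log(looseOk, strictOk, withSpace, withoutSpace, literalCaret);
// true false false true true
```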

Match any single-byte character (U+0000 to U+00FF; strict 7-bit ASCII is U+0000 to U+007F)

var r=/[\u0000-\u00ff]/g;

Match any double-byte character, such as a Chinese character (anything outside U+0000 to U+00FF)

var r=/[^\u0000-\u00ff]/g;

Match any capital letter

var r=/[\u0041-\u005A]/g;

Match any lowercase letter

var r=/[\u0061-\u007A]/g;

Match any letter (either case) or digit

var r=/[a-zA-Z0-9]/g;

Alternation (selection)

The vertical bar (|) separates alternative subpatterns: the expression matches if any one of them matches.

var s1="abc";
var s2="123";
var r=/\w+|\d+/;//alternation between two repeated character classes (\w also matches digits, so \w+ alone already matches s2)
var b1=r.test(s1);
var b2=r.test(s2);
alert(b1);//true
alert(b2);//true
var s1="abc";
var s2="123";
var s3="def";
var s4="456";
var r=/(abc)|(123)|(def)|(456)/;//alternation between four groups
var b1=r.test(s1);
var b2=r.test(s2);
var b3=r.test(s3);
var b4=r.test(s4);
alert(b1);//true
alert(b2);//true
alert(b3);//true
alert(b4);//true
var s="a'b?c&";
var r=/\'|\"|\?|\&/gi;//Regular expressions for filtering sensitive characters
function f(){
	return "&#"+arguments[0].charCodeAt(0)+";";
}
var a=s.replace(r,f);
document.write(a);//displays a'b?c& because the browser decodes the entities
alert(a);//a&#39;b&#63;c&#38;
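Because | has the lowest precedence of all regex operators, anchors and neighbouring characters bind to a single alternative, not to the whole choice. This sketch (Node.js or a browser console) shows the difference and the left-to-right order in which alternatives are tried; the sample words are arbitrary.

```javascript
// | has the lowest precedence: /^cat|dog$/ means (^cat) OR (dog$)
const looseCat = /^cat|dog$/.test("catalog");       // "^cat" matches the start
// Grouping the alternatives makes both anchors apply to each choice
const anchoredCat = /^(cat|dog)$/.test("catalog");
const anchoredDog = /^(cat|dog)$/.test("dog");

// Alternatives are tried left to right; the first that succeeds wins
const firstAlt = "javascript".match(/java|javascript/)[0];

console.log(looseCat, anchoredCat, anchoredDog, firstAlt);
// true false true java
```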

Repetition quantifiers

List of repetition quantifiers

Quantifier Description
n+ Matches one or more consecutive occurrences of n
n* Matches zero or more consecutive occurrences of n
n? Matches zero or one occurrence of n
n{x} Matches exactly x consecutive occurrences of n
n{x,y} Matches at least x and at most y consecutive occurrences of n
n{x,} Matches at least x consecutive occurrences of n
var s="ggle gogle google gooogle gooogle goooogle gooooogle goooooogle gooooooogle goooooooogle";
var r=/go?gle/g;
var a=s.match(r);
alert(a);//ggle,gogle
var s="ggle gogle google gooogle gooogle goooogle gooooogle goooooogle gooooooogle goooooooogle";
var r=/go{0,1}gle/g;
var a=s.match(r);
alert(a);//ggle,gogle
var s="ggle gogle google gooogle gooogle goooogle gooooogle goooooogle gooooooogle goooooooogle";
var r=/go{3}gle/g;
var a=s.match(r);
alert(a);//gooogle,gooogle
var s="ggle gogle google gooogle goooogle gooooogle goooooogle gooooooogle goooooooogle";
var r=/gooogle/g;
var a=s.match(r);
alert(a);//gooogle
var s="ggle gogle google gooogle goooogle gooooogle goooooogle gooooooogle goooooooogle";
var r=/go{3,5}gle/g;
var a=s.match(r);
alert(a);//gooogle,goooogle,gooooogle
var s="ggle gogle google gooogle goooogle gooooogle goooooogle gooooooogle goooooooogle";
var r=/go*gle/g;
var a=s.match(r);
alert(a);//ggle,gogle,google,gooogle,goooogle,gooooogle,goooooogle,gooooooogle,goooooooogle
var s="ggle gogle google gooogle goooogle gooooogle goooooogle gooooooogle goooooooogle";
var r=/go{0,}gle/g;
var a=s.match(r);
alert(a);//ggle,gogle,google,gooogle,goooogle,gooooogle,goooooogle,gooooooogle,goooooooogle
var s="ggle gogle google gooogle goooogle gooooogle goooooogle gooooooogle goooooooogle";
var r=/go+gle/g;
var a=s.match(r);
alert(a);//gogle,google,gooogle,goooogle,gooooogle,goooooogle,gooooooogle,goooooooogle
var s="ggle gogle google gooogle goooogle gooooogle goooooogle gooooooogle goooooooogle";
var r=/go{1,}gle/g;
var a=s.match(r);
alert(a);//gogle,google,gooogle,goooogle,gooooogle,goooooogle,gooooooogle,goooooooogle
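Two points behind the table are worth checking in isolation: {x,y} sets both a lower and an upper bound, and a quantifier applies only to the single atom in front of it unless parentheses make a group the atom. A minimal sketch with made-up strings:

```javascript
// n{x,y}: at least x and at most y repetitions; \b keeps "aaaa" from matching partially
const runs = "a aa aaa aaaa".match(/\ba{2,3}\b/g);
console.log(runs.join(",")); // aa,aaa

// The quantifier binds to "b" only: this pattern means "a" followed by "bb"
const oneAtom = /^ab{2}$/.test("abab");
// Parentheses make "ab" the repeated unit
const wholeGroup = /^(ab){2}$/.test("abab");
console.log(oneAtom, wholeGroup); // false true
```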

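Lazy mode

Repetition quantifiers are greedy by default: they match as many characters as possible while still letting the overall match succeed.

  • ?, {n}, and {n,m} are weakly greedy: their greed is limited by a fixed upper bound ({n} always matches exactly n times).
  • *, +, and {n,} are strongly greedy: they have no upper bound, so their greed is unlimited.

When several greedy quantifiers compete, the leftmost one has the highest priority and takes as many characters as it can. In the following example, the first .* consumes everything up to the last tag, so the second group receives only </html>.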

var s="<html><head><title></title></head><body></body></html>";
var r=/(<.*>)(<.*>)/;
var a=s.match(r);
alert(a[1]);//<html><head><title></title></head><body></body>
alert(a[2]);//</html>

In contrast to greedy matching, lazy matching matches as few characters as possible while still satisfying the pattern.
To make a quantifier lazy, append a question mark (?) directly after it.

var s="<html><head><title></title></head><body></body></html>";
var r=/<.*?>/;
var a=s.match(r);
alert(a);//<html>

A brief description of lazy matching for the six kinds of repetition quantifiers:

  • {n,m}?: tries to match n times, but may repeat up to m times if the overall match requires it.
  • {n}?: matches exactly n times; the lazy suffix has no effect.
  • {n,}?: tries to match n times, but may repeat any number of times if required.
  • ??: tries to match zero times, but may match once if required; equivalent to {0,1}?.
  • +?: tries to match once, but may repeat any number of times if required; equivalent to {1,}?.
  • *?: tries to match zero times, but may repeat any number of times if required; equivalent to {0,}?.
var s="<html><head><title></title></head><body></body></html>";
var r=/<.*>?/;//here ? follows >, not *, so it only makes > optional; .* stays greedy
var a=s.match(r);
alert(a);//<html><head><title></title></head><body></body></html>
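The difference between greedy and lazy quantifiers is most visible with the g flag, where a greedy .* merges several elements into one match. The sketch also checks the ?? behaviour from the list above. The HTML fragment is a made-up example; regexes are not a substitute for an HTML parser.

```javascript
const html = "<b>one</b> and <b>two</b>";

// Greedy: .* runs to the last "</b>", producing one long match
const greedy = html.match(/<b>.*<\/b>/g);
// Lazy: .*? stops at the first "</b>" each time
const lazy = html.match(/<b>.*?<\/b>/g);
console.log(greedy.length, lazy.length); // 1 2
console.log(lazy.join(" | "));          // <b>one</b> | <b>two</b>

// ?? prefers zero occurrences; ? prefers one
const lazyOptional = "ab".match(/ab??/)[0];
const greedyOptional = "ab".match(/ab?/)[0];
console.log(lazyOptional, greedyOptional); // a ab
```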

Boundary quantifiers (anchors)

Boundaries pin a match to a position, such as the beginning or end of the string.
Boundary metacharacters supported by JavaScript regular expressions:

Metacharacter Description
^ Matches the beginning of the string, or the beginning of any line when the m (multiline) flag is set
$ Matches the end of the string, or the end of any line when the m (multiline) flag is set
var s="how are you";
var r=/(?:\w+)$/;
var a=s.match(r);
alert(a);//you
var s="how are you";
var r=/^(?:\w+)/;
var a=s.match(r);
alert(a);//how
var s="how are you";
var r=/(?:\w+)/g;
var a=s.match(r);
alert(a);//how,are,you
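The effect of the m flag on ^ and $ can be seen with a two-line string; the text is an arbitrary example.

```javascript
const lines = "first line\nsecond line";

// Without m, ^ refers only to the start of the whole string
const noM = lines.match(/^\w+/g).join(",");
// With m, ^ also matches right after each line terminator
const withM = lines.match(/^\w+/gm).join(",");
// With m, $ also matches right before each line terminator
const lineEnds = lines.match(/\w+$/gm).join(",");

console.log(noM);      // first
console.log(withM);    // first,second
console.log(lineEnds); // line,line
```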

Declarative quantifiers (lookahead)

Declarative quantifiers come in two kinds: positive lookahead and negative lookahead.

Positive lookahead: requires that the text immediately after the pattern matches the given condition, but that text is not included in the match result. It is written (?=condition).

var s="a:123 b=345";
var r=/\w*(?==)/;
var a=s.match(r);
alert(a);//b

Negative lookahead: requires that the text immediately after the pattern does not match the given condition. It is written (?!condition).

var s="a:123 b=345";
var r=/\w*(?!=)/;
var a=s.match(r);
alert(a);//a
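Because lookaheads test positions without consuming characters, they can be combined to find insertion points. A well-known idiom inserts thousands separators: a comma goes at every non-boundary position followed by one or more groups of three digits and then no further digit. The function name addThousandsSeparators is hypothetical, and the sketch assumes a plain string of integer digits.

```javascript
// Hypothetical helper; assumes the input is a string of integer digits only.
// (?=(\d{3})+(?!\d)) is a positive lookahead that itself ends with a negative
// lookahead; \B stops a comma from being placed before the first digit.
function addThousandsSeparators(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ",");
}

console.log(addThousandsSeparators("1234567")); // 1,234,567
console.log(addThousandsSeparators("1000"));    // 1,000
console.log(addThousandsSeparators("123"));     // 123
```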

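Note: although lookaheads are written in parentheses, they are not capturing groups. Older JavaScript engines support only lookahead; lookbehind assertions, written (?<=condition) and (?<!condition), were added in ES2018 and are supported by current browsers and Node.js.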
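For engines that support ES2018, lookbehind checks the text to the left of the match in the same way. A minimal sketch on a made-up price list:

```javascript
const prices = "USD 30, EUR 45, USD 12";

// Positive lookbehind: digits preceded by "USD "
const usd = prices.match(/(?<=USD )\d+/g);
// Negative lookbehind: digits NOT preceded by "USD "; \b makes the match
// start at the beginning of a number rather than in its middle
const notUsd = prices.match(/(?<!USD )\b\d+/g);

console.log(usd.join(","));    // 30,12
console.log(notUsd.join(",")); // 45
```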

Expression grouping

Parentheses can group any part of a regular expression. The parenthesized part is a subexpression, also called a subpattern: it is matched as a unit, and its match is captured separately from the rest. A quantifier placed after the closing parenthesis applies to the whole subexpression.
Grouping is one of the most useful features of regular expressions in practice.

var s="javascript is not java";
var r=/java(script)?/g;
var a=s.match(r);
alert(a);//javascript,java
var s="ab=21,bc=45,cd=43";
var r=/(\w+)=(\d*)/;
var a=s.match(r);
alert(a);//ab=21,ab,21
var s="<h1>title<h1><p>text<p>";
var r=/(<\/?\w+>).*\1/g;
var a=s.match(r);
alert(a);//<h1>title<h1>,<p>text<p>
var s="<h1>title<h1><p>text<p>";
var r=/((<\/?\w+>).*\2)/g;
var a=s.match(r);
alert(a);//<h1>title<h1>,<p>text<p>
var s="<h1>title</h1><p>text</p>";
var r=/((<\/?\w+>).*\2)/g;
var a=s.match(r);
alert(a);//null
var s="<h1>title</h1><p>text</p>";
var r=/((<\/?\w+>).*((<\/?\w+>)))/g;
var a=s.match(r);
alert(a);//<h1>title</h1><p>text</p>
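When a quantifier follows a group, the whole subexpression is repeated, but the capture keeps only the text from the last repetition. The sketch below shows that, along with the independent captures of separate groups; the inputs are arbitrary.

```javascript
// (\w\w)+ repeats the two-character group; the capture holds only the last pair
const m = "abcd".match(/(\w\w)+/);
console.log(m[0], m[1]); // abcd cd

// Separate groups capture separately
const kv = "width=100".match(/(\w+)=(\d+)/);
console.log(kv[1], kv[2]); // width 100
```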

Subexpression references (backreferences)

During matching, the engine automatically stores the text matched by each group (subexpression) so that it can be used later. These stored values are called backreferences. They are created and numbered from left to right, in the order in which the groups' opening parentheses appear in the expression.

var s="abcdefghijklmn";
var r=/(a(b(c)))/;
var a=s.match(r);
alert(a);//abc,abc,bc,c

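Backreferences are commonly used in three ways in application development: reading captured text after a match (RegExp.$1 to RegExp.$9), referring back to a capture inside the same pattern (\1, \2, ...), and inserting captures into a replacement string ($1, $2, ...).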

var s="abcdefghijklmn";
var r=/(\w)(\w)(\w)/;
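r.test(s);//RegExp.$1 to $9 are legacy static properties; the array returned by exec() or match() is preferred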
alert(RegExp.$1);//a
alert(RegExp.$2);//b
alert(RegExp.$3);//c
var s="abcbcacba";
var r=/(\w)(\w)(\w)\2\3\1\3\2\1/;
var b=r.test(s);
alert(b);//true
var s="aa11bb22c3d4e5f6";
var r=/(\w+?)(\d+)/g;
var b=s.replace(r,"$2$1");
alert(b);//11aa22bb3c4d5e6f
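In replacement strings, $n can reorder captures arbitrarily, and String.prototype.replace also accepts a function that receives the whole match followed by each capture. A short sketch with made-up inputs:

```javascript
// Reorder a year-month-day date into month/day/year using $n references
const iso = "2020-01-11";
const us = iso.replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
console.log(us); // 01/11/2020

// A replacement function gets (match, group1, group2, ...)
const doubled = "a1b2".replace(/([a-z])(\d)/g, (whole, letter, digit) => letter + digit * 2);
console.log(doubled); // a2b4
```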

Note: capturing groups use some system resources, and in long regular expressions storing backreferences slows matching down. Often, however, a group is used only to treat part of the pattern as a unit, not to refer back to it. In that case use a non-capturing group, written (?:...), which does not create a backreference.

var s1="abc";
var s2="123";
var r=/(?:\w*?)|(?:\d*?)/;//non-capturing groups; since *? can match the empty string, test() returns true for any input
var a=r.test(s1);
var b=r.test(s2);
alert(a);//true
alert(b);//true
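The effect of (?:...) shows up in the match array: non-capturing groups add no entries and are skipped when groups are numbered. A minimal sketch:

```javascript
const date = "2020-01";

const capturing = date.match(/(\d+)-(\d+)/);
const nonCapturing = date.match(/(?:\d+)-(\d+)/);

console.log(capturing.length);    // 3: whole match plus two groups
console.log(nonCapturing.length); // 2: whole match plus one group
console.log(nonCapturing[1]);     // 01: numbering skips the (?:...) group
```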

Posted on Sat, 11 Jan 2020 09:00:03 -0500 by zrueda