Python regular expressions: a basic introduction


1. re.match(pattern, string, flags=0)

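Arguments:

  • pattern: the regular expression to match
  • string: the string to match against
  • flags: optional flags that change how matching works, such as re.I (ignore case) or re.M (multiline); the default 0 means no flags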
import re
print(re.match('www','www.baidu.com'))
print(re.match('www','www.baidu.com').span())

  • How it matches: re.match only tries to match at the start of the target string. If the pattern does not appear at the very beginning, there is no match. On success it returns a Match object, otherwise None. span() is a method of the Match object that returns the (start, end) indices of the matched text.
  • PS: if there is no match, re.match returns None, and calling .span() on None raises an AttributeError.
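Here is a minimal sketch of how to avoid that error by checking for None first. The helper name match_span is hypothetical and only used for illustration.

```python
import re

def match_span(pattern, text):
    """Hypothetical helper: return the span of a match at the start of text, or None."""
    m = re.match(pattern, text)
    # re.match returns None on failure, so check before calling .span()
    return m.span() if m is not None else None

print(match_span('www', 'www.baidu.com'))  # (0, 3)
print(match_span('com', 'www.baidu.com'))  # None: 'com' is not at the start
```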

2. re.search(pattern, string, flags=0)

print(re.search(r'er','nerver'))
print(re.search(r'er\b','never'))
print(re.search(r'er\b','erverd'))
print(re.search(r'a\b','dbc dsa da'))
print(re.search(r'er\B','erver'))

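  • How it matches: unlike match, search scans the whole target string and returns a Match object for the first position where the pattern matches. If the pattern matches nowhere, it returns None. Only the first match is returned, even if there are several.
  • PS: the examples use two special sequences:
  • \b: matches a word boundary. This is the empty position between a word character (\w) and a non-word character, or the start or end of the string. It can mark the start of a word as well as its end. In er\b it follows er, so er must end a word: 'never' matches, 'erverd' does not.
  • \B: the opposite of \b. It matches only where there is no word boundary, so er\B requires er to be followed by another word character, as in 'erver'.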
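The sketch below uses a made-up sample string. It shows that \b works at the start of a word as well as the end, and that \B only matches inside a word. It also contrasts search with match on the same input.

```python
import re

text = 'never erverd'  # sample string chosen for illustration

# \b can sit at the start of a word...
start = re.search(r'\ber', text)
# ...or at the end of a word
end = re.search(r'er\b', text)
# \B matches only where there is no word boundary
inner = [m.span() for m in re.finditer(r'er\B', text)]

print(start.span())  # (6, 8): the 'er' that begins 'erverd'
print(end.span())    # (3, 5): the 'er' that ends 'never'
print(inner)         # [(6, 8), (9, 11)]

# search scans the whole string, match only checks position 0
print(re.match(r'er', 'never'))          # None
print(re.search(r'er', 'never').span())  # (3, 5)
```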

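3. re.findall(pattern, string, flags=0)

The form findall(string[, pos[, endpos]]) belongs to a compiled pattern object created with re.compile, which is covered later. The module-level function used below takes the pattern as its first argument.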

print(re.findall(r'sunck','sunck is a good sunck'))
print(re.findall(r'a?','asaa')) # ? matches 0 or 1 'a': ['a', '', 'a', 'a', '']
print(re.findall(r'a*','asaa')) # * matches 0 or more, greedily: ['a', '', 'aa', '']
print(re.findall(r'a+','asaa')) # + matches 1 or more: ['a', 'aa']
print(re.findall(r'a{3}','aaasaa')) # exactly 3: ['aaa']
print(re.findall(r'((s|S)unck)','sunck--Sunck')) # two groups, so a list of tuples: [('sunck', 's'), ('Sunck', 'S')]
print(re.findall(r'/\*.*?\*/', '/* 1231231 asdasrasd */')) # literal '*' escaped as \*: ['/* 1231231 asdasrasd */']

  • How it matches: findall returns all non-overlapping matches in the target string as a list of strings. If the pattern contains capturing groups, it returns the groups instead: a list of strings for one group, or a list of tuples for several groups, as in the ((s|S)unck) example.
  • PS: the quantifiers used above:
  • ?: matches 0 or 1 of the preceding item. It is greedy, so it takes the 'a' when one is there. On 'asaa' each 'a' is matched separately. Wherever there is no 'a' (at 's' and at the end of the string), the match is the empty string '', giving ['a', '', 'a', 'a', '']. A ? placed after another quantifier (*?, +?, ??) means something different: it makes that quantifier non-greedy.
  • *: matches 0 or more of the preceding item, greedily. Unlike ?, it takes a run such as 'aa' in a single match, giving ['a', '', 'aa', ''].
  • +: matches 1 or more, greedily. Unlike *, it never produces an empty match '', giving ['a', 'aa'].
  • The last example extracts a /* ... */ comment. In a regular expression, * is a quantifier (0 or more of the preceding item) and . matches any character except a newline. A literal * must therefore be escaped as \*. A pattern for the comment is /\*.*?\*/, which means: a literal /*, then as few characters as possible, then a literal */.
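This minimal sketch runs on a made-up source line with two comments. It shows the difference between greedy .* and non-greedy .*?, and how a single capture group makes findall return only the group's text.

```python
import re

src = '/* first */ x = 1; /* second */'  # sample input for illustration

# Greedy .* runs to the LAST '*/', swallowing both comments
greedy = re.findall(r'/\*.*\*/', src)
# Non-greedy .*? stops at the FIRST '*/' after each '/*'
lazy = re.findall(r'/\*.*?\*/', src)
# With one capture group, findall returns only the group's text
bodies = re.findall(r'/\*\s*(.*?)\s*\*/', src)

print(greedy)  # ['/* first */ x = 1; /* second */']
print(lazy)    # ['/* first */', '/* second */']
print(bodies)  # ['first', 'second']
```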

4. re.split(pattern, string, maxsplit=0, flags=0)

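# Raw strings (r'...') keep the backslash in \W and \s intact
print(re.split(r'\W+', 'runoob, runoob, runoob.'))                  # ['runoob', 'runoob', 'runoob', '']
print(re.split(r'(\W+)', 'runoob, runoob, runoob.'))                # ['runoob', ', ', 'runoob', ', ', 'runoob', '.', '']
print(re.split(r'\W+', ' runoob, runoob, runoob.', maxsplit=1))     # ['', 'runoob, runoob, runoob.']
print(re.split(r'\s+', ' runoob,          runoob, runoob.'))        # ['', 'runoob,', 'runoob,', 'runoob.']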

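  • split cuts the string at every match of the pattern and returns the remaining pieces as a list, as in the first example. The trailing '.' is also a separator, so the list ends with an empty string ''.
  • The second example wraps the pattern in a capturing group (...). When the pattern contains capturing groups, the text they match (here the separators ', ' and '.') is also kept in the result list between the pieces. That is why the output differs from the first example.
  • The third example limits the split with a maxsplit of 1, so only one cut is made. Because the string starts with a space, that cut happens at the very beginning and the first piece is ''. The default, 0, means no limit.
  • Compared with str.split, the advantage of re.split is that the separator is a pattern rather than a fixed string. Separators of varying length or content can therefore be cut in one step, such as the run of spaces removed by \s+ in the fourth example.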
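A short sketch comparing the two on a made-up line that mixes commas, semicolons and spaces:

```python
import re

line = 'a, b;c  d'  # sample input for illustration

# str.split takes one fixed separator string
fixed = line.split(',')
# re.split takes a pattern: any run of commas, semicolons or whitespace
pieces = re.split(r'[,;\s]+', line)
# A capturing group keeps the separators in the result
with_seps = re.split(r'([,;\s]+)', line)
# maxsplit limits the number of cuts
limited = re.split(r'[,;\s]+', line, maxsplit=1)

print(fixed)      # ['a', ' b;c  d']
print(pieces)     # ['a', 'b', 'c', 'd']
print(with_seps)  # ['a', ', ', 'b', ';', 'c', '  ', 'd']
print(limited)    # ['a', 'b;c  d']
```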

5. re.finditer(pattern, string, flags=0)

obj = re.finditer(r'good?','goodsunck is a good good man')
print(obj)
for item in obj:
    print(item)

  • It finds the same matches as findall, but instead of a list of strings it returns an iterator that yields one Match object per match. Printing the iterator itself only shows something like <callable_iterator object at 0x...>, so a for loop is used to go through the matches.
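Because each item is a Match object, finditer also gives you positions, which findall does not. The sketch below also shows that the iterator can only be consumed once.

```python
import re

text = 'goodsunck is a good good man'

# Each item is a Match object, so the text and its position are both available
for m in re.finditer(r'good', text):
    print(m.group(), m.span())
positions = [m.start() for m in re.finditer(r'good', text)]
print(positions)  # [0, 15, 20]

# The iterator is consumed as you loop; a second pass yields nothing
it = re.finditer(r'good', text)
first_pass = list(it)
second_pass = list(it)
print(len(first_pass), len(second_pass))  # 3 0
```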

6. re.sub(pattern, repl, string, count=0, flags=0)

# Match '#' and everything after it, and replace it with the empty string ''
print(re.sub(r'#.*$', "", '2004-959-559 # This is a foreign phone number '))
# Delete every character that is not a digit
print(re.sub(r'\D', "", '2004-959-559 # This is a foreign phone number '))

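  • Purpose: replace every match of the pattern in the string with repl. The optional count limits the number of replacements; the default 0 replaces all matches. The first example removes the comment starting at #, and the second keeps only the digits.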
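The sketch below goes a little beyond the two examples above. It shows the count argument, a function as repl (it receives each Match object), and group backreferences such as \1 in the replacement string.

```python
import re

phone = '2004-959-559'

# count=1 replaces only the first match
first_only = re.sub(r'-', ' ', phone, count=1)
# repl can be a function that receives each Match object
lengths = re.sub(r'\d+', lambda m: str(len(m.group())), phone)
# \1, \2, \3 reuse the captured groups in the replacement
swapped = re.sub(r'(\d+)-(\d+)-(\d+)', r'\3-\2-\1', phone)

print(first_only)  # 2004 959-559
print(lengths)     # 4-3-3
print(swapped)     # 559-959-2004
```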

7. re.subn(pattern, repl, string, count=0, flags=0)

# Match '#' and everything after it, and replace it with the empty string ''
print(re.subn(r'#.*$', "", '2004-959-559 # This is a foreign phone number '))
# Delete every character that is not a digit
print(re.subn(r'\D', "", '2004-959-559 # This is a foreign phone number '))

  • The difference from re.sub is that subn returns a tuple (new_string, number_of_substitutions). The result therefore also tells you how many replacements were made.
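A small sketch of unpacking the tuple, using a made-up string whose whitespace runs are collapsed to single spaces:

```python
import re

# subn returns (new_string, count), which can be unpacked directly
new_text, n = re.subn(r'\s+', ' ', 'a   b \t c')
print(new_text, n)  # a b c 2

# The count makes it easy to tell whether anything changed
if n == 0:
    print('nothing was replaced')
```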

8. re.compile(pattern, flags=0)

import re
obj = re.compile('www')
print(obj.match('www.baidu.com'))

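  • re.compile turns a regular expression (and its flags) into a Pattern object. The object has the same methods introduced above, such as match, search, findall, split and sub, but without the pattern argument. Only match is used here. Compiling is convenient when the same pattern is reused many times.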
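A minimal sketch of a compiled pattern, using a made-up sample string. The flag is fixed at compile time, and the pattern's findall accepts the optional pos argument from the findall(string[, pos[, endpos]]) form mentioned in section 3.

```python
import re

# re.I is fixed when the pattern is compiled
word = re.compile(r'sunck', re.I)
text = 'Sunck is a good sunck'

all_hits = word.findall(text)          # ['Sunck', 'sunck']
from_pos1 = word.findall(text, 1)      # pos=1 skips the 'S' at index 0: ['sunck']
first_span = word.search(text).span()  # (0, 5)
replaced = word.sub('X', text)         # 'X is a good X'

print(all_hits, from_pos1, first_span, replaced)
```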

9. group and groups

matchObj = re.match( r'(.*) are (.*?) .*', 'Cats are smarter than dogs', re.M|re.I)
 
print('group',matchObj.group()) # Equivalent to group(0)
print('group1',matchObj.group(1))
print('group2',matchObj.group(2))
print('groups',matchObj.groups())

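  • Parentheses in a pattern create groups, and the Match object can extract the substring captured by each group. The flags re.M|re.I turn on multiline mode and case-insensitive matching.
  • group() or group(0): the whole matched string, here 'Cats are smarter than dogs'.
  • group(1): the string captured by the first group, here 'Cats'.
  • group(2): the string captured by the second group, here 'smarter'. The group (.*?) is non-greedy, so it stops at the first following space.
  • groups(): all captured group strings as a tuple, here ('Cats', 'smarter').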
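The sketch below reuses the sentence from the example. It shows how making the second group greedy changes what it captures, and how group() accepts several indices. It also covers named groups, (?P<name>...), which are part of Python's re syntax but not used in the example above.

```python
import re

s = 'Cats are smarter than dogs'

lazy = re.match(r'(.*) are (.*?) .*', s)
greedy = re.match(r'(.*) are (.*) .*', s)
print(lazy.group(2))     # smarter
print(greedy.group(2))   # smarter than  (greedy runs to the last space)

# group() with several indices returns a tuple
print(lazy.group(1, 2))  # ('Cats', 'smarter')

# Named groups can be read by name or all at once with groupdict()
named = re.match(r'(?P<animal>\w+) are (?P<trait>\w+)', s)
print(named.group('animal'))  # Cats
print(named.groupdict())      # {'animal': 'Cats', 'trait': 'smarter'}
```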

A few small practice exercises

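# Use regular expressions to check whether a QQ number, an email address and a user name are valid
import re
def checkQQ(string):
    # QQ number: 6-10 digits, and the first digit cannot be 0
    pat = r'^[1-9]\d{5,9}$'
    print(re.search(pat,string))

checkQQ('23112541')      # matches
checkQQ('231a12541')     # None: contains a letter
checkQQ('231125412312')  # None: 12 digits is too long

def checkEmail(string):
    # 6-13 word characters before '@' (the first one not '0'), a 2-4 character domain, then '.com'
    # The dot is escaped as \. so it matches a literal '.', not any character
    pat = r'^[1-9a-zA-Z_]\w{5,12}@\w{2,4}\.com$'
    print(re.search(pat,string))
checkEmail('1175dasd761@qq.com')

def checkUser(string):
    # User name: 6-12 word characters (letters, digits, underscore)
    pat = r'^\w\w{5,11}$'
    print(re.search(pat,string))

checkUser('padsaes')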

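  • PS: these patterns are not necessarily the best ones, just one possible solution.
    Regular expressions have many more special characters; only the ones used above are listed here:
  1. ^: matches the start of the string
  2. $: matches the end of the string
  3. .: matches any single character except a newline; it is often combined with * as .* to match any run of characters
  4. re{n,m}: matches the preceding item re at least n and at most m times (n <= m, written without spaces); re{n} matches exactly n times, as in a{3}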
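A short sketch of why the exercises anchor their patterns with ^ and $. Without anchors, search happily finds a valid-looking run inside a longer string. re.fullmatch, available since Python 3.4, is an equivalent way to require a whole-string match. The last line shows that . does not match a newline.

```python
import re

qq = '231125412312'  # 12 digits, too long for the QQ rule

# Without anchors, search finds a 10-digit run inside the longer string
unanchored = re.search(r'[1-9]\d{5,9}', qq)
# With ^ and $ the whole string must be 6-10 digits
anchored = re.search(r'^[1-9]\d{5,9}$', qq)
# fullmatch requires the pattern to cover the entire string
full = re.fullmatch(r'[1-9]\d{5,9}', qq)
# . matches any character except a newline
dots = re.findall(r'a.c', 'abc a-c a\nc')

print(unanchored.group())  # 2311254123
print(anchored, full)      # None None
print(dots)                # ['abc', 'a-c']
```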

The end!


Posted on Mon, 29 Jun 2020 00:36:02 -0400 by davidjwest