Chapter 5: grep and regular expressions, from the three swordsmen of Linux text processing

3. The three swordsmen of text processing

  • grep: filters lines of text according to a pattern (regular expression)
  • sed: stream editor, a text editing tool
  • awk: text report generator; on Linux it is implemented by gawk

3.1 grep, the first of the three swordsmen of text processing

grep: Global search REgular expression and Print out the line

Function: a text search tool that checks the target text line by line against the "pattern" specified by the user and prints the matching lines

Pattern: a filter condition written with regular expression metacharacters and literal text characters

Format:

grep [OPTIONS] PATTERN [FILE...]

Common options:

--color=auto Highlight the matched text in color
-m # Stop after # matching lines
-v Display the lines that do not match the pattern
-i Ignore character case
-n Show the line numbers of matching lines
-c Count the number of matching lines
-o Show only the matched strings
-q Quiet mode, no output
-A # after: also print the # lines after each match
-B # before: also print the # lines before each match
-C # context: also print the # lines before and after each match
-e Specify a pattern; several -e options are combined with a logical OR, for example: grep -e 'cat' -e 'dog' file
-w Match whole words only
-E Use ERE, equivalent to egrep
-F Treat the pattern as a fixed string rather than a regular expression, equivalent to fgrep
-f file Read the patterns from file, one per line
-r Search directories recursively, without following symbolic links
-R Search directories recursively, following symbolic links

example:

[root@rocky8 ~]# grep rocky
i am raymond
rocky8
rocky8
i am study rocky8 linux
i am study rocky8 linux
^C
#With no file argument, grep waits for standard input. Every input line that contains the pattern is printed in full, with the matched text highlighted in red

[root@rocky8 ~]# cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
systemd-coredump:x:999:997:systemd Core Dumper:/:/sbin/nologin
systemd-resolve:x:193:193:systemd Resolver:/:/sbin/nologin
tss:x:59:59:Account used for TPM access:/dev/null:/sbin/nologin
polkitd:x:998:996:User for polkitd:/:/sbin/nologin
unbound:x:997:994:Unbound DNS resolver:/etc/unbound:/sbin/nologin
sssd:x:996:993:User for sssd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash
[root@rocky8 ~]#  cat /etc/passwd |grep root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
#grep receives standard input through the pipe and prints the lines that contain the pattern

[root@rocky8 ~]# grep root /etc/passwd #grep also accepts file names as arguments
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

[root@rocky8 ~]# cut -d: -f1 /etc/passwd | grep root
root
#Use grep to filter the output of the preceding command

[root@rocky8 ~]# pstree -p |grep bash
           |-sshd(750)---sshd(838)---sshd(979)---bash(998)-+-grep(1389)
           
[root@rocky8 ~]# grep root /etc/passwd /etc/group
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
/etc/group:root:x:0:
#Multiple files can be processed

[root@rocky8 ~]# alias grep
alias grep='grep --color=auto' #The alias turns on color highlighting

[root@rocky8 ~]# \grep root /etc/passwd /etc/group
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
/etc/group:root:x:0:
#A leading backslash bypasses the alias, so the original command runs and no color is added

[root@rocky8 ~]# which grep
alias grep='grep --color=auto'
	/usr/bin/grep
[root@rocky8 ~]# /usr/bin/grep root /etc/passwd /etc/group
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
/etc/group:root:x:0:
#The full path also runs the original command. This only works for external commands: shell builtins have no path, so the method has some limitations

[root@rocky8 ~]# grep bin /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
systemd-coredump:x:999:997:systemd Core Dumper:/:/sbin/nologin
systemd-resolve:x:193:193:systemd Resolver:/:/sbin/nologin
tss:x:59:59:Account used for TPM access:/dev/null:/sbin/nologin
polkitd:x:998:996:User for polkitd:/:/sbin/nologin
unbound:x:997:994:Unbound DNS resolver:/etc/unbound:/sbin/nologin
sssd:x:996:993:User for sssd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash
[root@rocky8 ~]# grep -m 3 bin /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
#-m N stops after the first N lines that contain the string

[root@rocky8 ~]# grep -v nologin /etc/passwd
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash
#-v displays the lines that do not contain the string

[root@rocky8 ~]# grep -n nologin /etc/passwd
2:bin:x:1:1:bin:/bin:/sbin/nologin
3:daemon:x:2:2:daemon:/sbin:/sbin/nologin
4:adm:x:3:4:adm:/var/adm:/sbin/nologin
5:lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
9:mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
10:operator:x:11:0:operator:/root:/sbin/nologin
11:games:x:12:100:games:/usr/games:/sbin/nologin
12:ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
13:nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
14:dbus:x:81:81:System message bus:/:/sbin/nologin
15:systemd-coredump:x:999:997:systemd Core Dumper:/:/sbin/nologin
16:systemd-resolve:x:193:193:systemd Resolver:/:/sbin/nologin
17:tss:x:59:59:Account used for TPM access:/dev/null:/sbin/nologin
18:polkitd:x:998:996:User for polkitd:/:/sbin/nologin
19:unbound:x:997:994:Unbound DNS resolver:/etc/unbound:/sbin/nologin
20:sssd:x:996:993:User for sssd:/:/sbin/nologin
21:sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
22:postfix:x:89:89::/var/spool/postfix:/sbin/nologin
#-n prefixes each matching line with its line number

[root@rocky8 ~]# grep -c nologin /etc/passwd
18
#-c prints the number of lines that contain the string

[root@rocky8 ~]# grep -o nologin /etc/passwd
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
[root@rocky8 ~]# grep -o nologin /etc/passwd | wc -l
18
#-o prints only the matched strings, one per line

[root@rocky8 ~]# grep -no nologin /etc/passwd
2:nologin
3:nologin
4:nologin
5:nologin
9:nologin
10:nologin
11:nologin
12:nologin
13:nologin
14:nologin
15:nologin
16:nologin
17:nologin
18:nologin
19:nologin
20:nologin
21:nologin
22:nologin

[root@rocky8 ~]# grep -q root /etc/passwd #-q prints nothing, whether or not a match is found
[root@rocky8 ~]# echo $?
0 #Found: the exit status is 0
[root@rocky8 ~]# grep -q rooter /etc/passwd
[root@rocky8 ~]# echo $?
1 #Not found: the exit status is 1
#Use the exit status in $? to tell whether a match was found
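
The exit status is what makes grep -q useful in scripts. The following is a minimal sketch, not taken from the original session: the temporary file stands in for /etc/passwd, and the function name user_exists is hypothetical.

```shell
# Sample data standing in for /etc/passwd (hypothetical temporary file).
users=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'boss:x:1001:1001::/home/boss:/bin/bash' > "$users"

# user_exists NAME: succeeds (exit status 0) if some line starts with "NAME:".
user_exists() {
    grep -q "^$1:" "$users"
}

# "if" tests the exit status of grep directly, so $? is never inspected by hand.
if user_exists root; then found_root=yes; else found_root=no; fi
if user_exists rooter; then found_rooter=yes; else found_rooter=no; fi

echo "root: $found_root, rooter: $found_rooter"
rm -f "$users"
```

Anchoring the pattern with ^ and the trailing colon keeps operator, whose home directory is /root, from counting as a match for root.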

[root@rocky8 ~]# grep  root /etc/passwd &> /dev/null
[root@rocky8 ~]# echo $?
0
[root@rocky8 ~]# grep  rooter /etc/passwd &> /dev/null
[root@rocky8 ~]# echo $?
1
#Redirecting the output to /dev/null and checking $? works as well

[root@rocky8 ~]# grep -n root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
10:operator:x:11:0:operator:/root:/sbin/nologin
[root@rocky8 ~]# grep -nA3 root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
2-bin:x:1:1:bin:/bin:/sbin/nologin
3-daemon:x:2:2:daemon:/sbin:/sbin/nologin
4-adm:x:3:4:adm:/var/adm:/sbin/nologin
--
10:operator:x:11:0:operator:/root:/sbin/nologin
11-games:x:12:100:games:/usr/games:/sbin/nologin
12-ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
13-nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
#-A N also prints the N lines after each line that contains the string

[root@rocky8 ~]# grep -nB3 root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
--
7-shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
8-halt:x:7:0:halt:/sbin:/sbin/halt
9-mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
10:operator:x:11:0:operator:/root:/sbin/nologin
#-B N also prints the N lines before each line that contains the string

[root@rocky8 ~]# grep -nC3 root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
2-bin:x:1:1:bin:/bin:/sbin/nologin
3-daemon:x:2:2:daemon:/sbin:/sbin/nologin
4-adm:x:3:4:adm:/var/adm:/sbin/nologin
--
7-shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
8-halt:x:7:0:halt:/sbin:/sbin/halt
9-mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
10:operator:x:11:0:operator:/root:/sbin/nologin
11-games:x:12:100:games:/usr/games:/sbin/nologin
12-ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
13-nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
#-C N also prints the N lines before and after each line that contains the string

[root@rocky8 ~]# grep -e root -e bash /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash
#Several -e patterns are combined with OR: lines that contain root or bash

[root@rocky8 ~]# grep root /etc/passwd | grep bash
root:x:0:0:root:/root:/bin/bash
#Piping into a second grep gives AND: lines that contain both root and bash
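
The difference between the two forms is easy to check by counting lines. This is a small sketch on made-up sample data (the temporary file is an assumption, not part of the original session):

```shell
# Three sample lines in /etc/passwd format (hypothetical temporary file).
sample=$(mktemp)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'operator:x:11:0:operator:/root:/sbin/nologin' \
    'boss:x:1001:1001::/home/boss:/bin/bash' > "$sample"

# OR: a line is counted if it contains root or bash.
or_count=$(grep -c -e root -e bash "$sample")

# AND: the second grep only sees lines that already matched root.
and_count=$(grep root "$sample" | grep -c bash)

echo "OR: $or_count lines, AND: $and_count line"
rm -f "$sample"
```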

[root@rocky8 ~]# echo hello |grep -w hello
hello
[root@rocky8 ~]# echo helloeveryone |grep -w hello
[root@rocky8 ~]# echo hello:everyone |grep -w hello
hello:everyone
[root@rocky8 ~]# echo hello,everyone |grep -w hello
hello,everyone
[root@rocky8 ~]# echo hello-everyone |grep -w hello
hello-everyone
[root@rocky8 ~]# echo hello_everyone |grep -w hello
[root@rocky8 ~]# echo hello2everyone |grep -w hello
[root@rocky8 ~]# echo hello everyone |grep -w hello
hello everyone
#-w matches whole words. A word is a run of letters, digits and underscores; any other character separates words

[root@rocky8 ~]# cd /data
[root@rocky8 data]# cat >f1.txt <<EOF
raymond
boss
EOF
[root@rocky8 data]# cat f1.txt
raymond
boss
[root@rocky8 data]# cat >f2.txt <<EOF
I am oldraymond
bigboss
I love linux
I am linux student
EOF
[root@rocky8 data]# cat f2.txt
I am oldraymond
bigboss
I love linux
I am linux student

[root@rocky8 data]# grep -f f1.txt f2.txt
I am oldraymond
bigboss
#-f reads the patterns from the first file, one per line, and prints every line of the second file that matches any of them

Example: find the lines that two files have in common

[root@rocky8 data]# cat test1.txt
a
b
1
c
[root@rocky8 data]# cat test2.txt
b
e
f
c
1
2
[root@rocky8 data]# grep -f test1.txt test2.txt 
b
c
1
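
Note that -f treats every line of the first file as a regular expression that may match anywhere in a line, so the result above is only "the same lines" because the sample lines are single characters with no longer line containing them. A hedged sketch of a stricter variant follows; the extra line abc and the temporary directory are assumptions added for illustration, and -x (match the whole line) is an option not covered in the list above.

```shell
dir=$(mktemp -d)
printf '%s\n' a b 1 c > "$dir/test1.txt"
printf '%s\n' b e f c 1 2 abc > "$dir/test2.txt"   # "abc" added for illustration

# Patterns used as regular expressions: a, b and c all match inside "abc".
loose=$(grep -f "$dir/test1.txt" "$dir/test2.txt")

# -F: patterns are fixed strings; -x: the whole line must match.
exact=$(grep -Fxf "$dir/test1.txt" "$dir/test2.txt")

printf 'loose:\n%s\nexact:\n%s\n' "$loose" "$exact"
rm -rf "$dir"
```

With -Fx the line abc is no longer reported, because no pattern equals the entire line.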

example:

[root@rocky8 data]# ls -R /etc

[root@rocky8 data]# grep -r root /etc #-r searches the directory recursively for lines that contain the string and does not follow symbolic links

[root@rocky8 data]# grep -R root /etc #-R searches the directory recursively for lines that contain the string and follows symbolic links
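
The difference only shows up when the tree contains a symbolic link. The sketch below assumes GNU grep and builds a throwaway directory (all paths are hypothetical); -l, which prints only the names of matching files, is used to keep the output short.

```shell
top=$(mktemp -d)
mkdir "$top/real" "$top/tree"
echo 'needle' > "$top/real/a.txt"
ln -s ../real "$top/tree/link"      # tree/link points at the real directory

# -r does not follow the link it meets while walking tree/.
r_hits=$(grep -rl needle "$top/tree" | wc -l | tr -d ' ')

# -R follows the link and finds a.txt through it.
R_hits=$(grep -Rl needle "$top/tree" | wc -l | tr -d ' ')

echo "-r: $r_hits file(s), -R: $R_hits file(s)"
rm -rf "$top"
```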

example:

[root@rocky8 data]# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@rocky8 data]# grep "USER" /etc/passwd
[root@rocky8 data]# grep 'USER' /etc/passwd
[root@rocky8 data]# grep whoami /etc/passwd

example:

[root@rocky8 data]# df | grep '^/dev/sd' |tr -s ' ' %|cut -d% -f5|sort -n|tail -1
19

example:

[root@rocky8 data]# grep "^ESTAB" ss2.log |tr -s ' ' : |cut -d: -f6|sort |uniq -c|sort -nr|head -n3
     12 223.88.255.148
     10 183.202.63.36
      9 117.152.155.119

example:

[root@rocky8 data]# grep -v "^#" /etc/profile | grep -v '^$'
pathmunge () {
    case ":${PATH}:" in
        *:"$1":*)
            ;;
        *)
            if [ "$2" = "after" ] ; then
                PATH=$PATH:$1
            else
                PATH=$1:$PATH
            fi
    esac
}
if [ -x /usr/bin/id ]; then
    if [ -z "$EUID" ]; then
        # ksh workaround
        EUID=`/usr/bin/id -u`
        UID=`/usr/bin/id -ru`
    fi
    USER="`/usr/bin/id -un`"
    LOGNAME=$USER
    MAIL="/var/spool/mail/$USER"
fi
if [ "$EUID" = "0" ]; then
    pathmunge /usr/sbin
    pathmunge /usr/local/sbin
else
    pathmunge /usr/local/sbin after
    pathmunge /usr/sbin after
fi
HOSTNAME=`/usr/bin/hostname 2>/dev/null`
HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL=ignoredups
fi
export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL
if [ $UID -gt 199 ] && [ "`/usr/bin/id -gn`" = "`/usr/bin/id -un`" ]; then
    umask 002
else
    umask 022
fi
for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then 
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done
unset i
unset -f pathmunge
if [ -n "${BASH_VERSION-}" ] ; then
        if [ -f /etc/bashrc ] ; then
                # Bash login shells run only /etc/profile
                # Bash non-login shells run only /etc/bashrc
                # Check for double sourcing is done in /etc/bashrc.
                . /etc/bashrc
       fi
fi

[root@rocky8 data]# grep -v "^#\|^$" /etc/profile

[root@rocky8 data]# grep "^[^#]" /etc/profile

[root@rocky8 data]# grep -v "^\(#\|$\)" /etc/profile

[root@rocky8 data]# grep -Ev "^(#|$)" /etc/profile

[root@rocky8 data]# egrep -v "^(#|$)" /etc/profile

[root@centos6 ~]# egrep -v '^(#|$)' /etc/httpd/conf/httpd.conf
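
All of the variants above only remove comments that start in the first column, which is why the indented "# ksh workaround" line still appears in the /etc/profile output. A possible refinement, shown here as a sketch on a made-up configuration file, allows leading whitespace before the # or the end of the line:

```shell
# Hypothetical configuration file with indented comments and a whitespace-only line.
conf=$(mktemp)
printf '%s\n' \
    '# top-level comment' \
    '' \
    'HISTSIZE=1000' \
    '    # indented comment' \
    '   ' \
    'umask 022' > "$conf"

# Pattern from the examples above: comment in column 1 or a truly empty line.
basic=$(grep -Ev '^(#|$)' "$conf" | wc -l | tr -d ' ')

# Variant that also allows leading whitespace before the # or the line end.
strict=$(grep -Ev '^[[:space:]]*(#|$)' "$conf" | wc -l | tr -d ' ')

echo "lines kept by the column-1 pattern: $basic, by the whitespace-aware pattern: $strict"
rm -f "$conf"
```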

example:

[root@rocky8 data]# grep -o 'r..t' /etc/passwd
root
root
root
root
r/ft

example:

[root@rocky8 data]# ifconfig | grep -E '[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}'
        inet 172.31.1.8  netmask 255.255.248.0  broadcast 172.31.7.255
        RX packets 9744  bytes 11614557 (11.0 MiB)
        inet 127.0.0.1  netmask 255.0.0.0

[root@rocky8 data]# ifconfig | grep -E '([0-9]{1,3}.){3}[0-9]{1,3}'
        inet 172.31.1.8  netmask 255.255.248.0  broadcast 172.31.7.255
        RX packets 9752  bytes 11615406 (11.0 MiB)
        inet 127.0.0.1  netmask 255.0.0.0

[root@rocky8 data]# ifconfig eth0 | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}'|head -1
172.31.1.8

[root@rocky8 data]# cat > regex.txt
([0-9]{1,3}\.){3}[0-9]{1,3}
^C
[root@rocky8 data]# ifconfig | grep -oEf regex.txt
172.31.1.8
255.255.248.0
172.31.7.255
127.0.0.1
255.0.0.0
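
The first two commands in this example also print the "RX packets" line because their . is not escaped and therefore matches any character, including a digit. The following sketch reproduces the effect on two lines copied from the output above, without needing ifconfig (the variable names are illustrative):

```shell
text='RX packets 9744  bytes 11614557 (11.0 MiB)
inet 172.31.1.8  netmask 255.255.248.0'

# Unescaped "." matches any character, so a long run of digits also qualifies.
loose=$(printf '%s\n' "$text" | grep -Eo '([0-9]{1,3}.){3}[0-9]{1,3}' | head -n 1)

# Escaped "\." only matches a literal dot.
strict=$(printf '%s\n' "$text" | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n 1)

echo "first loose match:  $loose"
echo "first strict match: $strict"
```

Even the escaped pattern accepts values such as 999.999.999.999; it checks the shape of an address, not the range of each octet.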

example:

[root@rocky8 data]# grep -E 'root|bash' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash

example:

[root@rocky8 data]# grep -w root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

[root@rocky8 data]# grep '\<root\>' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

example:

[root@rocky8 data]# grep "^\(.*\)\>.*\<\1$" /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt

[root@rocky8 data]# grep -E "^(.*)\>.*\<\1$" /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt

[root@rocky8 data]# egrep "^(.*)\>.*\<\1$" /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt

Example (interview question): calculate the sum of everyone's ages

[root@rocky8 data]# cat > nianling.txt
xiaoming=20
xiaohong=18
xiaoqiang=22
^C

[root@rocky8 data]# cut -d"=" -f2 nianling.txt|tr '\n' + | grep -Eo ".*[0-9]"|bc
60

[root@rocky8 data]# grep -Eo "[0-9]+" nianling.txt | tr '\n' + | grep -Eo ".*[0-9]"|bc
60
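
If bc is not installed, the same sum can be computed with shell arithmetic. This is an alternative sketch rather than the method used above; it assumes that paste is available and recreates the sample file in a temporary location.

```shell
ages=$(mktemp)
printf '%s\n' 'xiaoming=20' 'xiaohong=18' 'xiaoqiang=22' > "$ages"

# grep -Eo prints one number per line; paste -sd+ joins the lines with "+".
expr_text=$(grep -Eo '[0-9]+' "$ages" | paste -sd+ -)

# Shell arithmetic evaluates the joined expression, so bc is not needed.
total=$(( $expr_text ))

echo "$expr_text = $total"
rm -f "$ages"
```

Because paste puts the separator only between lines, there is no trailing + to strip with a second grep.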

4. Regular expressions

REGEXP: a regular expression is a pattern written with special characters and literal text characters. Some characters (metacharacters) do not stand for themselves but act as control or wildcard characters. Regular expressions resemble an enhanced form of shell wildcards, with one difference: wildcards are used to match file names, whereas regular expressions are used to match characters in text content

Regular expressions are supported by many programs and programming languages: vim, less, grep, sed, awk, nginx, mysql, etc.

Regular expressions fall into two categories:

  • Basic regular expression: BRE
  • Extended regular expressions: ERE

Regular expression engine:

The software module that interprets regular expressions. Different engines use different matching algorithms, for example PCRE (Perl Compatible Regular Expressions)

Metacharacter classification of regular expressions: character matching, matching times, position anchoring and grouping

Help: man 7 regex

4.1 basic regular expression metacharacters

4.1.1 character matching

. Matches any single character (except \n); it can be a Chinese character or a character from any other language
[] Matches any single character within the specified set, for example: [wang] [0-9] [a-z] [a-zA-Z]
[^] Matches any single character outside the specified set, for example: [^wang]
[:alnum:] Letters and digits
[:alpha:] Any letter, upper or lower case, i.e. A-Z, a-z
[:lower:] Lowercase letters, for example: [[:lower:]], equivalent to [a-z]
[:upper:] Uppercase letters
[:blank:] Blank characters (space and tab)
[:space:] Whitespace including space, tab (horizontal and vertical), line feed, carriage return, etc.; covers more than [:blank:]
[:cntrl:] Non-printable control characters (backspace, delete, bell...)
[:digit:] Decimal digits
[:xdigit:] Hexadecimal digits
[:graph:] Printable non-whitespace characters
[:print:] Printable characters
[:punct:] Punctuation
\s #Matches any whitespace character, including space, tab, form feed, etc.; equivalent to [ \f\n\r\t\v]. Note that Unicode regular expressions also match full-width whitespace characters
\S #Matches any non-whitespace character; equivalent to [^ \f\n\r\t\v]
\w #Matches one letter, digit, underscore, Chinese character or character from another language; equivalent to [_[:alnum:]]
\W #Matches one character that is not a letter, digit, underscore, Chinese character or character from another language; equivalent to [^_[:alnum:]]
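
A point that is easy to miss: a class such as [:digit:] is only valid inside a bracket expression, so it is written [[:digit:]] in a pattern. The sketch below tries a few classes on a made-up sample string (the string and variable names are illustrative):

```shell
line='Rocky8 Linux: 2 CPUs, 4GB'

# Each class sits inside its own bracket expression: [[:digit:]], not [:digit:].
digits=$(printf '%s\n' "$line" | grep -o '[[:digit:]]' | tr -d '\n')
uppers=$(printf '%s\n' "$line" | grep -o '[[:upper:]]' | tr -d '\n')
puncts=$(printf '%s\n' "$line" | grep -o '[[:punct:]]' | tr -d '\n')

# A class can be combined with other characters: letters, digits or underscore.
words=$(printf '%s\n' "$line" | grep -Eo '[[:alnum:]_]+' | wc -l | tr -d ' ')

echo "digits=$digits uppers=$uppers puncts=$puncts words=$words"
```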

example:

[root@rocky8 data]# ls
aa.txt      alpha.log  c      emp.txt  f1.txt.orig  fa.txt   f.txt         pass.txt  regex.txt  ss2.log    test2.txt   title.txt
access_log  a.txt      c.txt  f1.txt   f2.txt       f.patch  nianling.txt  passwd    seq.log    test1.txt  title1.txt
[root@rocky8 data]# ls |grep 'f..txt'
f1.txt
f1.txt.orig
f2.txt
fa.txt
[root@rocky8 data]# touch faatxt
[root@rocky8 data]# touch fbbtxt
[root@rocky8 data]# ls |grep 'f.\.txt'
f1.txt
f1.txt.orig
f2.txt
fa.txt
#. matches any single character. To match the literal dot in f1.txt it must be escaped: f..txt accepts any character in the second position after f, while f.\.txt requires a dot there

[root@rocky8 data]# ls |grep f.\.txt
f1.txt
f1.txt.orig
f2.txt
faatxt
fa.txt
fbbtxt
#Enclose regular expressions in '' single quotation marks. Without them the shell removes the backslash and grep receives f..txt, so the escape has no effect
#grep treats each line that arrives through the pipe as plain text

[root@rocky8 data]# ls f?.txt
f1.txt  f2.txt  fa.txt
#You can also use wildcards

example:

[root@rocky8 data]# ls /etc/ | grep 'rc[.0-6]'
rc0.d
rc1.d
rc2.d
rc3.d
rc4.d
rc5.d
rc6.d
rc.d
rc.local

[root@rocky8 data]# ls /etc/ | grep 'rc[.0-6].'
rc0.d
rc1.d
rc2.d
rc3.d
rc4.d
rc5.d
rc6.d
rc.d
rc.local

[root@rocky8 data]# ls /etc/ | grep 'rc[.0-6]\.'
rc0.d
rc1.d
rc2.d
rc3.d
rc4.d
rc5.d
rc6.d

4.1.2 matching times

A quantifier is placed after a character and specifies how many times that preceding character may occur

* Matches the preceding character any number of times, including 0. Greedy mode: matches as much as possible, for example: a* means a repeated any number of times
.* Any character, any number of times
\? Matches the preceding character 0 or 1 times, i.e. it is optional
\+ Matches the preceding character at least once, i.e. >=1 times
\{n\} Matches the preceding character exactly n times, for example: a\{10\}
\{m,n\} Matches the preceding character at least m times and at most n times
\{,n\} Matches the preceding character at most n times, <=n
\{n,\} Matches the preceding character at least n times

example:

[root@rocky8 data]# ls f*
f1.txt  f1.txt.orig  f2.txt  faatxt  fa.txt  fbbtxt  f.patch  f.txt
#In a shell wildcard, the * asterisk stands for any string

[root@rocky8 data]# echo aa |grep "a*"
aa
[root@rocky8 data]# echo aaa |grep "a*"
aaa
[root@rocky8 data]# echo aaaaaaa |grep "a*"
aaaaaaa
#In a regular expression, the * asterisk means that the preceding character occurs any number of times

[root@rocky8 data]# echo b |grep "a*"
b
#Zero occurrences also count as a match

[root@rocky8 data]# echo b |grep "aa*"
[root@rocky8 data]# echo a |grep "aa*"
a
#aa* requires one a followed by any number of further a characters, i.e. at least one a

[root@rocky8 data]# echo aaaaaaaabbb |grep "aa*"
aaaaaaaabbb
#Greedy mode: the match extends over the whole run of consecutive a characters

[root@rocky8 data]#  echo a |grep "a\?"
a
[root@rocky8 data]# echo b |grep "a\?"
b
#\? indicates 0 or 1 occurrences, optional

[root@rocky8 data]# echo aaa |grep "a\?"
aaa
#A longer run of a also matches, because a\? may match anywhere within the line

[root@rocky8 data]# echo a |grep "a*"
a
[root@rocky8 data]# echo aba |grep "a*"
aba
[root@rocky8 data]# echo aabaa |grep "a*"
aabaa
[root@rocky8 data]# echo aabaa |grep "a\?"
aabaa
[root@rocky8 data]# echo b |grep "a\?"
b

[root@rocky8 data]# echo a |grep "aa\?"
a
[root@rocky8 data]# echo ab |grep "aa\?"
ab
[root@rocky8 data]# echo aba |grep "a\?"
aba
[root@rocky8 data]# echo aba |grep "a*"
aba
#a\? behaves much like a* here
#Semantically, \? means 0 or 1 occurrences, and * means any number of occurrences

[root@rocky8 data]# echo aba |grep -o "a\?"
a
a
[root@rocky8 data]# echo aba |grep -o "a*"
a
a
#-o prints each matched string on its own line

[root@rocky8 data]# echo a |grep -o "a\+"
a
[root@rocky8 data]# echo b |grep -o "a\+"
[root@rocky8 data]# echo b |grep  "a\+"
#\+ means one or more occurrences

[root@rocky8 data]# echo aaaaaaa |grep  "a\{7\}"
aaaaaaa
[root@rocky8 data]# echo aaaaaa |grep  "a\{7\}"
[root@rocky8 data]# echo aaaaaaaaa |grep  "a\{7\}"
aaaaaaaaa
#\{n\} matches exactly n consecutive occurrences. Fewer do not match; a longer run still matches because it contains n of them

[root@rocky8 data]# echo aaaaaaaaa |grep  "a\{7,10\}"
aaaaaaaaa
[root@rocky8 data]# echo aaaaaaaaaaa |grep  "a\{7,10\}"
aaaaaaaaaaa
[root@rocky8 data]#  echo aaaaaabaaaaa |grep  "a\{7,10\}"
#\{m,n\} matches m to n consecutive occurrences. A longer run still matches because it contains such a run; a shorter run does not

[root@rocky8 data]# echo aaaaaabaaaaa |grep  "a\{,10\}"
aaaaaabaaaaa
#\{,n\} matches at most n occurrences, including 0

[root@centos8 ~]# echo abc | grep ".*"
abc
#.* matches any string

[root@rocky8 data]# echo /etc |grep -o '/etc/\?'
/etc
[root@rocky8 data]# echo /etc/ |grep -o '/etc/\?'
/etc/
#The trailing / slash after /etc is optional

[root@rocky8 data]# echo /etc/ |grep -o '/etc/*'
/etc/
[root@rocky8 data]#  echo /etc/ |grep -o '/etc/*'
/etc/
[root@rocky8 data]# echo /etc/ |grep -o '/etc/\?'
/etc/
#* means any number of times; \? means optional, i.e. 0 or 1 times

example:

[root@rocky8 data]# echo /etc/ |grep "/etc/\?"
/etc/
[root@rocky8 data]# echo /etc |grep "/etc/\?"
/etc

4.1.3 position anchoring

Position anchors pin a pattern to a particular position in a line or word

^ #Beginning-of-line anchor, placed at the leftmost side of the pattern
$ #End-of-line anchor, placed at the rightmost side of the pattern
^PATTERN$ #The pattern must match the entire line
^$ #Empty line
^[[:space:]]*$ #Blank line (empty or containing only whitespace)
\< or \b #Beginning-of-word anchor, placed on the left side of the word pattern
\> or \b #End-of-word anchor, placed on the right side of the word pattern
\<PATTERN\> #Match an entire word

\w #Matches a word character, equivalent to [_[:alnum:]]
\W #Matches a non-word character, equivalent to [^_[:alnum:]]

#Note: words are composed of letters, digits and underscores

example:

[root@rocky8 data]# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@rocky8 data]# grep ^root /etc/passwd
root:x:0:0:root:/root:/bin/bash
#^ matches the string only at the beginning of a line

[root@rocky8 data]# grep 'bash$' /etc/passwd
root:x:0:0:root:/root:/bin/bash
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash
#$ matches the string only at the end of a line

[root@rocky8 data]# grep -v '^$' /etc/init.d/functions
#^$ matches empty lines; -v excludes them

[root@rocky8 data]# cat -A f1.txt
raymond$
 $
^I$
$
    $
boss$
[root@rocky8 data]#  grep -v '^[[:space:]]*$' f1.txt
raymond
boss
#Filters out empty lines and lines that contain only spaces, tabs, etc.

[root@rocky8 data]# echo boss | grep '\<boss'
boss
[root@rocky8 data]# echo bossceo | grep '\<boss'
bossceo
[root@rocky8 data]# echo 99bossceo | grep '\<boss'
[root@rocky8 data]# echo 99_bossceo | grep '\<boss'
[root@rocky8 data]# echo 99-bossceo | grep '\<boss'
99-bossceo
#\< or \b marks the beginning of a word

[root@rocky8 data]# echo 99-bossceo | grep 'boss\>'
[root@rocky8 data]# echo 99-boss,ceo | grep 'boss\>'
99-boss,ceo
[root@rocky8 data]# echo 99-boss;ceo | grep 'boss\>'
99-boss
-bash: ceo: command not found
[root@rocky8 data]# echo 99-boss+ceo | grep 'boss\>'
99-boss+ceo
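[root@rocky8 data]# echo 99-boss_ceo | grep 'boss\>'
#\> or \b marks the end of a word. In the ;ceo case the unquoted ; ended the echo command, so the shell tried to run ceo as a separate command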

[root@rocky8 data]# echo 99-boss;ceo | grep 'boss\b'
99-boss
-bash: ceo: command not found
[root@rocky8 data]# echo '99-boss;ceo' | grep 'boss\b'
99-boss;ceo
[root@rocky8 data]# echo "99-boss;ceo" | grep 'boss\b'
99-boss;ceo
#\b works as well. Quote the string so that the shell does not treat ; as a command separator

Example: exclude blank lines and lines beginning with #

[root@rocky8 data]# grep -v '^$' /etc/profile|grep -v '^#'
[root@rocky8 data]# grep '^[^#]' /etc/profile

4.1.4 grouping and others

4.1.4.1 grouping

Grouping: \( \) binds multiple characters together and treats them as a single unit, for example: \(root\)\+

Back-reference: the text matched by the pattern inside the grouping parentheses is recorded by the regular expression engine in internal variables, which are named \1, \2, \3, ...

\1 refers to the text matched by the pattern between the first left parenthesis, counting from the left, and its matching right parenthesis

Note: \0 represents all characters matched by the regular expression

Example:

\(string1\(string2\)\)
\1 : string1\(string2\)
\2 : string2

Note: a back-reference refers to the text matched by the pattern in the preceding group, not to the pattern itself

example:

[root@rocky8 data]# echo abcabcabc |grep '\(abc\)\{3\}'
abcabcabc
[root@rocky8 data]# echo abcabc |grep '\(abc\)\{3\}'
#Grouping lets the quantifier apply to the whole group: abc must occur three times in a row

[root@rocky8 data]#  echo abc12345abc |grep '\(abc\).*\1'
abc12345abc
#Back-reference: \1 repeats the text matched by the group, here after arbitrary characters in between

[root@rocky8 data]# echo adc12345adc |grep '\(a.c\).*\1'
adc12345adc
[root@rocky8 data]# echo adc12345abc |grep '\(a.c\).*\1'
[root@rocky8 data]# echo adc12345adc |grep '\(a.c\).*\1'
adc12345adc
#\1 must be exactly the text that a.c matched earlier: adc...adc matches, adc...abc does not

[root@rocky8 data]# echo adc12345afedfd123adc |grep '\(a.c\)\(123\).*\2\1'
adc12345afedfd123adc
#\1 refers back to the text matched by the first group and \2 to the text matched by the second group; these are called back-references
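
One practical use of back-references is finding a word that was typed twice in a row. The sketch below uses made-up sample text and assumes GNU grep, which supports \< \> and back-references in -E mode:

```shell
text='this is is a test
nothing repeated here
the the end'

# \< and \> keep the "is" inside "this" from pairing with the next word;
# \1 must repeat exactly the text captured by the group.
doubled=$(printf '%s\n' "$text" | grep -E '\<([[:alpha:]]+) \1\>')

printf '%s\n' "$doubled"
```

Without the word anchors, a line such as "this is fine" would also match, because the is at the end of this is followed by a space and another is.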

4.1.4.2 or

Or: \|

Example:

a\|b #a or b
C\|cat #C or cat
\(C\|c\)at #Cat or cat

example:

[root@rocky8 data]#  echo abc | grep 'a\|b12'
abc
[root@rocky8 data]# echo b12 | grep 'a\|b12'
b12
#\| means or: a or b12

[root@rocky8 data]# echo b12 | grep '\(a\|b\)12'
b12
[root@rocky8 data]# echo a12 | grep '\(a\|b\)12'
a12
#With grouping: a12 or b12

Example: exclude blank lines and lines beginning with #

[root@centos6 ~]#grep -v '^#' /etc/httpd/conf/httpd.conf |grep -v ^$
[root@centos6 ~]#grep -v '^#\|^$' /etc/httpd/conf/httpd.conf
[root@centos6 ~]#grep -v '^\(#\|$\)' /etc/httpd/conf/httpd.conf
[root@centos6 ~]#grep "^[^#]" /etc/httpd/conf/httpd.conf

4.1.5 regular expression exercises

1. Display the lines starting with s in the /proc/meminfo file (requirement: use two methods)
2. Display the lines in the /etc/passwd file that do not end in /bin/bash
3. Display the default shell of the user rpc
4. Find the two- or three-digit numbers in /etc/passwd
5. Display the lines in the /etc/grub2.cfg file of CentOS7 that begin with at least one whitespace character followed by a non-whitespace character
6. Find the lines in the output of the "netstat -tan" command that end with LISTEN followed by any number of whitespace characters
7. Display the user names and UIDs of all users with a UID less than 1000 on CentOS7
8. Add the users bash, testbash, basher, sh and nologin (whose shell is /sbin/nologin), then find the lines in /etc/passwd where the user name is the same as the shell name
9. Using df and grep, extract the utilization of each disk partition and sort it from largest to smallest

4.2 extended regular expression metacharacters

4.2.1 character matching metacharacters

. Any single character
[wang] Any single character in the specified set
[^wang] Any single character outside the specified set
[:alnum:] Letters and digits
[:alpha:] Any letter, upper or lower case, i.e. A-Z, a-z
[:lower:] Lowercase letters, for example: [[:lower:]], equivalent to [a-z]
[:upper:] Uppercase letters
[:blank:] Blank characters (space and tab)
[:space:] Horizontal and vertical whitespace characters (covers more than [:blank:])
[:cntrl:] Non-printable control characters (backspace, delete, bell...)
[:digit:] Decimal digits
[:xdigit:] Hexadecimal digits
[:graph:] Printable non-whitespace characters
[:print:] Printable characters
[:punct:] Punctuation

4.2.2 times matching

* Matches the preceding character any number of times
? 0 or 1 times
+ 1 or more times
{n} Exactly n times
{m,n} At least m and at most n times
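
The quantifiers mean the same thing in both dialects; only the backslashes differ. A short sketch comparing the two on made-up input follows. It uses -x (match the whole line), an option not covered in the list in section 3.1, so that longer runs do not match through a substring.

```shell
input='a
aa
aaaa
b'

# BRE: the interval braces must be escaped.
bre=$(printf '%s\n' "$input" | grep -x 'a\{2,3\}')

# ERE: the same interval without backslashes.
ere=$(printf '%s\n' "$input" | grep -Ex 'a{2,3}')

# In BRE an unescaped brace is an ordinary character.
literal=$(printf '%s\n' 'a{2,3}' | grep -c 'a{2,3}')

echo "BRE: $bre  ERE: $ere  literal-brace matches: $literal"
```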

4.2.3 position anchoring

^ Beginning of line
$ End of line
\<, \b Beginning of word
\>, \b End of word

4.2.4 grouping and others

() Grouping
Back-reference: \1, \2, ... Note: \0 represents all characters matched by the regular expression
| Or
a|b #a or b
C|cat #C or cat
(C|c)at #Cat or cat

example:

[root@rocky8 data]# echo a12 | grep -E '(a|b)12'
a12
[root@rocky8 data]# echo a12 | egrep '(a|b)12'
a12
#grep -E or egrep supports extended regular expressions

[root@rocky8 data]#  ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.31.1.8  netmask 255.255.248.0  broadcast 172.31.7.255
        inet6 fe80::20c:29ff:fef9:6ad1  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:f9:6a:d1  txqueuelen 1000  (Ethernet)
        RX packets 11867  bytes 11817539 (11.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3942  bytes 700080 (683.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@rocky8 data]# ifconfig eth0 |grep netmask
        inet 172.31.1.8  netmask 255.255.248.0  broadcast 172.31.7.255
[root@rocky8 data]# ifconfig eth0 |grep netmask | tr -s " "
 inet 172.31.1.8 netmask 255.255.248.0 broadcast 172.31.7.255
[root@rocky8 data]# ifconfig eth0 |grep netmask | tr -s " " |cut -d " " -f3
172.31.1.8

#Extended regular expression method
[root@rocky8 data]# ifconfig eth0 |grep netmask | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' |head -1
172.31.1.8

#Basic regular expression method
[root@rocky8 data]# ifconfig eth0 |grep netmask | grep -o '\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}' |head -1
172.31.1.8

4.2.5 extended regular expression exercise

1. Display the UIDs and default shells of the three users root, raymond and boss
2. Find the lines in the /etc/rc.d/init.d/functions file that begin with a word (which may include underscores) followed by parentheses
3. Use egrep to extract the base name from the path /etc/rc.d/init.d/functions
4. Use egrep to extract the directory name from the path above
5. Count the number of logins from each host IP address that logged in as root, based on the output of the last command
6. Use extended regular expressions to represent 0-9, 10-99, 100-199, 200-249 and 250-255 respectively
7. Display all IPv4 addresses in the output of the ifconfig command
8. Deduplicate and sort the characters in the string "welcome to rocky linux", with the most frequently repeated characters first

Tags: Linux regex perl

Posted on Thu, 14 Oct 2021 19:50:43 -0400 by Neotropic