Chapter 5. The three swordsmen of Linux text processing: grep and regular expressions

3. The three swordsmen of text processing

grep: filters lines of text based on a pattern (a regular expression)
sed: stream editor, a text editing tool
awk: text report generator; implemented on Linux as gawk

3.1 grep, one of the three swordsmen of text processing

grep: Global search REgular expression and Print out the line

Function: a text search tool that checks the target text line by line against the "pattern" specified by the user and prints the matching lines

Pattern: a filter condition written with regular expression metacharacters and literal text characters

Format:

grep [OPTIONS] PATTERN [FILE...]

Common options:

--color=auto    Highlight the matched text in color
-m #            Stop after # matching lines
-v              Display the lines that do not match the pattern
-i              Ignore character case
-n              Show the line number of each matching line
-c              Count the number of matching lines
-o              Show only the matched strings
-q              Quiet mode, no output
-A #            after: also show the # lines after each match
-B #            before: also show the # lines before each match
-C #            context: also show the # lines before and after each match
-e PATTERN      Combine multiple patterns with a logical OR, for example: grep -e 'cat' -e 'dog' file
-w              Match whole words
-E              Use ERE, equivalent to egrep
-F              Treat the pattern as a fixed string, not a regular expression; equivalent to fgrep
-f file         Read the patterns from a pattern file
-r              Recurse into directories, but do not follow symbolic links found during the recursion
-R              Recurse into directories and follow symbolic links

example:

[root@rocky8 ~]# grep rocky
i am raymond
rocky8
rocky8
i am study rocky8 linux
i am study rocky8 linux
^C
#With no file argument grep waits for standard input. When an input line contains the string, the whole line is printed and the matched characters are highlighted in red

[root@rocky8 ~]# cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
systemd-coredump:x:999:997:systemd Core Dumper:/:/sbin/nologin
systemd-resolve:x:193:193:systemd Resolver:/:/sbin/nologin
tss:x:59:59:Account used for TPM access:/dev/null:/sbin/nologin
polkitd:x:998:996:User for polkitd:/:/sbin/nologin
unbound:x:997:994:Unbound DNS resolver:/etc/unbound:/sbin/nologin
sssd:x:996:993:User for sssd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash
[root@rocky8 ~]# cat /etc/passwd |grep root
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
#grep receives standard input through the pipe and prints the lines that contain the string

[root@rocky8 ~]# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
#grep itself accepts file names as arguments
[root@rocky8 ~]# cut -d: -f1 /etc/passwd | grep root
root
#Filter the output of the preceding command with grep
[root@rocky8 ~]# pstree -p |grep bash
|-sshd(750)---sshd(838)---sshd(979)---bash(998)-+-grep(1389)
[root@rocky8 ~]# grep root /etc/passwd /etc/group
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
/etc/group:root:x:0:
#Multiple files can be processed
[root@rocky8 ~]# alias grep
alias grep='grep --color=auto'
#The command alias turns on the color
[root@rocky8 ~]# \grep root /etc/passwd /etc/group
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
/etc/group:root:x:0:
#A leading backslash bypasses the alias, so the original command runs without automatic color
[root@rocky8 ~]# which grep
alias grep='grep --color=auto'
/usr/bin/grep
[root@rocky8 ~]# /usr/bin/grep root /etc/passwd /etc/group
/etc/passwd:root:x:0:0:root:/root:/bin/bash
/etc/passwd:operator:x:11:0:operator:/root:/sbin/nologin
/etc/group:root:x:0:
#The full path also runs the original command, but a shell builtin has no path, so this method has some limitations
[root@rocky8 ~]# grep bin /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
systemd-coredump:x:999:997:systemd Core Dumper:/:/sbin/nologin
systemd-resolve:x:193:193:systemd Resolver:/:/sbin/nologin
tss:x:59:59:Account used for TPM access:/dev/null:/sbin/nologin
polkitd:x:998:996:User for polkitd:/:/sbin/nologin
unbound:x:997:994:Unbound DNS resolver:/etc/unbound:/sbin/nologin
sssd:x:996:993:User for sssd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
postfix:x:89:89::/var/spool/postfix:/sbin/nologin
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash
[root@rocky8 ~]# grep -m 3 bin /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
#-m limits the output to the first N lines that contain the string
[root@rocky8 ~]# grep -v nologin /etc/passwd
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash
#-v displays the lines that do not contain the string
[root@rocky8 ~]# grep -n nologin /etc/passwd
2:bin:x:1:1:bin:/bin:/sbin/nologin
3:daemon:x:2:2:daemon:/sbin:/sbin/nologin
4:adm:x:3:4:adm:/var/adm:/sbin/nologin
5:lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
9:mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
10:operator:x:11:0:operator:/root:/sbin/nologin
11:games:x:12:100:games:/usr/games:/sbin/nologin
12:ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
13:nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
14:dbus:x:81:81:System message bus:/:/sbin/nologin
15:systemd-coredump:x:999:997:systemd Core Dumper:/:/sbin/nologin
16:systemd-resolve:x:193:193:systemd Resolver:/:/sbin/nologin
17:tss:x:59:59:Account used for TPM access:/dev/null:/sbin/nologin
18:polkitd:x:998:996:User for polkitd:/:/sbin/nologin
19:unbound:x:997:994:Unbound DNS resolver:/etc/unbound:/sbin/nologin
20:sssd:x:996:993:User for sssd:/:/sbin/nologin
21:sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
22:postfix:x:89:89::/var/spool/postfix:/sbin/nologin
#-n displays the line number of each line that contains the string
[root@rocky8 ~]# grep -c nologin /etc/passwd
18
#-c displays the number of lines that contain the string
[root@rocky8 ~]# grep -o nologin /etc/passwd
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
nologin
[root@rocky8 ~]# grep -o nologin /etc/passwd | wc -l
18
#-o displays only the matched string
[root@rocky8 ~]# grep -no nologin /etc/passwd
2:nologin
3:nologin
4:nologin
5:nologin
9:nologin
10:nologin
11:nologin
12:nologin
13:nologin
14:nologin
15:nologin
16:nologin
17:nologin
18:nologin
19:nologin
20:nologin
21:nologin
22:nologin
[root@rocky8 ~]# grep -q root /etc/passwd
#-q prints nothing, whether or not the string is found
[root@rocky8 ~]# echo $?
0
#Found: the return value is 0
[root@rocky8 ~]# grep -q rooter /etc/passwd
[root@rocky8 ~]# echo $?
1
#Not found: the return value is 1
#Use the return value in $? to judge whether the string was found
[root@rocky8 ~]# grep root /etc/passwd &> /dev/null
[root@rocky8 ~]# echo $?
0
[root@rocky8 ~]# grep rooter /etc/passwd &> /dev/null
[root@rocky8 ~]# echo $?
1
#Redirecting the output to /dev/null allows the same test
[root@rocky8 ~]# grep -n root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
10:operator:x:11:0:operator:/root:/sbin/nologin
[root@rocky8 ~]# grep -nA3 root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
2-bin:x:1:1:bin:/bin:/sbin/nologin
3-daemon:x:2:2:daemon:/sbin:/sbin/nologin
4-adm:x:3:4:adm:/var/adm:/sbin/nologin
--
10:operator:x:11:0:operator:/root:/sbin/nologin
11-games:x:12:100:games:/usr/games:/sbin/nologin
12-ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
13-nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
#-A also displays the N lines after each line that contains the string
[root@rocky8 ~]# grep -nB3 root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
--
7-shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
8-halt:x:7:0:halt:/sbin:/sbin/halt
9-mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
10:operator:x:11:0:operator:/root:/sbin/nologin
#-B also displays the N lines before each line that contains the string
[root@rocky8 ~]# grep -nC3 root /etc/passwd
1:root:x:0:0:root:/root:/bin/bash
2-bin:x:1:1:bin:/bin:/sbin/nologin
3-daemon:x:2:2:daemon:/sbin:/sbin/nologin
4-adm:x:3:4:adm:/var/adm:/sbin/nologin
--
7-shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
8-halt:x:7:0:halt:/sbin:/sbin/halt
9-mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
10:operator:x:11:0:operator:/root:/sbin/nologin
11-games:x:12:100:games:/usr/games:/sbin/nologin
12-ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
13-nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
#-C also displays the N lines before and after each line that contains the string
[root@rocky8 ~]# grep -e root -e bash /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash
#-e adds multiple filter conditions: the line contains root or bash
[root@rocky8 ~]# grep root /etc/passwd | grep bash
root:x:0:0:root:/root:/bin/bash
#Filtering again through a pipe gives AND: the line contains both root and bash
[root@rocky8 ~]# echo hello |grep -w hello
hello
[root@rocky8 ~]# echo helloeveryone |grep -w hello
[root@rocky8 ~]# echo hello:everyone |grep -w hello
hello:everyone
[root@rocky8 ~]# echo hello,everyone |grep -w hello
hello,everyone
[root@rocky8 ~]# echo hello-everyone |grep -w hello
hello-everyone
[root@rocky8 ~]# echo hello_everyone |grep -w hello
[root@rocky8 ~]# echo hello2everyone |grep -w hello
[root@rocky8 ~]# echo hello everyone |grep -w hello
hello everyone
#-w matches whole words. A word is a run of consecutive letters, digits and underscores; any other character separates words
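The -q and $? examples above are usually combined in a shell conditional, because if tests the exit status of grep directly. The following is only a minimal sketch under stated assumptions: the sample file stands in for /etc/passwd, and the helper name has_user is hypothetical, not something from the original session.

```shell
#!/bin/bash
# Hypothetical sample data standing in for /etc/passwd.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'operator:x:11:0:operator:/root:/sbin/nologin' \
  'raymond:x:1000:1000::/home/raymond:/bin/bash' > "$tmpdir/passwd.sample"

# has_user NAME FILE (hypothetical helper): succeeds if a line starts with "NAME:".
# -q suppresses all output, so only the exit status is used.
has_user() {
  grep -q "^$1:" "$2"
}

if has_user root "$tmpdir/passwd.sample"; then
  found_root=yes
else
  found_root=no
fi

if has_user rooter "$tmpdir/passwd.sample"; then
  found_rooter=yes
else
  found_rooter=no
fi

echo "root: $found_root, rooter: $found_rooter"
rm -rf "$tmpdir"
```

Testing the exit status this way avoids the separate echo $? step and does not depend on what grep would have printed.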

[root@rocky8 ~]# cd /data
[root@rocky8 data]# cat >f1.txt <<EOF
raymond
boss
EOF
[root@rocky8 data]# cat f1.txt
raymond
boss
[root@rocky8 data]# cat >f2.txt <<EOF
I am oldraymond bigboss
I love linux
I am linux student
EOF
[root@rocky8 data]# cat f2.txt
I am oldraymond bigboss
I love linux
I am linux student
[root@rocky8 data]# grep -f f1.txt f2.txt
I am oldraymond bigboss
#-f uses the lines of the first file as filter conditions and checks whether the second file contains any of them. If so, the whole matching line is displayed

Example: take the lines common to two files

[root@rocky8 data]# cat test1.txt
a
b
1
c
[root@rocky8 data]# cat test2.txt
b
e
f
c
1
2
[root@rocky8 data]# grep -f test1.txt test2.txt
b
c
1
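One caveat with the example above: grep -f treats every line of the pattern file as a regular expression that may match anywhere in a line, so it only yields "the same lines" when no line is a substring of another. A sketch of a stricter variant follows, assuming two hypothetical extra lines (abc and 10) in the second file to make the difference visible; -F and -x are standard grep options for fixed strings and whole-line matches.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf '%s\n' a b 1 c > "$tmpdir/test1.txt"
# abc and 10 are hypothetical additions that contain a pattern as a substring.
printf '%s\n' b e f c 1 2 abc 10 > "$tmpdir/test2.txt"

# Plain -f: each pattern may match anywhere in the line, so abc and 10 are printed too.
loose=$(grep -f "$tmpdir/test1.txt" "$tmpdir/test2.txt")

# -F: patterns are fixed strings; -x: the whole line must match.
exact=$(grep -Fxf "$tmpdir/test1.txt" "$tmpdir/test2.txt")

printf 'loose:\n%s\nexact:\n%s\n' "$loose" "$exact"
rm -rf "$tmpdir"
```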

example:

[root@rocky8 data]# ls -R /etc
[root@rocky8 data]# grep -r root /etc
#-r recursively searches the directory for lines that contain the string and does not follow symbolic links
[root@rocky8 data]# grep -R root /etc
#-R recursively searches the directory for lines that contain the string and follows symbolic links
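The difference between -r and -R is easier to see on a small directory tree than on /etc. This sketch assumes GNU grep (the version shipped with Rocky 8 behaves this way) and uses hypothetical directory and file names created in a temporary directory.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
mkdir "$tmpdir/tree" "$tmpdir/outside"
echo 'root here'  > "$tmpdir/tree/a.txt"
echo 'root there' > "$tmpdir/outside/b.txt"
# A symbolic link inside the searched tree that points to a directory outside it.
ln -s "$tmpdir/outside" "$tmpdir/tree/link"

# -l lists only the names of matching files; count them for each mode.
r_count=$(grep -rl root "$tmpdir/tree" | wc -l | tr -d ' ')
R_count=$(grep -Rl root "$tmpdir/tree" | wc -l | tr -d ' ')

echo "-r found $r_count file(s), -R found $R_count file(s)"
rm -rf "$tmpdir"
```

With GNU grep, -r skips the link met during recursion and sees one file, while -R follows it and sees both.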

example:

[root@rocky8 data]# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@rocky8 data]# grep "$USER" /etc/passwd
[root@rocky8 data]# grep '$USER' /etc/passwd
[root@rocky8 data]# grep `whoami` /etc/passwd
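The three quoting styles in this example behave differently because the shell processes the pattern before grep sees it. The sketch below illustrates this with a hypothetical variable name and a hypothetical two-line sample file instead of $USER and /etc/passwd; -F is added in the single-quoted case so that the dollar sign is certain to be taken literally.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'note: the literal text $name appears here' > "$tmpdir/sample.txt"

name=root   # hypothetical variable standing in for $USER

# Double quotes: the shell expands $name first, so grep searches for "root".
expanded=$(grep "$name" "$tmpdir/sample.txt")

# Single quotes: no expansion, grep receives the five characters $name.
literal=$(grep -F '$name' "$tmpdir/sample.txt")

# Command substitution (the modern form of backticks) is expanded as well.
substituted=$(grep "$(printf root)" "$tmpdir/sample.txt")

printf '%s\n' "$expanded" "$literal" "$substituted"
rm -rf "$tmpdir"
```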

example:

[root@rocky8 data]# df | grep '^/dev/sd' |tr -s ' ' %|cut -d% -f5|sort -n|tail -1
19

example:

[root@rocky8 data]# grep "^ESTAB" ss2.log |tr -s ' ' : |cut -d: -f6|sort |uniq -c|sort -nr|head -n3
12 223.88.255.148
10 183.202.63.36
9 117.152.155.119

example:

[root@rocky8 data]# grep -v "^#" /etc/profile | grep -v '^$'
pathmunge () {
    case ":${PATH}:" in
        *:"$1":*)
            ;;
        *)
            if [ "$2" = "after" ] ; then
                PATH=$PATH:$1
            else
                PATH=$1:$PATH
            fi
    esac
}
if [ -x /usr/bin/id ]; then
    if [ -z "$EUID" ]; then
        # ksh workaround
        EUID=`/usr/bin/id -u`
        UID=`/usr/bin/id -ru`
    fi
    USER="`/usr/bin/id -un`"
    LOGNAME=$USER
    MAIL="/var/spool/mail/$USER"
fi
if [ "$EUID" = "0" ]; then
    pathmunge /usr/sbin
    pathmunge /usr/local/sbin
else
    pathmunge /usr/local/sbin after
    pathmunge /usr/sbin after
fi
HOSTNAME=`/usr/bin/hostname 2>/dev/null`
HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL=ignoredups
fi
export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL
if [ $UID -gt 199 ] && [ "`/usr/bin/id -gn`" = "`/usr/bin/id -un`" ]; then
    umask 002
else
    umask 022
fi
for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done
unset i
unset -f pathmunge
if [ -n "${BASH_VERSION-}" ] ; then
    if [ -f /etc/bashrc ] ; then
        # Bash login shells run only /etc/profile
        # Bash non-login shells run only /etc/bashrc
        # Check for double sourcing is done in /etc/bashrc.
        . /etc/bashrc
    fi
fi
[root@rocky8 data]# grep -v "^#\|^$" /etc/profile
[root@rocky8 data]# grep "^[^#]" /etc/profile
[root@rocky8 data]# grep -v "^\(#\|$\)" /etc/profile
[root@rocky8 data]# grep -Ev "^(#|$)" /etc/profile
[root@rocky8 data]# egrep -v "^(#|$)" /etc/profile
[root@centos6 ~]# egrep -v '^(#|$)' /etc/httpd/conf/httpd.conf

example:

[root@rocky8 data]# grep -o 'r..t' /etc/passwd
root
root
root
root
r/ft

example:

[root@rocky8 data]# ifconfig | grep -E '[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}.[0-9]{1,3}'
inet 172.31.1.8 netmask 255.255.248.0 broadcast 172.31.7.255
RX packets 9744 bytes 11614557 (11.0 MiB)
inet 127.0.0.1 netmask 255.0.0.0
[root@rocky8 data]# ifconfig | grep -E '([0-9]{1,3}.){3}[0-9]{1,3}'
inet 172.31.1.8 netmask 255.255.248.0 broadcast 172.31.7.255
RX packets 9752 bytes 11615406 (11.0 MiB)
inet 127.0.0.1 netmask 255.0.0.0
[root@rocky8 data]# ifconfig eth0 | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}'|head -1
172.31.1.8
[root@rocky8 data]# cat > regex.txt
([0-9]{1,3}\.){3}[0-9]{1,3}
^C
[root@rocky8 data]# ifconfig | grep -oEf regex.txt
172.31.1.8
255.255.248.0
172.31.7.255
127.0.0.1
255.0.0.0
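The effect of escaping the dot can be reproduced without a network interface. The sketch below feeds a few ifconfig-like lines (sample text, not live output) to grep -Eo with the usual dotted-quad pattern. Note that this pattern is only an approximation: it also accepts values above 255, such as 999.999.999.999.

```shell
#!/bin/bash
# Sample text modelled on ifconfig output; not read from a real interface.
sample='inet 172.31.1.8  netmask 255.255.248.0  broadcast 172.31.7.255
RX packets 9744  bytes 11614557 (11.0 MiB)
inet 127.0.0.1  netmask 255.0.0.0'

# Three groups of "1 to 3 digits plus a literal dot", then a final 1 to 3 digits.
ips=$(printf '%s\n' "$sample" | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}')
first_ip=$(printf '%s\n' "$ips" | head -n 1)
ip_count=$(printf '%s\n' "$ips" | wc -l | tr -d ' ')

printf '%s\n' "$ips"
echo "first: $first_ip, total: $ip_count"
```

Because the dots are escaped, the byte counter line with 11614557 and 11.0 no longer matches.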

example:

[root@rocky8 data]# grep -E 'root|bash' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash

example:

[root@rocky8 data]# grep -w root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@rocky8 data]# grep '\<root\>' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

example:

[root@rocky8 data]# grep "^\(.*\)\>.*\<\1$" /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
[root@rocky8 data]# grep -E "^(.*)\>.*\<\1$" /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
[root@rocky8 data]# egrep "^(.*)\>.*\<\1$" /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt

Example: an interview question, calculate the sum of everyone's ages

[root@rocky8 data]# cat > nianling.txt
xiaoming=20
xiaohong=18
xiaoqiang=22
^C
[root@rocky8 data]# cut -d"=" -f2 nianling.txt|tr '\n' + | grep -Eo ".*[0-9]"|bc
60
[root@rocky8 data]# grep -Eo "[0-9]+" nianling.txt | tr '\n' + | grep -Eo ".*[0-9]"|bc
60
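In the pipelines above, the second grep only exists to strip the trailing + that tr leaves behind. A possible variant, shown here as a sketch rather than as the author's method, joins the numbers with paste so that no trailing + appears, and lets shell arithmetic do the sum instead of bc. The file is recreated in a temporary directory.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf '%s\n' 'xiaoming=20' 'xiaohong=18' 'xiaoqiang=22' > "$tmpdir/nianling.txt"

# grep -Eo prints one number per line; paste -s joins the lines with + as delimiter.
expr_text=$(grep -Eo '[0-9]+' "$tmpdir/nianling.txt" | paste -sd+ -)

# The joined text is expanded first and then evaluated as an arithmetic expression.
total=$(( $expr_text ))

echo "$expr_text = $total"
rm -rf "$tmpdir"
```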

4. Regular expressions

REGEXP: a regular expression is a pattern written with special characters and literal text characters. Some of the characters (metacharacters) do not stand for themselves but express control or wildcard functions. Regular expressions resemble an enhanced form of wildcards, with one difference: wildcards are used to match file names, while regular expressions are used to match characters in text content

Regular expressions are widely supported by many programs and development languages: vim, less, grep, sed, awk, nginx, mysql, etc.

Regular expressions fall into two categories:

Basic regular expressions: BRE
Extended regular expressions: ERE

Regular expression engine:

The software module that checks and processes regular expressions. Different engines use different algorithms; one example is PCRE (Perl Compatible Regular Expressions)

Regular expression metacharacters fall into four groups: character matching, number of matches, position anchoring and grouping

Help: man 7 regex

4.1 basic regular expression metacharacters

4.1.1 character matching

.           Matches any single character (except \n); it can be a Chinese character or a character of another language
[]          Matches any single character within the specified set, for example: [wang] [0-9] [a-z] [a-zA-Z]
[^]         Matches any single character outside the specified set, for example: [^wang]
[:alnum:]   Letters and digits
[:alpha:]   Any upper or lower case English letter, i.e. A-Z, a-z
[:lower:]   Lowercase letters, for example: [[:lower:]], equivalent to [a-z]
[:upper:]   Uppercase letters
[:blank:]   Blank characters (spaces and tabs)
[:space:]   Spaces, tabs (horizontal and vertical), line feeds, carriage returns and other whitespace; covers more than [:blank:]
[:cntrl:]   Non-printable control characters (backspace, delete, alarm, ...)
[:digit:]   Decimal digits
[:xdigit:]  Hexadecimal digits
[:graph:]   Printable non-whitespace characters
[:print:]   Printable characters
[:punct:]   Punctuation
\s          #Matches any whitespace character, including spaces, tabs, form feeds, etc. Equivalent to [ \f\r\t\v]. Note that Unicode regular expressions also match full-width whitespace characters
\S          #Matches any non-whitespace character. Equivalent to [^ \f\r\t\v]
\w          #Matches one letter, digit, underscore, Chinese character or character of another language, equivalent to [_[:alnum:]]
\W          #Matches one character other than letters, digits, underscores, Chinese characters and characters of other languages, equivalent to [^_[:alnum:]]
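A few of the character classes listed above can be tried on a single sample string. The string is made up for illustration, and the \w and \+ forms assume GNU grep, where they are available in basic regular expressions.

```shell
#!/bin/bash
line='Rocky 8 Linux, kernel 4.18!'   # hypothetical sample text

# -o prints each matching character on its own line; tr joins them again.
digits=$(printf '%s\n' "$line" | grep -o '[[:digit:]]' | tr -d '\n')
uppers=$(printf '%s\n' "$line" | grep -o '[[:upper:]]' | tr -d '\n')
puncts=$(printf '%s\n' "$line" | grep -o '[[:punct:]]' | tr -d '\n')

# \w and [_[:alnum:]] describe the same set of word characters.
words_short=$(printf '%s\n' "$line" | grep -o '\w\+')
words_class=$(printf '%s\n' "$line" | grep -o '[_[:alnum:]]\+')

echo "digits=$digits uppers=$uppers puncts=$puncts"
printf '%s\n' "$words_short"
```

A class name such as [:digit:] is only valid inside a bracket expression, which is why the patterns use double brackets.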

example:

[root@rocky8 data]# ls
aa.txt alpha.log c emp.txt f1.txt.orig fa.txt f.txt pass.txt regex.txt ss2.log test2.txt title.txt
access_log a.txt c.txt f1.txt f2.txt f.patch nianling.txt passwd seq.log test1.txt title1.txt
[root@rocky8 data]# ls |grep 'f..txt'
f1.txt
f1.txt.orig
f2.txt
fa.txt
[root@rocky8 data]# touch faatxt
[root@rocky8 data]# touch fbbtxt
[root@rocky8 data]# ls |grep 'f.\.txt'
f1.txt
f1.txt.orig
f2.txt
fa.txt
#. stands for any single character. To match the literal dot in f1.txt the dot must be escaped: use f.\.txt, not f..txt
[root@rocky8 data]# ls |grep f.\.txt
f1.txt
f1.txt.orig
f2.txt
faatxt
fa.txt
fbbtxt
#Regular expressions should be enclosed in '' single quotation marks. Without them the shell removes the backslash and the escape does not take effect
#grep treats the output of the previous command in the pipeline as text
[root@rocky8 data]# ls f?.txt
f1.txt f2.txt fa.txt
#Wildcards can be used as well

example:

[root@rocky8 data]# ls /etc/ | grep 'rc[.0-6]'
rc0.d
rc1.d
rc2.d
rc3.d
rc4.d
rc5.d
rc6.d
rc.d
rc.local
[root@rocky8 data]# ls /etc/ | grep 'rc[.0-6].'
rc0.d
rc1.d
rc2.d
rc3.d
rc4.d
rc5.d
rc6.d
rc.d
rc.local
[root@rocky8 data]# ls /etc/ | grep 'rc[.0-6]\.'
rc0.d
rc1.d
rc2.d
rc3.d
rc4.d
rc5.d
rc6.d

4.1.2 matching times

Written after a character to specify how many times that preceding character must appear

*         Matches the preceding character any number of times, including 0 times. Greedy mode: matches as much as possible. For example, a* means a any number of times
.*        Any character, any number of times (a string of any length)
\?        Matches the preceding character 0 or 1 times, i.e. the character is optional
\+        Matches the preceding character at least once, i.e. it must appear, >=1 times
\{n\}     Matches the preceding character exactly n times (written after the character, as in a\{n\})
\{m,n\}   Matches the preceding character at least m times and at most n times
\{,n\}    Matches the preceding character at most n times, <=n
\{n,\}    Matches the preceding character at least n times

example:

[root@rocky8 data]# ls f*
f1.txt f1.txt.orig f2.txt faatxt fa.txt fbbtxt f.patch f.txt
#As a wildcard, the * asterisk stands for any string
[root@rocky8 data]# echo aa |grep "a*"
aa
[root@rocky8 data]# echo aaa |grep "a*"
aaa
[root@rocky8 data]# echo aaaaaaa |grep "a*"
aaaaaaa
#In a regular expression, the * asterisk means the preceding character appears any number of times
[root@rocky8 data]# echo b |grep "a*"
b
#0 occurrences also match
[root@rocky8 data]# echo b |grep "aa*"
[root@rocky8 data]# echo a |grep "aa*"
a
#aa* means one a followed by any number of a, i.e. at least one a
[root@rocky8 data]# echo aaaaaaaabbb |grep "aa*"
aaaaaaaabbb
#Greedy mode: the match extends to the last consecutive a, and the whole run is highlighted

[root@rocky8 data]# echo a |grep "a\?"
a
[root@rocky8 data]# echo b |grep "a\?"
b
#\? means 0 or 1 occurrences, i.e. optional
[root@rocky8 data]# echo aaa |grep "a\?"
aaa
#A line with more occurrences also matches, because the pattern is not anchored
[root@rocky8 data]# echo a |grep "a*"
a
[root@rocky8 data]# echo aba |grep "a*"
aba
[root@rocky8 data]# echo aabaa |grep "a*"
aabaa
[root@rocky8 data]# echo aabaa |grep "a\?"
aabaa
[root@rocky8 data]# echo b |grep "a\?"
b

[root@rocky8 data]# echo a |grep "aa\?"
a
[root@rocky8 data]# echo ab |grep "aa\?"
ab
[root@rocky8 data]# echo aba |grep "a\?"
aba
[root@rocky8 data]# echo aba |grep "a*"
aba
#Similar to a*
#Semantically, \? means 0 or 1 times and * means any number of times

[root@rocky8 data]# echo aba |grep -o "a\?"
a
a
[root@rocky8 data]# echo aba |grep -o "a*"
a
a
#-o extracts the matched strings one by one and prints each on its own line

[root@rocky8 data]# echo a |grep -o "a\+"
a
[root@rocky8 data]# echo b |grep -o "a\+"
[root@rocky8 data]# echo b |grep "a\+"
#\+ means 1 or more times

[root@rocky8 data]# echo aaaaaaa |grep "a\{7\}"
aaaaaaa
[root@rocky8 data]# echo aaaaaa |grep "a\{7\}"
[root@rocky8 data]# echo aaaaaaaaa |grep "a\{7\}"
aaaaaaaaa
#\{n\} means n consecutive characters. Fewer do not match; a longer run still matches because it contains n of them

[root@rocky8 data]# echo aaaaaaaaa |grep "a\{7,\}"
aaaaaaaaa
[root@rocky8 data]# echo aaaaaaaaaaa |grep "a\{7,\}"
aaaaaaaaaaa
[root@rocky8 data]# echo aaaaaabaaaaa |grep "a\{7,\}"
#\{n,\} means at least n consecutive characters. More is allowed, fewer is not

[root@rocky8 data]# echo aaaaaabaaaaa |grep "a\{,10\}"
aaaaaabaaaaa
#\{,n\} means at most n consecutive characters

[root@centos8 ~]# echo abc | grep ".*"
abc
#.* means any string
[root@rocky8 data]# echo /etc |grep -o '/etc/\?'
/etc
[root@rocky8 data]# echo /etc/ |grep -o '/etc/\?'
/etc/
#The / slash after /etc is optional

[root@rocky8 data]# echo /etc/ |grep -o '/etc/*'
/etc/
[root@rocky8 data]# echo /etc/ |grep -o '/etc/*'
/etc/
[root@rocky8 data]# echo /etc/ |grep -o '/etc/\?'
/etc/
#* means any number of times; \? means optional, i.e. 0 or 1 times

example:

[root@rocky8 data]# echo /etc/ |grep "/etc/\?"
/etc/
[root@rocky8 data]# echo /etc |grep "/etc/\?"
/etc
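Without -o, grep prints the whole line whenever any part of it matches, which hides how much text a quantifier actually consumed. The sketch below repeats a few of the quantifiers from this section with -o on one sample string, so that the greedy behaviour and the interval bounds become visible. The interval values 3 and 2,5 are arbitrary choices for illustration.

```shell
#!/bin/bash
s=aaaaaaaabbb   # eight a characters followed by three b characters

# aa* is greedy: one match that takes all eight a characters.
star=$(echo "$s" | grep -o 'aa*')

# a\{3\} takes exactly three at a time: two matches, the last two a are left over.
exact=$(echo "$s" | grep -o 'a\{3\}')

# a\{2,5\} takes up to five, then matches again on the remaining three.
range=$(echo "$s" | grep -o 'a\{2,5\}')

printf 'star:\n%s\nexact:\n%s\nrange:\n%s\n' "$star" "$exact" "$range"
```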

4.1.3 position anchoring

Position anchors specify where in the line or word a pattern must occur

^                #Beginning-of-line anchor, used at the leftmost side of the pattern
$                #End-of-line anchor, used at the rightmost side of the pattern
^PATTERN$        #The pattern must match the entire line
^$               #Empty line
^[[:space:]]*$   #Blank line (empty or whitespace only)
\< or \b         #Beginning-of-word anchor, used at the left side of a word pattern
\> or \b         #End-of-word anchor, used at the right side of a word pattern
\<PATTERN\>      #Match an entire word
\w               #Matches a word constituent, equivalent to [_[:alnum:]]
\W               #Matches a non-word constituent, equivalent to [^_[:alnum:]]
#Note: words are composed of letters, digits and underscores

example:

[root@rocky8 data]# grep root /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@rocky8 data]# grep ^root /etc/passwd
root:x:0:0:root:/root:/bin/bash
#^ requires the string to appear at the beginning of the line

[root@rocky8 data]# grep 'bash$' /etc/passwd
root:x:0:0:root:/root:/bin/bash
raymond:x:1000:1000::/home/raymond:/bin/bash
boss:x:1001:1001::/home/boss:/bin/bash
#$ matches lines that end with the string

[root@rocky8 data]# grep -v '^$' /etc/init.d/functions
#^$ matches empty lines; -v excludes them
[root@rocky8 data]# cat -A f1.txt
raymond$
$
^I$
$
$
boss$
[root@rocky8 data]# grep -v '^[[:space:]]*$' f1.txt
raymond
boss
#Filters out empty lines and lines that contain only spaces, tabs, etc.
[root@rocky8 data]# echo boss | grep '\<boss'
boss
[root@rocky8 data]# echo bossceo | grep '\<boss'
bossceo
[root@rocky8 data]# echo 99bossceo | grep '\<boss'
[root@rocky8 data]# echo 99_bossceo | grep '\<boss'
[root@rocky8 data]# echo 99-bossceo | grep '\<boss'
99-bossceo
#\< or \b marks the beginning of a word

[root@rocky8 data]# echo 99-bossceo | grep 'boss\>'
[root@rocky8 data]# echo 99-boss,ceo | grep 'boss\>'
99-boss,ceo
[root@rocky8 data]# echo 99-boss;ceo | grep 'boss\>'
99-boss
-bash: ceo: command not found
[root@rocky8 data]# echo 99-boss+ceo | grep 'boss\>'
99-boss+ceo
[root@rocky8 data]# echo 99-boss_ceo | grep 'boss\>'
#\> or \b marks the end of a word
[root@rocky8 data]# echo 99-boss;ceo | grep 'boss\b'
99-boss
-bash: ceo: command not found
[root@rocky8 data]# echo '99-boss;ceo' | grep 'boss\b'
99-boss;ceo
[root@rocky8 data]# echo "99-boss;ceo" | grep 'boss\b'
99-boss;ceo
#\b can be used as well. Quote the string, otherwise the shell treats ; as a command separator
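The anchors from this section can be compared side by side by counting matches in one small file. This is a sketch on made-up lines: the file name and contents are hypothetical, chosen so that each anchor selects a different subset.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
# Seven sample lines, including one empty line and one line of three spaces.
printf '%s\n' 'boss' 'bossceo' '99-boss,ceo' '99_boss' '' '   ' 'the boss' \
  > "$tmpdir/anchors.txt"

# ^boss: lines that begin with boss (boss, bossceo).
starts=$(grep -c '^boss' "$tmpdir/anchors.txt")

# ^boss$: the whole line is exactly boss.
whole_line=$(grep -c '^boss$' "$tmpdir/anchors.txt")

# \<boss\>: boss as a complete word; 99_boss fails because _ is a word character.
whole_word=$(grep -c '\<boss\>' "$tmpdir/anchors.txt")

# ^[[:space:]]*$: empty lines and whitespace-only lines.
blank=$(grep -c '^[[:space:]]*$' "$tmpdir/anchors.txt")

echo "starts=$starts whole_line=$whole_line whole_word=$whole_word blank=$blank"
rm -rf "$tmpdir"
```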

Example: exclude blank lines and lines beginning with #

[root@rocky8 data]# grep -v '^$' /etc/profile|grep -v '^#'
[root@rocky8 data]# grep '^[^#]' /etc/profile

4.1.4 grouping and others

4.1.4.1 grouping

Grouping: \( \) binds multiple characters together and treats them as a whole, such as: \(root\)\+

Back-reference: the text matched by the pattern inside a pair of grouping parentheses is recorded by the regular expression engine in internal variables, which are named \1, \2, \3, ...

\1 refers to the text matched by the pattern between the first left parenthesis, counting from the left, and its matching right parenthesis

Note: \0 represents all characters matched by the regular expression

Example:

\(string1\(string2\)\)
\1 : string1\(string2\)
\2 : string2

Note: backward references refer to the characters matched by the pattern in the preceding grouping brackets, not the pattern itself

example:

[root@rocky8 data]# echo abcabcabc |grep '\(abc\)\{3\}'
abcabcabc
[root@rocky8 data]# echo abcabc |grep '\(abc\)\{3\}'
#A group followed by a quantifier means the whole group must appear that many times in a row

[root@rocky8 data]# echo abc12345abc |grep '\(abc\).*\1'
abc12345abc
#The group does not repeat consecutively; the back-reference matches it again later in the line

[root@rocky8 data]# echo adc12345adc |grep '\(a.c\).*\1'
adc12345adc
[root@rocky8 data]# echo adc12345abc |grep '\(a.c\).*\1'
[root@rocky8 data]# echo adc12345adc |grep '\(a.c\).*\1'
adc12345adc
#Whatever text a.c matched at the front must appear again, identically, at the back

[root@rocky8 data]# echo adc12345afedfd123adc |grep '\(a.c\)\(123\).*\2\1'
adc12345afedfd123adc
#The first group is referenced later with \1 and the second group with \2; this is called a back-reference
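Back-references are what make it possible to ask for "the same text again" rather than "the same pattern again". Two small sketches follow, both on made-up input: the first finds passwd-style lines whose user name equals the last path component of the shell, the second finds a doubled word. The \+ form assumes GNU grep.

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
printf '%s\n' \
  'sync:x:5:0:sync:/sbin:/bin/sync' \
  'root:x:0:0:root:/root:/bin/bash' \
  'halt:x:7:0:halt:/sbin:/sbin/halt' > "$tmpdir/passwd.sample"

# \([^:]*\) captures the first field; /\1$ requires the line to end with a slash
# followed by exactly the captured text.
same_name=$(grep '^\([^:]*\):.*/\1$' "$tmpdir/passwd.sample" | cut -d: -f1)

# A word, one space, then the same word again.
doubled=$(echo 'this is is a test' | grep -o '\<\([[:alpha:]]\+\) \1\>')

printf '%s\n' "$same_name"
echo "doubled: $doubled"
rm -rf "$tmpdir"
```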

4.1.4.2 or

Or: \|

Example:

a\|b          #a or b
C\|cat        #C or cat
\(C\|c\)at    #Cat or cat

example:

[root@rocky8 data]# echo abc | grep 'a\|b12'
abc
[root@rocky8 data]# echo b12 | grep 'a\|b12'
b12
#\| means or: a or b12

[root@rocky8 data]# echo b12 | grep '\(a\|b\)12'
b12
[root@rocky8 data]# echo a12 | grep '\(a\|b\)12'
a12
#With grouping: a12 or b12
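The difference between the grouped and ungrouped forms is easiest to see with -o, which prints only the matched part. A short sketch on a made-up input string, assuming GNU grep for the \| operator in basic regular expressions:

```shell
#!/bin/bash
input='a12 b12'   # hypothetical sample text

# Without grouping, \| splits the whole pattern: "a" or "b12".
ungrouped=$(echo "$input" | grep -o 'a\|b12')

# With grouping, the alternation is limited to the group: "a12" or "b12".
grouped=$(echo "$input" | grep -o '\(a\|b\)12')

printf 'ungrouped:\n%s\ngrouped:\n%s\n' "$ungrouped" "$grouped"
```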

Example: exclude blank lines and lines beginning with #

[root@centos6 ~]#grep -v '^#' /etc/httpd/conf/httpd.conf |grep -v ^$
[root@centos6 ~]#grep -v '^#\|^$' /etc/httpd/conf/httpd.conf
[root@centos6 ~]#grep -v '^\(#\|$\)' /etc/httpd/conf/httpd.conf
[root@centos6 ~]#grep "^[^#]" /etc/httpd/conf/httpd.conf

4.1.5 regular expression exercises

1. Display the lines starting with s in the /proc/meminfo file (requirement: use two methods)
2. Display the lines in the /etc/passwd file that do not end in /bin/bash
3. Display the default shell of the user rpc
4. Find the two-digit or three-digit numbers in /etc/passwd
5. Display the lines in the /etc/grub2.cfg file of CentOS7 that begin with at least one whitespace character followed by a non-whitespace character
6. Find the lines in the output of the "netstat -tan" command that end with LISTEN followed by any number of whitespace characters
7. Display the user names and UIDs of all users with a UID less than 1000 on CentOS7
8. Add the users bash, testbash, basher, sh and nologin (whose shell is /sbin/nologin), then find the lines in /etc/passwd where the user name is the same as the shell name
9. Using df and grep, extract the utilization of each disk partition and sort it from largest to smallest

4.2 extended regular expression metacharacters

4.2.1 character matching metacharacters

.           Any single character
[wang]      Any character in the specified set
[^wang]     Any character outside the specified set
[:alnum:]   Letters and digits
[:alpha:]   Any upper or lower case English letter, i.e. A-Z, a-z
[:lower:]   Lowercase letters, for example: [[:lower:]], equivalent to [a-z]
[:upper:]   Uppercase letters
[:blank:]   Blank characters (spaces and tabs)
[:space:]   Horizontal and vertical whitespace characters (covers more than [:blank:])
[:cntrl:]   Non-printable control characters (backspace, delete, alarm, ...)
[:digit:]   Decimal digits
[:xdigit:]  Hexadecimal digits
[:graph:]   Printable non-whitespace characters
[:print:]   Printable characters
[:punct:]   Punctuation

4.2.2 matching times

*        Matches the preceding character any number of times
?        0 or 1 times
+        1 or more times
{n}      Exactly n times
{m,n}    At least m times, at most n times

4.2.3 position anchoring

^         Beginning of line
$         End of line
\<, \b    Beginning of word
\>, \b    End of word

4.2.4 grouping and others

()        Grouping
Back-references: \1, \2, ...
Note: \0 represents all characters matched by the regular expression
|         Or
a|b       #a or b
C|cat     #C or cat
(C|c)at   #Cat or cat

example:

[root@rocky8 data]# echo a12 | grep -E '(a|b)12'
a12
[root@rocky8 data]# echo a12 | egrep '(a|b)12'
a12
#grep -E or egrep supports extended regular expressions

[root@rocky8 data]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.31.1.8  netmask 255.255.248.0  broadcast 172.31.7.255
        inet6 fe80::20c:29ff:fef9:6ad1  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:f9:6a:d1  txqueuelen 1000  (Ethernet)
        RX packets 11867  bytes 11817539 (11.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3942  bytes 700080 (683.6 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
[root@rocky8 data]# ifconfig eth0 |grep netmask
        inet 172.31.1.8  netmask 255.255.248.0  broadcast 172.31.7.255
[root@rocky8 data]# ifconfig eth0 |grep netmask | tr -s " "
 inet 172.31.1.8 netmask 255.255.248.0 broadcast 172.31.7.255
[root@rocky8 data]# ifconfig eth0 |grep netmask | tr -s " " |cut -d " " -f3
172.31.1.8
#Extended regular expression method
[root@rocky8 data]# ifconfig eth0 |grep netmask | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' |head -1
172.31.1.8
#Basic regular expression method
[root@rocky8 data]# ifconfig eth0 |grep netmask | grep -o '\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}' |head -1
172.31.1.8
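As the last two commands suggest, a basic and an extended regular expression can describe exactly the same matches; the extended form simply drops the backslashes in front of the grouping, interval and alternation metacharacters. A minimal sketch on a made-up string, assuming GNU grep for \| in the basic form:

```shell
#!/bin/bash
text='aa ab12 b12 abab'   # hypothetical sample text

# BRE: grouping, interval and alternation all need a backslash.
bre=$(echo "$text" | grep -o '\(ab\)\{2\}\|b12')

# ERE: the same pattern with the backslashes removed.
ere=$(echo "$text" | grep -Eo '(ab){2}|b12')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre" "$ere"
```

Both commands print the same three matches, so the choice between grep and grep -E is mostly a question of readability.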

4.2.5 extended regular expression exercise

1. Display the UIDs and default shells of the three users root, raymond and boss
2. Find the lines in the /etc/rc.d/init.d/functions file that begin with a word (which may include underscores) followed by a pair of parentheses
3. Use egrep to extract the base name of /etc/rc.d/init.d/functions
4. Use egrep to extract the directory name of the above path
5. Count the number of logins from each host IP address that logged in as root, based on the output of the last command
6. Use extended regular expressions to represent 0-9, 10-99, 100-199, 200-249 and 250-255 respectively
7. Display all IPv4 addresses in the output of the ifconfig command
8. Deduplicate and sort the characters in the string "welcome to rocky linux", with the most frequently repeated characters first

14 October 2021, 19:50