Regular Expressions and the Three Swordsmen of Text Processing (grep, sed, awk) in Detail

Blog Outline:
1. Regular expressions
(1) Definition of regular expressions
(2) Use of regular expressions
1. Basic Regular Expressions
(1) grep command tool
2. Extended regular expressions
2. Text Editing Processor
1.grep Command Tool
2.sed Command Tool
3.awk Command Tool

1. Regular expressions

(1) Definition of regular expressions

Regular expressions are often abbreviated as regex, regexp, or RE in code. A regular expression is a single string that describes a set of strings matching a certain syntax rule. Simply put, a regular expression is a way of matching strings: by using special symbols, you can quickly find, delete, or replace particular strings.

A regular expression is a text pattern made up of ordinary characters and metacharacters. The pattern describes one or more strings to match when searching text, and acts as a template that is compared against the string being searched. Ordinary characters include uppercase and lowercase letters, digits, punctuation, and some other symbols. Metacharacters are characters that have a special meaning in regular expressions; they can specify how the preceding character (the character in front of the metacharacter) may appear in the target text.

Regular expressions are commonly used in scripting and in text editors, and they are supported by many text processors and programming languages. For example, the text processors commonly used on Linux systems (grep, egrep, sed, awk) all support them. Their powerful matching makes it possible to process text quickly and efficiently, even in an ocean of text.

(2) Use of regular expressions

Regular expressions are very important for system administrators. A running system generates a lot of information: some of it is very important, and some is only a warning. If an administrator reads all of this data directly, the important messages, such as "User account login failure" or "Service startup failure", cannot be located quickly. With regular expressions, the problematic messages can be extracted quickly, which makes operation and maintenance easier and more convenient.

The system locale has a large influence on regular expressions!
The character orders under zh_TW.big5 and C are as follows:
When LANG=C: 0 1 2 3 4 ... A B C D ... Z a b c d ... z
When LANG=zh_TW: 0 1 2 3 4 ... a A b B c C d D ... z Z

Because of this, under LANG=zh_TW a range such as [A-Z] can also match lowercase letters, since they fall between A and Z in that order. To avoid matching the wrong letters and digits, there are some special symbols that we need to know! Figure:
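As an illustration of why the locale matters, here is a minimal sketch that forces the C locale for a single command and uses POSIX bracket classes such as [[:upper:]] and [[:digit:]] instead of ranges like [A-Z]. The sample words are made up, and the symbols listed in the missing figure may differ from the two shown here.

```shell
# Made-up sample data, one word per line.
sample=$(printf 'apple\nBanana\ncherry42\n')

# LC_ALL=C forces plain ASCII ordering for this one command.
# [[:upper:]] and [[:digit:]] are POSIX character classes.
upper_lines=$(printf '%s\n' "$sample" | LC_ALL=C grep '[[:upper:]]')
digit_lines=$(printf '%s\n' "$sample" | LC_ALL=C grep '[[:digit:]]')

echo "uppercase: $upper_lines"
echo "digits:    $digit_lines"
```

Setting the locale only for the one command keeps the rest of the session unchanged.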

Many other programs also support regular expressions. On the Internet, for example, spam e-mail can cause network congestion; if such messages are filtered out in advance on the server side, clients avoid unnecessary bandwidth consumption.

For a Linux system administrator, mastering regular expressions is an essential skill.

1. Basic Regular Expressions

Regular expressions are divided into basic regular expressions and extended regular expressions according to how strict their syntax is. Basic regular expressions are the most fundamental part of the common regular expression syntax. Among the common file processing tools on Linux systems, grep and sed support basic regular expressions, while egrep and awk support extended regular expressions. To master basic regular expressions, you must first understand the meaning of the metacharacters they contain.

(1) grep command tool

1) Examples of basic regular expressions:

[root@localhost ~]# grep -n 'the' test.txt
// Find lines containing "the" (-n shows line numbers)
[root@localhost ~]# grep -vn 'the' test.txt
// Find lines that do not contain "the"
[root@localhost ~]# grep -in 'the' test.txt
// Find lines containing "the", ignoring case
[root@localhost ~]# grep -n 'sh[io]rt' test.txt
// Find strings that start with "sh" and end with "rt", with i or o in between
[root@localhost ~]# grep -n '[^w]oo' test.txt
// Find "oo" preceded by any character other than w
[root@localhost ~]# grep -n '[^a-z]oo' test.txt
// Find "oo" preceded by a character that is not a lowercase letter
[root@localhost ~]# grep -n '^the' test.txt
// Find lines that start with "the" (^ anchors the start of the line)
[root@localhost ~]# grep -n '^[^a-zA-Z]' test.txt
// Find lines that do not start with a letter (^ inside [] negates the set)
[root@localhost ~]# grep -n '\.$' a.txt
// Find lines ending with "."
// $ means the end of the line; because "." is a metacharacter, it must be escaped with "\" to be treated as an ordinary character
[root@localhost ~]# grep -n 'w..d' test.txt
// Find lines with exactly two characters between w and d ("." matches any single character)
[root@localhost ~]# grep -n 'ooo*' test.txt
// Find strings containing at least two o's ("*" repeats the preceding single character zero or more times)
[root@localhost ~]# grep -n 'woo*d' test.txt
// Find strings that start with w and end with d, with at least one o in between
[root@localhost ~]# grep -n 'w.*d' test.txt
// Find strings that start with w and end with d, with any characters (or none) in between (".*" matches any string)
[root@localhost ~]# grep -n 'o\{2\}' test.txt
// {n} matches exactly n times. Find lines containing two consecutive o's ("{}" must be escaped with "\" in basic regular expressions)
[root@localhost ~]# grep -n 'wo\{2,5\}d' test.txt
// Find strings that start with w and end with d, with 2 to 5 o's in between ({n,m} matches at least n and at most m times)
[root@localhost ~]# grep -n 'wo\{2,\}d' test.txt
// Find strings that start with w and end with d, with two or more o's in between ({n,} matches at least n times)
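Since the contents of test.txt are not shown, the following minimal sketch runs several of these patterns against a small made-up file so the results can be checked by hand. The file contents and variable names are assumptions for illustration only.

```shell
# Hypothetical stand-in for test.txt (the real file is not shown in this post).
demo=$(mktemp)
cat > "$demo" <<'EOF'
the short story ends here.
The shirt is made of wood
a wooooood table
wd
Foo bar
EOF

n_the=$(grep -c 'the' "$demo")              # case-sensitive: line 1 only
n_the_i=$(grep -ic 'the' "$demo")           # ignoring case: lines 1 and 2
n_short=$(grep -c 'sh[io]rt' "$demo")       # "short" and "shirt"
first_the=$(grep -n '^the' "$demo")         # line starting with "the"
n_dot=$(grep -c '\.$' "$demo")              # lines ending with a literal "."
n_not_w=$(grep -c '[^w]oo' "$demo")         # "oo" preceded by a non-w character
n_wood=$(grep -c 'woo*d' "$demo")           # w, at least one o, then d
n_w_any_d=$(grep -c 'w.*d' "$demo")         # w, anything (or nothing), then d
n_2_to_5=$(grep -c 'wo\{2,5\}d' "$demo")    # w, 2 to 5 o's, then d
n_2_plus=$(grep -c 'wo\{2,\}d' "$demo")     # w, 2 or more o's, then d

rm -f "$demo"
echo "$n_the $n_the_i $n_short $n_dot $n_not_w $n_wood $n_w_any_d $n_2_to_5 $n_2_plus"
```

The line with six o's matches 'wo\{2,\}d' but not 'wo\{2,5\}d', which shows the effect of the upper bound.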

2) Summary of common metacharacters in basic regular expressions, as shown in the figure.

2. Extended regular expressions

Basic regular expressions are usually sufficient, but to simplify a whole command you need the wider range of extended regular expressions.

Among the common text processing tools on Linux systems, egrep and awk support extended regular expressions, and the egrep command is used in much the same way as grep.

1) Summary of common metacharacters in extended regular expressions
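The summary figure is not reproduced here. As a hedged illustration, the sketch below uses four widely documented extended metacharacters: + (one or more), ? (zero or one), | (alternation) and () (grouping). It uses grep -E, which accepts the same extended syntax as egrep; the word list is made up.

```shell
# Made-up word list, one word per line.
words=$(printf 'wd\nwod\nwood\ncolor\ncolour\ncat\ndog\n')

# "+" repeats the preceding character one or more times.
plus=$(printf '%s\n' "$words" | grep -E 'wo+d')
# "?" makes the preceding character optional.
optional=$(printf '%s\n' "$words" | grep -E 'colou?r')
# "|" separates alternatives, "()" groups them.
n_alt=$(printf '%s\n' "$words" | grep -E -c '^(cat|dog)$')

printf '%s\n' "$plus" "$optional" "$n_alt"
```

Note that none of these symbols needs a backslash here, unlike the \{ \} form used with basic regular expressions.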

2. Text Editing Processor

1.grep Command Tool

grep has already been covered in the section on basic regular expressions, so it is not described in detail again here.

2.sed Command Tool

sed is a powerful yet simple text parsing and conversion tool. It reads text, edits the content according to specified conditions, and finally outputs either all lines or only the lines that were processed. sed can perform fairly complex text processing without interaction, and it is widely used in shell scripts to accomplish a variety of automated tasks.

sed's workflow mainly includes:

  1. Read: sed reads a line from the input stream and stores it in a temporary buffer (also called the pattern space);
  2. Execute: by default, all sed commands are executed sequentially in the pattern space. Unless a line address is specified, the commands are executed on every line in turn.
  3. Display: the modified content is sent to the output stream. After the data has been sent, the pattern space is emptied.
    Note: The above process repeats until all file contents are processed.
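The read, execute, display cycle can be observed with the p operation. In this minimal sketch (made-up input), sed without -n displays each line automatically at the end of the cycle, so an explicit p prints it a second time; with -n only the explicit p produces output.

```shell
# Without -n: "p" prints the pattern space, then the cycle prints it again.
doubled=$(printf 'one\ntwo\n' | sed 'p')

# With -n: automatic display is suppressed, only "p" prints.
once=$(printf 'one\ntwo\n' | sed -n 'p')

printf '%s\n' "--- sed p ---" "$doubled" "--- sed -n p ---" "$once"
```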

1) The syntax of sed command and related parameters:

Common sed command options are as follows:

Common operations include:

2) Sample sed command usage

Note that the following commands do not change the contents of the file itself; if the file must be modified, add the '-i' option

(1) Output lines that meet the conditions

[root@localhost ~]# sed -n 'p' test.txt
// Output everything, equivalent to "cat test.txt"
[root@localhost ~]# sed -n '3p' test.txt
// Output the third line
[root@localhost ~]# sed -n '3,5p' test.txt
// Output lines 3 to 5
[root@localhost ~]# sed -n 'p;n' test.txt
// Output all odd-numbered lines; n reads in the next line
[root@localhost ~]# sed -n 'n;p' test.txt
// Output all even-numbered lines; n reads in the next line
[root@localhost ~]# sed -n '1,5{p;n}' test.txt
// Output the odd-numbered lines between lines 1 and 5 (lines 1, 3, 5)
[root@localhost ~]# sed -n '10,${n;p}' test.txt
// From line 10 to the end of the file, output every second line counting from line 10 (lines 11, 13, ...; empty lines included)
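To make the effect of p and n easy to verify, this minimal sketch feeds the numbers 1 to 14 (generated with seq, one per line) through the same expressions, so each output value is also its line number. It assumes GNU sed; the joining with paste is only for compact display.

```shell
# seq 1 14 prints the numbers 1..14, one per line.
odd=$(seq 1 14 | sed -n 'p;n' | paste -sd' ' -)          # print, then skip one
even=$(seq 1 14 | sed -n 'n;p' | paste -sd' ' -)         # skip one, then print
first5=$(seq 1 14 | sed -n '1,5{p;n}' | paste -sd' ' -)  # odd lines within 1..5
tail10=$(seq 1 14 | sed -n '10,${n;p}' | paste -sd' ' -) # from line 10: skip, print

echo "odd:    $odd"
echo "even:   $even"
echo "first5: $first5"
echo "tail10: $tail10"
```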

Examples of combining the sed command with regular expressions

When sed is combined with regular expressions the format is slightly different: the regular expression is surrounded by '/'.

[root@localhost ~]# sed -n '/the/p' test.txt
// Output lines containing "the"
[root@localhost ~]# sed -n '4,/the/p' test.txt
// Output from line 4 to the first following line containing "the"
[root@localhost ~]# sed -n '/the/=' test.txt
// Output the line numbers of the lines containing "the" (the equal sign (=) outputs the line number)
[root@localhost ~]# sed -n '/^PI/p' test.txt
// Output lines starting with "PI"
[root@localhost ~]# sed -n '/\<wood\>/p' test.txt
// Output lines containing the word "wood"; \< and \> represent word boundaries

(2) Delete lines that meet the conditions

The nl command numbers the lines of a file; it is used here so that the deleted lines are easy to see

[root@localhost ~]# nl test.txt | sed '3d'
// Delete line 3
[root@localhost ~]# nl test.txt | sed '3,5d'
// Delete lines 3 to 5
[root@localhost ~]# nl test.txt | sed '/cross/d'
// Delete lines containing "cross" (here the original line 8 is deleted)
[root@localhost ~]# nl test.txt | sed '/cross/!d'
// Delete lines that do not contain "cross"
[root@localhost ~]# sed '/\.$/d' test.txt
// Delete lines ending with "."
[root@localhost ~]# sed '/^$/d' test.txt
// Delete all empty lines
[root@localhost ~]# sed -e '/^$/{n;/^$/d}' test.txt
// Reduce consecutive empty lines: when an empty line is followed by another empty line, the second one is deleted
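A minimal sketch of the delete operation on made-up input, so that the effect of d and !d can be checked without test.txt. The text and variable names are for illustration only.

```shell
# Made-up input with empty lines, a line ending in "." and a line with "cross".
text=$(printf 'alpha.\n\nbeta\n\n\ngamma cross\n')

# Delete all empty lines.
no_blank=$(printf '%s\n' "$text" | sed '/^$/d')
# "!" negates the address: delete every line that does NOT contain "cross".
only_cross=$(printf '%s\n' "$text" | sed '/cross/!d')
# Two -e expressions: drop empty lines and lines ending with a literal ".".
no_dot=$(printf '%s\n' "$text" | sed -e '/^$/d' -e '/\.$/d')

printf '%s\n' "$no_blank" "---" "$only_cross" "---" "$no_dot"
```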

(3) Replace text that meets the conditions

The operations used for substitution with the sed command are s (string substitution), c (whole line/block substitution), y (character translation), and so on.

[root@localhost ~]# sed 's/the/THE/' test.txt
// Replace the first "the" on each line with "THE"
[root@localhost ~]# sed 's/l/L/2' test.txt
// Replace the second "l" on each line with "L"
[root@localhost ~]# sed 's/the/THE/g' test.txt
// Replace all "the" in the file with "THE"
[root@localhost ~]# sed 's/o//g' test.txt
// Delete all "o" from the file
[root@localhost ~]# sed 's/^/#/' test.txt
// Insert "#" at the beginning of each line
[root@localhost ~]# sed '/the/s/^/#/' test.txt
// Insert "#" at the beginning of each line containing "the"
[root@localhost ~]# sed 's/$/EOF/' test.txt
// Append the string "EOF" to the end of each line
[root@localhost ~]# sed '3,5s/the/THE/g' test.txt
// Replace all "the" in lines 3 to 5 with "THE"
[root@localhost ~]# sed '/the/s/o/O/g' test.txt
// Replace every "o" with "O" on all lines containing "the"
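The difference between replacing the first match, the Nth match and all matches is easiest to see on a single line. A minimal sketch with a made-up sentence:

```shell
line='the cat and the hat'

first=$(echo "$line" | sed 's/the/THE/')      # first match only
all=$(echo "$line" | sed 's/the/THE/g')       # every match (g flag)
second_t=$(echo "$line" | sed 's/t/T/2')      # only the 2nd "t" on the line
commented=$(echo "$line" | sed 's/^/#/')      # "^" is the start of the line
with_eof=$(echo "$line" | sed 's/$/EOF/')     # "$" is the end of the line

printf '%s\n' "$first" "$all" "$second_t" "$commented" "$with_eof"
```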

With "sed -i", the commands modify the contents of the file directly and take effect immediately!

[root@localhost ~]# sed -i '1c 1111' a.txt
// Replace the first line with "1111"
[root@localhost ~]# sed -i '1a 1111' a.txt
// Insert a line after the first line with the content "1111"
[root@localhost ~]# sed -i '1i 2222' a.txt
// Insert a line before the first line with the content "2222"
[root@localhost ~]# sed -i '1d' a.txt
// Delete the first line
[root@localhost ~]# sed -n '1p' a.txt
// Print the contents of the first line
[root@localhost ~]# sed -i '1s/2222/3333/g' a.txt
// Replace "2222" with "3333" on the first line
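Because -i changes the file in place, it is safer to try it on a scratch file first. The sketch below assumes GNU sed (the one-line forms of c and i, and -i without a suffix argument, are GNU behaviour) and uses a temporary file in place of a.txt.

```shell
# Scratch file standing in for a.txt.
f=$(mktemp)
printf 'first\nsecond\n' > "$f"

sed -i '1c 1111' "$f"          # file is now: 1111 / second
sed -i '1i 2222' "$f"          # file is now: 2222 / 1111 / second
sed -i '1s/2222/3333/g' "$f"   # file is now: 3333 / 1111 / second

result=$(cat "$f")
rm -f "$f"
echo "$result"
```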

(4) Move text that meets the conditions

The operations used to move text with the sed command are:

  • H copies to the clipboard (the hold space);
  • g, G overwrite/append the data from the clipboard to the specified line;
  • w saves to a file;
  • r reads the specified file;
  • a appends the specified content.

[root@localhost ~]# sed '/the/{H;d};$G' test.txt
// Move the lines containing "the" to the end of the file; ";" separates multiple operations
[root@localhost ~]# sed '1,5{H;d};17G' test.txt
// Move the contents of lines 1 to 5 to after line 17
[root@localhost ~]# sed '/the/w out.file' test.txt
// Save the lines containing "the" to the file out.file
[root@localhost ~]# sed '/the/r /etc/hostname' test.txt
// Add the contents of the file /etc/hostname after each line containing "the"
[root@localhost ~]# sed '3aNEW' test.txt
// Insert a new line after line 3 with the content "NEW"
[root@localhost ~]# sed '/the/aNEW' test.txt
// Insert a new line after each line containing "the" with the content "NEW"
[root@localhost ~]# sed '3aNEW1\nNEW2' test.txt
// Insert multiple lines after line 3; the "\n" in the middle indicates a line break
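A minimal sketch of moving lines through the hold space, on made-up input. One detail worth knowing: H first appends a newline to the hold space and then the line, so when the hold space starts out empty the moved block is preceded by one empty line.

```shell
# Lines containing "the" are held (H) and deleted (d);
# on the last line ($) the held block is appended (G).
moved=$(printf 'the a\nb\nthe c\nd\n' | sed '/the/{H;d};$G')

printf '%s\n' "$moved"
```

The output is b, d, an empty line, then the two moved lines.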

(5) Using scripts to edit files

With a sed script, the editing instructions are stored in a file (one instruction per line) and invoked with the '-f' option.

[root@localhost ~]# sed '1,5{H;d};17G' test.txt
// Move lines 1 to 5 to after line 17

The same operations written as a script file:

[root@localhost ~]# vim 1.list
1,5H
1,5d
17G
[root@localhost ~]# sed -f 1.list test.txt
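As a quick check that the inline form and the script-file form behave the same, this minimal sketch runs both on the numbers 1 to 5 and compares the results. The script path comes from mktemp and the line numbers are scaled down for illustration.

```shell
# Script file with one instruction per line (temporary, for illustration).
script=$(mktemp)
cat > "$script" <<'EOF'
1,2H
1,2d
4G
EOF

inline=$(seq 1 5 | sed '1,2{H;d};4G')
from_file=$(seq 1 5 | sed -f "$script")
rm -f "$script"

printf '%s\n' "$from_file"
```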

(6) Example of sed operating directly on a file

Write a script to adjust the vsftpd service configuration: disable anonymous access, but allow local users to log in (with write permission).

[root@localhost ~]# vim local_only_ftp.sh
#!/bin/bash
S="/usr/share/doc/vsftpd-3.0.2/EXAMPLE/INTERNET_SITE/vsftpd.conf"
C="/etc/vsftpd/vsftpd.conf"
# Specify the sample file path and the configuration file path
[ ! -e "$C.bak" ] && cp "$C" "$C.bak"
# Back up the original configuration file: check whether the .bak copy exists and copy it with cp only if it does not
sed -e '/^anonymous_enable/s/YES/NO/g' "$S" > "$C"
sed -i -e '/^local_enable/s/NO/YES/g' -e '/^write_enable/s/NO/YES/g' "$C"
grep "listen" "$C" || sed -i '$alisten=YES' "$C"
# Adjust the sample configuration and overwrite the existing file; append listen=YES if no listen setting is present
systemctl restart vsftpd
systemctl enable vsftpd
# Restart the ftp service and enable it to start at boot
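Before running such a script against /etc, the sed expressions can be tried on a scratch copy. This is a minimal sketch assuming GNU sed; the three-line sample configuration is made up and much shorter than a real vsftpd.conf.

```shell
# Scratch directory with a made-up sample config (not a real vsftpd.conf).
work=$(mktemp -d)
sample="$work/sample.conf"
conf="$work/vsftpd.conf"
printf 'anonymous_enable=YES\nlocal_enable=NO\nwrite_enable=NO\n' > "$sample"

# Same edits as the script, applied to the scratch files.
sed -e '/^anonymous_enable/s/YES/NO/g' "$sample" > "$conf"
sed -i -e '/^local_enable/s/NO/YES/g' -e '/^write_enable/s/NO/YES/g' "$conf"
# Append listen=YES after the last line only if no listen setting exists.
grep -q "listen" "$conf" || sed -i '$alisten=YES' "$conf"

result=$(cat "$conf")
rm -rf "$work"
echo "$result"
```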

3.awk Command Tool

In Linux/UNIX systems, awk is a powerful text processing tool. It reads the input text line by line, searches it according to a specified pattern, and formats, outputs, or filters the content that matches. It can perform quite complex text operations without interaction, and is widely used in shell scripts to complete various automated configuration tasks.

1) Overview of awk commands

The result of awk processing can be printed with the print function. The logical operators '&&' and '||' can be used in awk commands;
simple arithmetic can also be performed with +, -, *, /, %, and ^, which represent addition, subtraction, multiplication, division, remainder, and exponentiation respectively.

awk reads from an input file or from standard input and, like sed, reads line by line. The difference is that awk treats each line of a text file as a record, and each part (column) of the line as a field of the record. To manipulate these fields, awk borrows a method similar to the shell's positional variables: $1, $2, $3, ... represent the columns in order, and $0 represents the entire line. Fields are split by a separator; the default separator is whitespace (spaces or tabs), and a different separator can be specified in the form '-F separator'.
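A minimal sketch of records and fields, using one made-up line in /etc/passwd format and one whitespace-separated line:

```shell
# One record in /etc/passwd format (made-up sample, not read from the system).
record='root:x:0:0:root:/root:/bin/bash'

# -F ":" splits the record on colons; $1 is the user name, $7 the shell.
name_shell=$(echo "$record" | awk -F ":" '{print $1, $7}')
# $0 is the whole, unsplit record.
whole=$(echo "$record" | awk -F ":" '{print $0}')
# Default separator: runs of spaces count as a single separator.
second=$(echo 'alpha  beta   gamma' | awk '{print $2}')

printf '%s\n' "$name_shell" "$whole" "$second"
```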

The way awk processes the /etc/passwd file is shown in the following figure:

awk contains several special built-in variables, such as:
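The original list of variables is not reproduced here. As a hedged illustration, the sketch below uses three standard awk built-ins: NR (number of the current record), NF (number of fields in the current record) and FS (input field separator). The input lines are made up.

```shell
# NR = record number, NF = field count, $NF = the last field.
summary=$(printf 'a b c\nd e\n' | awk '{print NR, NF, $NF}')

# Setting FS in BEGIN has the same effect as -F on the command line.
n_fields=$(echo 'x:y:z' | awk 'BEGIN{FS=":"} {print NF}')

printf '%s\n' "$summary" "$n_fields"
```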

2) Example awk command usage

(1) Output text by line

[root@localhost ~]# awk '{print}' test.txt 
//Output everything, equivalent to "cat test.txt"
[root@localhost ~]# awk '{print $0}' test.txt
//Output everything, equivalent to "cat test.txt"
[root@localhost ~]# awk 'NR==1,NR==3{print}' test.txt 
//Output lines 1~3
[root@localhost ~]# awk '(NR>=1) && (NR<=3) {print}' test.txt 
//Output lines 1~3
[root@localhost ~]# awk 'NR==1 || NR==3{print}' test.txt 
//Output lines 1 and 3
[root@localhost ~]# awk '(NR%2)==1 {print}' test.txt
// Output all odd-numbered lines
[root@localhost ~]# awk '(NR%2)==0 {print}' test.txt
// Output all even-numbered lines
[root@localhost ~]# awk '/^root/{print}' /etc/passwd
// Output lines starting with "root"
[root@localhost ~]# awk '/nologin$/{print}' /etc/passwd
// Output lines ending with "nologin"
[root@localhost ~]# awk 'BEGIN {x=0} ;/\/bin\/bash$/{x++};END {print x}' /etc/passwd
// Count the lines ending with "/bin/bash"
[root@localhost ~]# grep -c "/bin/bash$" /etc/passwd
// Count the lines ending with "/bin/bash"
[root@localhost ~]# awk 'BEGIN{RS=""}; END{print NR}' /etc/squid/squid.conf
// Count the number of paragraphs separated by blank lines

Note: a BEGIN block runs before any input is read, and an END block runs after all input has been processed.
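A minimal sketch of the BEGIN and END blocks on made-up data: the first command counts records whose last field is /bin/bash, and the second counts paragraphs by setting the record separator RS to the empty string.

```shell
# Made-up data in /etc/passwd format.
passwd_sample=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1:bin:/bin:/sbin/nologin' \
  'alice:x:1000:1000::/home/alice:/bin/bash')

# BEGIN initialises the counter, the pattern increments it, END prints it.
n_bash=$(printf '%s\n' "$passwd_sample" |
  awk 'BEGIN {x=0} /\/bin\/bash$/ {x++} END {print x}')

# RS="" switches awk to paragraph mode: blank lines separate records.
n_paragraphs=$(printf 'a\nb\n\nc\n\n\nd\n' | awk 'BEGIN{RS=""} END{print NR}')

echo "$n_bash $n_paragraphs"
```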

(2) Output text by field

[root@localhost ~]# awk '{print $3}' test.txt
// Output the third field of each line (fields separated by whitespace)
[root@localhost ~]# awk '{print $1,$3}' test.txt
// Output the first and third fields of each line (fields separated by whitespace)
[root@localhost ~]# awk -F ":" '$2==""{print}' /etc/shadow
// Output the lines of /etc/shadow whose second field (separated by ':') is empty, i.e. users with an empty password
[root@localhost ~]# awk 'BEGIN {FS=":"}; $2==""{print}' /etc/shadow
// Same as above, with the separator set through FS in a BEGIN block (note "==" for comparison; a single "=" would assign)
[root@localhost ~]# awk -F ":" '$7~"/bash"{print $1}' /etc/passwd
// Output the first field of the lines whose 7th field (separated by ':') contains "/bash"
[root@localhost ~]# awk '($1~"nfs") && (NF==8) {print $1,$2}' /etc/services
// Output the first and second fields of the lines that have eight fields and whose first field contains "nfs"
[root@localhost ~]# awk -F ":" '($7!="/bin/bash") && ($7!="/sbin/nologin") {print}' /etc/passwd
// Output all lines whose 7th field is neither "/bin/bash" nor "/sbin/nologin"
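Since /etc/shadow is readable only by root, field conditions are easier to try on made-up data. A minimal sketch (all account names and fields below are invented):

```shell
# Made-up data in /etc/shadow and /etc/passwd format.
shadow_sample=$(printf 'alice::18000:0\nbob:x:18000:0\n')
passwd_sample=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'bin:x:1:1:bin:/bin:/sbin/nologin' \
  'sync:x:5:0:sync:/sbin:/bin/sync')

# "==" compares: users whose 2nd field is empty.
empty_pw=$(printf '%s\n' "$shadow_sample" | awk -F ":" '$2=="" {print $1}')
# "~" matches a regular expression against one field.
bash_users=$(printf '%s\n' "$passwd_sample" | awk -F ":" '$7~"/bash" {print $1}')
# "&&" combines two conditions.
other_shell=$(printf '%s\n' "$passwd_sample" |
  awk -F ":" '($7!="/bin/bash") && ($7!="/sbin/nologin") {print $1}')

echo "$empty_pw $bash_users $other_shell"
```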

(3) Invoke shell commands through pipes and double quotes

[root@localhost ~]# awk -F: '/bash$/{print | "wc -l"}' /etc/passwd
// Call the "wc -l" command to count the number of users using bash
[root@localhost ~]# grep -c "bash$" /etc/passwd
// Does the same as the previous command
[root@localhost ~]# awk 'BEGIN {while ("w" | getline) n++ ; {print n-2}}'
// Call the "w" command and count the number of online users (n-2 subtracts the two header lines printed by w)
[root@localhost ~]# awk 'BEGIN { "hostname" | getline ; print $0}'
// Call the "hostname" command and output the current host name
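A minimal sketch of both pipe directions, using commands with predictable output (sort, echo, seq) in place of w and hostname, whose output depends on the machine:

```shell
# print | "cmd": awk sends its output into a shell command.
sorted=$(printf 'b\na\n' | awk '{print | "sort"}')

# "cmd" | getline var: awk reads one line of a command's output into var.
greeting=$(awk 'BEGIN { "echo hello" | getline line; print line }')

# A while loop over getline reads every line the command prints.
n_lines=$(awk 'BEGIN { while ("seq 3" | getline) n++; print n }')

printf '%s\n' "$sorted" "$greeting" "$n_lines"
```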

(4) Simple mathematical operations using awk commands

[root@localhost ~]# awk 'BEGIN{ a=6;b=3;print"(a + b)=",(a + b)}'
(a + b)= 9
[root@localhost ~]# awk 'BEGIN{ a=6;b=3;print"(a - b)=",(a - b)}'
(a - b)= 3
[root@localhost ~]# awk 'BEGIN{ a=6;b=3;print"(a / b)=",(a / b)}'
(a / b)= 2
[root@localhost ~]# awk 'BEGIN{ a=6;b=3;print"(a % b)=",(a % b)}'
(a % b)= 0
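Multiplication and exponentiation follow the same pattern as the examples above. A minimal sketch with the same values of a and b:

```shell
product=$(awk 'BEGIN{ a=6;b=3;print "(a * b)=",(a * b)}')
power=$(awk 'BEGIN{ a=6;b=3;print "(a ^ b)=",(a ^ b)}')

printf '%s\n' "$product" "$power"
```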

For more detailed awk commands, you can refer to the blog post: awk learning

_________


Posted on Sat, 09 Nov 2019 12:39:54 -0500 by Labbat