Linux text manipulation: awk

AWK

Difference between AWK and sed:

  • AWK is closer to a scripting language
  • AWK suits text that is already fairly regular: counting things and printing selected fields
  • sed is typically used to turn irregular text into more regular text

Control flow of an AWK script

  • BEGIN {} routine: runs once, before any input is read
  • Main input loop {}: runs once for every record (usually this is the only part written; BEGIN and END are needed only occasionally)
  • END {} routine: runs once, after all input has been read
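
The three routines can be seen together in a minimal sketch. The three-line input and the variable name count are made up for illustration:

```shell
printf 'a\nb\nc\n' | awk '
    BEGIN { print "start" }          # runs once, before any input is read
          { count++ }                # main input loop: runs once per record
    END   { print "lines:", count }  # runs once, after the last record
'
```

With this input the script should print "start" followed by "lines: 3".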

Fields and field separators in AWK

Records and fields

  • Each line is an AWK record
  • Words separated by spaces or tabs are fields
  • You can specify the field separator yourself

Field reference

  • Each field is referred to as $1, $2 ... $n in awk; $0 is the whole record
    • awk '{print $1, $2, $3}' filename (the {} block is the main input loop)
  • The -F option changes the field separator
    • awk -F ',' '{print $1, $2, $3}' filename (-F sets the delimiter to a comma)
    • The delimiter can be a regular expression
awk '/^menu/{ print $0 }' /boot/grub2/grub.cfg // Print every line that starts with "menu"

awk -F "'" '/^menu/{ print $2 }' /boot/grub2/grub.cfg // Use ' as the delimiter and print the second field, the menu entry (kernel) name

awk -F "'" '/^menu/{ print x++,$2 }' /boot/grub2/grub.cfg // Number the entries with x++, starting from 0
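
As a small sketch of a regular expression used as the delimiter, the following splits on either a comma or a semicolon, followed by optional spaces. The sample input line is invented for the example:

```shell
# -F takes a regular expression when it is longer than one character
printf 'a, b;c\n' | awk -F '[,;] *' '{ print $1, $2, $3 }'
```

This should print "a b c", because both ", " and ";" are treated as field separators.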

AWK expressions

  • Assignment operators
  • Arithmetic operators
  • System variables
  • Relational operators
  • Boolean operators

Assignment operators

= is the most common assignment operator
var1 = "name"
var2 = "hello" "world" // Concatenation of two strings
var3 = $1 // The first field of the current record (unlike the shell, where $1 is the first positional parameter)

Other assignment operators
++ -- += -= *= /= %= ^=

Arithmetic operators

+ - * / % ^
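
A short sketch that exercises a few of these operators in a BEGIN block, so no input file is needed. The variable names greeting and n are illustrative only:

```shell
awk 'BEGIN {
    greeting = "hello" " " "world"   # strings are concatenated by writing them side by side
    n = 7
    n += 3                           # n is now 10
    n ^= 2                           # n is now 100
    print greeting, n % 7, n / 8     # remainder and division
}'
```

This should print "hello world 2 12.5".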

System variables

  • FS and OFS, the field separators. FS is the input field separator (the character or regular expression that splits each input record into fields); OFS is the output field separator.
  • RS, the record separator. A record is normally one line; changing RS from the newline to another character lets you, for example, treat several lines as one record (the default RS is \n).
  • NR and FNR, record (line) numbers. NR keeps counting across input files; FNR restarts at 1 for each new file.
  • NF, the number of fields in the current record. $NF is the content of the last field.
head -5 /etc/passwd | awk -F ":" '{print $1}' // Split on ':' and print the first field

head -5 /etc/passwd | awk 'BEGIN{FS=":"}{print $1}' // Same result, with FS set in BEGIN

head -5 /etc/passwd | awk 'BEGIN{FS=":"}{print $1,$2}' // Print the first and second fields. The default output separator is a space

head -5 /etc/passwd | awk 'BEGIN{FS=":";OFS="-"}{print $1,$2}' // Set the output separator to '-'

head -5 /etc/passwd | awk 'BEGIN{RS=":"}{print $0}' // Use ':' instead of the newline as the record separator

head -5 /etc/passwd | awk '{print NR}' // Print the record number

head -5 /etc/passwd | awk '{print NR,$0}' // Print the record number and the line content

awk '{print FNR,$0}' /etc/hosts /etc/hosts // FNR restarts at 1 for the second file

awk '{print NR,$0}' /etc/hosts /etc/hosts // NR keeps counting across both files

head -5 /etc/passwd | awk 'BEGIN{FS=":"}{print NF}' // Print the number of fields

head -5 /etc/passwd | awk 'BEGIN{FS=":"}{print $NF}' // Print the last field
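
To try the system variables without depending on the contents of /etc/passwd or /etc/hosts, here is a sketch that uses a made-up passwd-style line and a temporary file created with mktemp:

```shell
# FS, OFS, NR, NF and $NF on a single sample record
printf 'root:x:0:0:root:/root:/bin/bash\n' |
    awk 'BEGIN { FS = ":"; OFS = "-" } { print NR, NF, $1, $NF }'

# NR versus FNR when the same two-line file is read twice
tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"
awk '{ print NR, FNR, $0 }' "$tmp" "$tmp"
rm -f "$tmp"
```

The first command should print "1-7-root-/bin/bash". The second should print NR running from 1 to 4 while FNR goes 1, 2, 1, 2.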

AWK conditions and loops

Conditional statement

* A conditional statement begins with if and chooses which statement to execute according to the value of the expression
if(expression) // A true expression evaluates to 1 and a false one to 0 (the reverse of the shell, where exit status 0 means success)
  awk Statement 1
[else
  awk Statement 2
]
* If more than one statement needs to be executed, enclose the statements in {}


awk '{if($2>=80) print $1}' kpi.txt // Print users whose KPI is >= 80

awk '{if($2>=80) {print $1; print $2}}' kpi.txt // Print both the user and the KPI (statements grouped in braces)
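
The else branch works the same way. A minimal sketch, assuming input in the form "user score" (the two sample lines are invented, not taken from kpi.txt):

```shell
printf 'user1 85\nuser2 72\n' |
    awk '{ if ($2 >= 80) { print $1, "pass" } else { print $1, "fail" } }'
```

This should print "user1 pass" and then "user2 fail".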

Loop statements

* while loop
    while(expression)
        awk Statement 1
* do loop
    do {
        awk Statement 1
    }while(expression)
* for loop
    for(initial value; loop condition; increment)
        awk Statement 1
* Other statements that affect control flow
    break
    continue
    
head -1 kpi.txt // Look at the first record

head -1 kpi.txt | awk '{for(c=2;c<=NF;c++) print c}' // Print the field numbers 2..NF

head -1 kpi.txt | awk '{for(c=2;c<=NF;c++) print $c}' // Print the field values

head -1 kpi.txt | awk '{for(c=2;c<=NF;c++) sum+=$c ; print sum}' // Sum of the scores

head -1 kpi.txt | awk '{for(c=2;c<=NF;c++) sum+=$c; print sum/(NF-1)}' // Average score. Note that sum is not reset between records; its previous value carries over

awk '{sum=0; for(c=2;c<=NF;c++) sum+=$c ; print sum/(NF-1)}' kpi.txt // Average for every line, resetting sum for each record
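
The same field traversal can be written with while instead of for. This is only a sketch; the sample record stands in for one line of kpi.txt, whose real contents are not shown here:

```shell
printf 'user1 80 90 70\n' | awk '{
    sum = 0
    c = 2
    while (c <= NF) {            # walk fields 2..NF, as the for loops do
        sum += $c
        c++
    }
    print $1, sum / (NF - 1)     # average over the score fields
}'
```

For this record the output should be "user1 80".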

AWK arrays

  • Defining an array
  • Traversing an array
  • Deleting an array
  • The command-line argument array

Defining an array

array_name[subscript]=value

Subscripts can be numbers or strings

Traversing an array

for(variable in array_name)

Use array_name[variable] to operate on each element in turn; the traversal order is not guaranteed

Deleting an array

delete array // Remove the entire array

delete array[subscript] // Remove a single element of the array

awk '{ sum=0; for(column=2;column<=NF;column++) sum+=$column; average[$1]=sum/(NF-1)}END{ for( user in average) print user,average[user]}' kpi.txt  // Average score of each user

awk '{ sum=0; for(column=2;column<=NF;column++) sum+=$column; average[$1]=sum/(NF-1)}END{ for(user in average) sum2+=average[user] ;print sum2/NR}' kpi.txt // Overall average across all users
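
A compact sketch that combines string subscripts, delete, and for ... in. The names total and count and the three input lines are assumptions made for the example:

```shell
printf 'alice 80\nbob 60\nalice 90\n' | awk '
    { total[$1] += $2; count[$1]++ }      # string subscripts: one slot per user
    END {
        delete total["bob"]               # drop a single element
        for (user in total)               # only alice is left to visit
            print user, total[user] / count[user]
    }'
```

This should print "alice 85". Deleting one element leaves the rest of the array untouched.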

vim avg.awk  // Save the awk program in a script file

awk -f avg.awk kpi.txt  // Run the script with -f

The command-line argument array

ARGC  // Number of command-line arguments, counting the awk command itself (ARGV[0])

ARGV // Array holding the arguments, indexed from 0 to ARGC-1

vim arg.awk

BEGIN{
    for(x=0;x<ARGC;x++)
        print ARGV[x]
    print ARGC
}

awk -f arg.awk 11 22 33
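
The same idea as a one-liner, as a rough sketch. It starts at index 1 because the exact text of ARGV[0] (the program name) can differ between awk implementations:

```shell
# ARGV[1..ARGC-1] hold the operands; ARGC also counts ARGV[0]
awk 'BEGIN { for (x = 1; x < ARGC; x++) printf "%s ", ARGV[x]; print ARGC }' 11 22 33
```

This should print "11 22 33 4": three operands plus the program name give an ARGC of 4. Because the program has only a BEGIN block, awk never tries to open 11, 22, or 33 as input files.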
vim result.awk

{
    # Average of fields 2..NF for the current user
    sum = 0
    for (column = 2; column <= NF; column++)
        sum += $column
    average[$1] = sum / (NF - 1)

    # Map the average to a letter grade
    if (average[$1] >= 80)
        letter = "S"
    else if (average[$1] >= 70)
        letter = "A"
    else if (average[$1] >= 60)
        letter = "B"
    else
        letter = "C"

    print $1, average[$1]
    letter_all[letter]++
}
END {
    # Overall average (assumes each user appears on exactly one line, so NR equals the number of users)
    for (user in average)
        sum_all += average[user]
    avg_all = sum_all / NR
    print "average all:", avg_all

    # Count users above the overall average and users at or below it
    for (user in average)
        if (average[user] > avg_all)
            above++
        else
            below++
    print "above", above
    print "below", below
    print "S:", letter_all["S"]
    print "A:", letter_all["A"]
    print "B:", letter_all["B"]
    print "C:", letter_all["C"]
}

awk -f result.awk kpi.txt

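AWK functions

  • Arithmetic functions
  • String functions
  • Custom functions

Arithmetic functions

int() // Truncate to an integer
awk 'BEGIN{pi=3.14 ;print int(pi) }'

rand() // Random number between 0 and 1
awk 'BEGIN{print rand()}' // Pseudo-random: in many implementations (gawk, for example) the sequence repeats on every run unless srand() is called
awk 'BEGIN{srand();print rand()}' // srand() seeds the generator from the current time, so the value changes between runs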

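String functions

gsub(r,s,t) // Replace every match of regular expression r in t with s; returns the number of replacements

index(s,t) // Position of the first occurrence of t in s, or 0 if it is absent

length(s) // Length of s in characters

match(s,r) // Position where r first matches in s (0 if none); sets RSTART and RLENGTH

split(s,a,sep) // Split s into array a on sep; returns the number of elements

sub(r,s,t) // Like gsub, but replaces only the first match

substr(s,p,n) // Substring of s starting at position p, at most n characters long

man awk // Full details are in the manual
/gsub // Search for gsub inside the man pager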
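
A short sketch that calls most of these functions on one sample string. The string and the variable names s, t, n, and parts are chosen only for illustration:

```shell
awk 'BEGIN {
    s = "hello awk world"
    print length(s)                 # number of characters in s
    print index(s, "awk")           # position of the first "awk"
    print substr(s, 7, 3)           # 3 characters starting at position 7
    n = split(s, parts, " ")        # parts[1], parts[2], parts[3]
    print n, parts[3]
    t = s
    gsub(/o/, "0", t)               # replace every "o" in the copy t
    print t
    if (match(s, /w[a-z]+d/))       # on success, sets RSTART and RLENGTH
        print RSTART, RLENGTH
}'
```

The expected output, line by line, is 15, 7, awk, "3 world", "hell0 awk w0rld", and "11 5".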

Custom functions

function function_name(parameters){
    awk statements
    return value
}
awk 'function a(){return 0} BEGIN{print a()}' // Custom functions are defined outside the BEGIN, main, and END blocks

awk 'function double(str) {return str str} BEGIN {print double("hello awk")}' // Prints "hello awkhello awk"
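
A custom function can also wrap the averaging loop used earlier. This is a sketch only: avg() is a hypothetical helper, the two input lines are invented, and the extra parameters c and sum follow the common awk convention of declaring local variables as unused parameters:

```shell
printf 'user1 80 90 70\nuser2 60 70 80\n' | awk '
    # avg(first): average of fields first..NF; c and sum act as local variables
    function avg(first,    c, sum) {
        for (c = first; c <= NF; c++)
            sum += $c
        return sum / (NF - first + 1)
    }
    { print $1, avg(2) }'
```

This should print "user1 80" and "user2 70". Because sum is local to the function, it starts empty on every call and does not need to be reset by hand.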


Tags: Linux Operation & Maintenance regex

Posted on Fri, 22 Oct 2021 02:06:21 -0400 by MacGod