Five Linux system load metrics (monitoring scripts and troubleshooting)

1, I/O


1. Monitoring script

# Column numbers assume the older sysstat layout of `iostat -kx`:
# Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
# Newer sysstat releases add and reorder columns, so check the header line first.

echo "Statistics for device /dev/sda"

# Number of read requests issued to the device per second (r/s)

disk_sda_rs=$(iostat -kx | awk '$1 == "sda" {print $4}')

echo "Read requests to the device per second: $disk_sda_rs"

# Number of write requests issued to the device per second (w/s)

disk_sda_ws=$(iostat -kx | awk '$1 == "sda" {print $5}')

echo "Write requests to the device per second: $disk_sda_ws"

# Average queue length of the I/O requests issued to the device (avgqu-sz)

disk_sda_avgqu_sz=$(iostat -kx | awk '$1 == "sda" {print $9}')

echo "Average I/O request queue length: $disk_sda_avgqu_sz"

# Average time per I/O request, including time spent in the queue, in ms (await)

disk_sda_await=$(iostat -kx | awk '$1 == "sda" {print $10}')

echo "Average time per I/O request (ms): $disk_sda_await"

# Average service time per I/O request, in ms (svctm)

disk_sda_svctm=$(iostat -kx | awk '$1 == "sda" {print $11}')

echo "Average I/O service time (ms): $disk_sda_svctm"

# Percentage of elapsed time during which I/O requests were issued to the device (%util)

disk_sda_util=$(iostat -kx | awk '$1 == "sda" {print $12}')

echo "Device utilization (%): $disk_sda_util"
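The fixed column numbers in the script above only match one sysstat layout. As a minimal sketch, assuming the header line of `iostat -kx` starts with "Device", the value can be looked up by column name instead. The function name iostat_field is hypothetical and not part of sysstat.

```shell
# iostat_field DEVICE COLUMN
# Reads `iostat -kx` output on stdin and prints the value found under the
# column named COLUMN (for example r/s or %util) on the row for DEVICE.
iostat_field() {
    awk -v dev="$1" -v col="$2" '
        # Header row: remember which field number carries the wanted column
        $1 ~ /^Device/ {
            idx = 0
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            next
        }
        # Device row: keep the value (if several reports are present, the last one wins)
        $1 == dev && idx { val = $idx }
        END { if (val != "") print val }
    '
}

# Example (requires the sysstat package):
#   iostat -kx | iostat_field sda %util
```

If the device or the column is not present, the function prints nothing, which a calling script can test for with [ -z ... ].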

2. How to troubleshoot abnormally high I/O

① Locate where the I/O load comes from, using the steps below.

② Run top and check the wa (I/O wait) value to see whether the CPU spends a large share of its time waiting for I/O.

③ Run iostat -x 2 5 and check %util for each disk. The higher the value, the busier that disk is.

④ Run iotop to see directly which processes generate the most I/O. If iotop is not available, run for x in $(seq 1 10); do ps -eo state,pid,cmd | grep "^D"; echo "----"; sleep 5; done to list the processes that are waiting for I/O (state D).

⑤ Run cat /proc/pid/io to view the I/O counters of the corresponding process.

⑥ Run lsof -p pid or ls -l /proc/pid/fd to see which files and directories the process has open.

⑦ Run df /tmp (substituting the directory found in the previous step) to see which volume it is on.

⑧ Run fdisk -l or pvdisplay to identify the underlying disk and confirm that it is the disk showing high I/O.
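The ps-based fallback in step ④ can be wrapped in a small filter. This is only a sketch: the function name d_state_procs is hypothetical, and it assumes input in the `ps -eo state,pid,cmd` layout (state first, PID second, command last).

```shell
# d_state_procs
# Reads `ps -eo state,pid,cmd` output on stdin and prints "PID COMMAND" for
# every process whose state starts with D (uninterruptible sleep, usually
# waiting for I/O).
d_state_procs() {
    awk '$1 ~ /^D/ {
        pid = $2
        cmd = $3
        for (i = 4; i <= NF; i++) cmd = cmd " " $i   # rebuild the command line
        print pid, cmd
    }'
}

# Example:
#   ps -eo state,pid,cmd | d_state_procs
```

The PIDs it prints can then be fed into cat /proc/pid/io and lsof -p pid from steps ⑤ and ⑥.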

2, Network traffic

1. Monitoring script

read -p 'Please enter the network interface: ' INTER    #(prompt for the network interface name)
ifconfig "$INTER" &> /dev/null     #(query the interface, discarding standard output and standard error)
if [ $? -ne 0 ];then     #(a non-zero return value means the interface was not found)
   echo "This network interface does not exist!"
   exit 1     #(exit with an error status)
fi

while true     #(repeat until interrupted)
do
   #(the row and column numbers below match the ifconfig output for ens33 on the author's system; other net-tools versions lay the output out differently)
   RX_before=$(ifconfig "$INTER" | awk 'NR==5{print $5}')     #(received bytes before the interval: row 5, column 5)
   TX_before=$(ifconfig "$INTER" | awk 'NR==7{print $5}')     #(sent bytes before the interval: row 7, column 5)
   sleep 1     #(wait 1 second)
   RX_after=$(ifconfig "$INTER" | awk 'NR==5{print $5}')      #(received bytes after 1 second)
   TX_after=$(ifconfig "$INTER" | awk 'NR==7{print $5}')      #(sent bytes after 1 second)
   RX=$((RX_after-RX_before))     #(download speed: bytes received during the second)
   TX=$((TX_after-TX_before))     #(upload speed: bytes sent during the second)
   clear     #(clear the screen)
   echo "Network monitoring time: $(date '+%Y-%m-%d %T')"     #(print the time of this sample)

   if [ $RX -lt 1024 ];then     #(download speed below 1 KB/s)
      echo "Download speed: ${RX}B/s"
   elif [[ $RX -ge 1024 && $RX -lt 1048576 ]];then     #(at least 1 KB/s and below 1 MB/s)
      echo "Download speed: $((RX/1024))KB/s"
   elif [[ $RX -ge 1048576 && $RX -lt 20971520 ]];then     #(at least 1 MB/s and below 20 MB/s)
      echo "Download speed: $((RX/1048576))MB/s"
   else     #(20 MB/s or more: append a warning to the log file)
      echo "Warning: the download speed is too high, there may be a malicious attack!" >> /home/WL.txt
   fi

   if [ $TX -lt 1024 ];then     #(upload speed below 1 KB/s)
      echo "Upload speed: ${TX}B/s"
   elif [[ $TX -ge 1024 && $TX -lt 1048576 ]];then     #(at least 1 KB/s and below 1 MB/s)
      echo "Upload speed: $((TX/1024))KB/s"
   elif [[ $TX -ge 1048576 && $TX -lt 10485760 ]];then     #(at least 1 MB/s and below 10 MB/s)
      echo "Upload speed: $((TX/1048576))MB/s"
   else     #(10 MB/s or more: append a warning to the log file)
      echo "Warning: the upload speed is too high, there may be a malicious attack!" >> /home/WL.txt
   fi
done
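Parsing ifconfig by row and column depends on the net-tools version. As an alternative sketch, not taken from the script above, the kernel exposes the same byte counters under /sys/class/net/INTERFACE/statistics on Linux. The helper names format_rate and iface_rate are hypothetical.

```shell
# format_rate BYTES_PER_SECOND
# Prints the rate as B/s, KB/s or MB/s using integer division.
format_rate() {
    if [ "$1" -lt 1024 ]; then
        echo "${1}B/s"
    elif [ "$1" -lt 1048576 ]; then
        echo "$(( $1 / 1024 ))KB/s"
    else
        echo "$(( $1 / 1048576 ))MB/s"
    fi
}

# iface_rate INTERFACE
# Samples the rx_bytes and tx_bytes counters one second apart and prints
# the download and upload rates.
iface_rate() {
    stats="/sys/class/net/$1/statistics"
    [ -r "$stats/rx_bytes" ] || { echo "No such interface: $1" >&2; return 1; }
    rx1=$(cat "$stats/rx_bytes"); tx1=$(cat "$stats/tx_bytes")
    sleep 1
    rx2=$(cat "$stats/rx_bytes"); tx2=$(cat "$stats/tx_bytes")
    echo "Download speed: $(format_rate $((rx2 - rx1)))"
    echo "Upload speed: $(format_rate $((tx2 - tx1)))"
}

# Example:
#   iface_rate eth0
```

Reading the counters directly avoids any dependence on how ifconfig formats its output, at the cost of being Linux-specific.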

2. How to deal with abnormal network traffic

When malicious requests saturate the bandwidth, the iftop tool combined with the iptables service under Linux can resolve the problem in two steps:

  1. Use iftop to find out which hosts are consuming the bandwidth
  2. Identify the offending IP addresses or segments, determine whether the traffic is in the out direction or the in direction, and restrict it with iptables rules

The detailed procedure is as follows:

  • Once the bandwidth is saturated by malicious requests, it is usually difficult to log in to the server over the network for operation and maintenance. In that case, log in to the system through the "connection management terminal" service that is provided

  • It is advisable to install iftop on the server while the host is still healthy, so that the tool is available for troubleshooting as soon as malicious requests appear. The installation methods are described below

  • 1. Install iftop using yum or up2date
    Installing with yum is simple: run yum install iftop -y and, if nothing goes wrong, the installation completes automatically. If the package cannot be installed with yum, compile and install it from source

  • 2. Compile and install iftop
    ① Download the iftop source package

    ② Install the dependencies required on CentOS
    yum install flex byacc libpcap ncurses ncurses-devel libpcap-devel
    ③ Extract the downloaded iftop archive
    tar zxvf iftop-0.17.tar.gz
    ④ Enter the extracted iftop directory
    cd iftop-0.17
    ⑤ Configure, specifying /usr/local/iftop as the installation directory
    ./configure --prefix=/usr/local/iftop
    ⑥ Compile and install
    make && make install
    After installation, start iftop with /usr/local/iftop/sbin/iftop to check traffic usage. To start it by typing just iftop, add its directory to the PATH environment variable

  • Use the iptables service to limit the traffic of malicious requests
    iftop -i eth1 shows the traffic on eth1, the external network interface

  • The view shows clearly that the server 121.199 has been sending traffic to the address, and the outgoing traffic is large enough to use up almost the entire outbound bandwidth

  • Once the source of the malicious requests and the target host are known, the iptables service can be used to restrict them. Since the data shows that most of the traffic is leaving in the out direction, the policy can be set directly on the OUTPUT chain

  • iptables -A OUTPUT -d [...] -j REJECT
    After this IP is blocked, other addresses in the same segment may immediately take over the requests, in which case the whole segment can be blocked

  • iptables -A OUTPUT -d [...] -j REJECT
    After the policy is added, use iftop -i eth1 again to view the traffic

  • The traffic has returned to normal, and the addresses that were sending malicious requests are now blocked by the firewall
    iftop has many more parameters that enable further functions. They are worth studying, since the tool is very helpful for troubleshooting and controlling traffic
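To make the shape of the two rules concrete, here is a small dry-run sketch that only prints the iptables command so it can be reviewed before being run as root. The helper name reject_rule is hypothetical, and the addresses come from 203.0.113.0/24, a range reserved for documentation; they are placeholders, not the addresses from the incident described above.

```shell
# reject_rule DIRECTION ADDRESS
# Prints (does not run) the iptables command that rejects traffic to or from
# ADDRESS. DIRECTION is "out" (OUTPUT chain, match on destination) or "in"
# (INPUT chain, match on source).
reject_rule() {
    case "$1" in
        out) echo "iptables -A OUTPUT -d $2 -j REJECT" ;;
        in)  echo "iptables -A INPUT -s $2 -j REJECT" ;;
        *)   echo "usage: reject_rule in|out ADDRESS" >&2; return 1 ;;
    esac
}

# Placeholder addresses from the documentation-only range 203.0.113.0/24
reject_rule out 203.0.113.7       # a single host
reject_rule out 203.0.113.0/24    # a whole segment
```

Printing the rule first is a deliberate choice: a mistyped REJECT rule on a remote server can cut off the administrator's own connection.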

3, Hard disk

1. Monitoring script

dis=$(df -h | awk -F '[ %]+' '/\/$/{print $5}')     # Use% of the root file system, without the % sign
if [ "$dis" -ge 80 ];then
    echo "Disk utilization is over 80%, please take note!"
else
    echo "Disk utilization does not exceed 80%, normal operation"
fi
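The same check can be made a little more general. This sketch assumes the POSIX output format of `df -P`, where each file system stays on one line and the mount point is the sixth column; the function names usage_pct and check_disk are hypothetical.

```shell
# usage_pct MOUNTPOINT
# Reads `df -P` output on stdin and prints the Use% of MOUNTPOINT as a bare
# integer (no % sign).
usage_pct() {
    awk -v mp="$1" 'NR > 1 && $6 == mp { sub(/%/, "", $5); print $5 }'
}

# check_disk PERCENT [THRESHOLD]
# Reports whether PERCENT has reached THRESHOLD (default 80) and returns
# non-zero when it has, so the result can drive an alert.
check_disk() {
    if [ "$1" -ge "${2:-80}" ]; then
        echo "Disk utilization is ${1}%, at or above ${2:-80}%"
        return 1
    fi
    echo "Disk utilization is ${1}%, below ${2:-80}%"
}

# Example:
#   check_disk "$(df -P / | usage_pct /)"
```

Matching on the mount point column rather than on a line ending in "/" makes it possible to check other file systems, such as /var or /home, with the same function.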

2. What to do when df -h shows the hard disk is full

① Determine whether disk space is really insufficient

Enter the command: df -lh to view disk information

In this example, the 40G capacity of /dev/xvda1 (shown in the Filesystem column) has been exhausted

Now that the problem is confirmed, the next step is to deal with it

The fix is simple: delete files

Q: which files should be deleted?

A: files that take up a lot of disk space but are no longer needed

Q: which files are no longer needed?

A: if you are not familiar with the system, log files are a good first target

② How to locate the largest directory

Enter the command: cd / to go to the root directory

Enter the command: du -h --max-depth=1 to see how much space each folder directly under the current directory takes up

After these two commands, you can see that /usr takes up the most disk space, 21G. The last line of the output shows 24G, which is the total disk space occupied by all files under the current directory

Repeat the same steps inside the largest directory. After several rounds, this leads to the tomcat log directory

③ How to locate the largest file

Enter the command: ls -lhS to list the files in descending order of size

The file found in the end is the log file catalina.out

④ Confirm that the file is not in use

Deleting the file is easy: rm -f catalina.out. Before doing so, it is best to check whether the developers want a copy of the log for analysis

After the file is deleted, df -lh may show that the disk is still just as full as before. That does not mean the system has to be restarted

In Linux or Unix systems, deleting a file through rm or a file manager only unlinks it from the directory structure of the file system. If the file is still open (a process is using it), the process can still access it, and its disk space stays occupied until the process closes it or exits

Enter the command: /usr/sbin/lsof | grep deleted to check whether the deleted file is still held open

In this case it is. Using the PID shown in the second column, enter the command: kill -9 13117 to kill the process

Enter the command again: df -lh to confirm that the space has been released

⑤ The hard disk is full and the space is not released after deleting large files

① Use lsof | grep deleted to find deleted files that are still held open
② Use kill -9 on the PID that holds them (41895 in this case), then run df -h again to confirm that the space has been completely released
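An alternative that avoids killing the process, not taken from the steps above, is to empty the log file in place instead of deleting it. Truncation keeps the same inode, so the blocks are freed at once even while the file is open. This is a sketch under the assumption that the writer opened the file in append mode (as with a shell `>>` redirection); otherwise the writer may continue at its old offset and leave a sparse file. The helper name truncate_log is hypothetical.

```shell
# truncate_log FILE
# Empties FILE in place instead of deleting it. The directory entry and the
# inode stay the same, so a process that has the file open keeps a valid
# handle and the disk space is released immediately.
truncate_log() {
    [ -f "$1" ] || { echo "Not a regular file: $1" >&2; return 1; }
    : > "$1"
}

# Example:
#   truncate_log /path/to/catalina.out
```

If the developers need the log, copy it elsewhere before truncating, since the content cannot be recovered afterwards.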

4, CPU

1. Monitoring script

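# The fourth comma-separated field of the Cpu line in top is the idle percentage; it is multiplied by 10 to get an integer
x=$(top -b -n 1 | grep Cpu | awk -F ',' '{print $4}' | awk '{print $1*10}')
if [ "$x" -le 200 ];then     # idle at or below 20% means utilization of 80% or more
    echo "Current CPU utilization is over 80%, please take note"
else
    echo "Current CPU utilization does not exceed 80%, normal operation"
fi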
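The script above relies on the idle value being the fourth field of the Cpu line. A slightly more defensive sketch searches the line for the entry marked "id" and reports utilization directly. It assumes a locale that uses a dot as the decimal separator (for example, running top with LC_ALL=C); the function name cpu_usage_pct is hypothetical.

```shell
# cpu_usage_pct
# Reads `top -b -n 1` output on stdin, finds the idle ("id") entry on the
# Cpu(s) line and prints 100 minus idle, truncated to an integer.
cpu_usage_pct() {
    awk -F',' '/Cpu\(s\)/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /id/) {
                idle = $i
                gsub(/[^0-9.]/, "", idle)   # keep only the number
                printf "%d\n", 100 - idle
                exit
            }
        }
    }'
}

# Example:
#   usage=$(LC_ALL=C top -b -n 1 | cpu_usage_pct)
#   [ "$usage" -ge 80 ] && echo "Current CPU utilization is over 80%"
```

Because it looks for the "id" label, the same function handles both the newer "98.0 id" and the older "98.0%id" forms of the line.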

2. What to do when the CPU is fully loaded

  1. Run top to check CPU usage, and press P to sort by CPU usage

  2. Run ps aux to view the CPU usage of each process

  3. Run lsof -p with the process ID, or lsof -i :port with the port number, to see the files opened by the process that is driving CPU usage up

  4. Use kill to terminate processes that are not needed

Situation II

Check whether the server is under a DDoS attack or a DNS spoofing attack

5, Memory

1. Monitoring script

#Desc: detect the memory usage of the system, and alert the administrator if it exceeds 80%

TO="admin@example.com"          # placeholder: the administrator's mailbox
MESSAGE="/tmp/mem_alert.txt"    # placeholder: temporary file for the mail body

total=$(free -m |awk 'NR==2{print $2}')
used=$(free -m |awk 'NR==2{print $3}')
syl=$((used*100/total))         # memory utilization in percent

if [ $syl -gt 80 ];then
        SUBJECT="ATTENTION: Memory Utilization is High on $(hostname) at $(date)"
        echo "Memory Current Usage is: $syl%" > $MESSAGE
        echo "" >> $MESSAGE
        echo "------------------------------------------------------------------" >> $MESSAGE
        echo "Top Memory Consuming Process Using top command" >> $MESSAGE
        echo "------------------------------------------------------------------" >> $MESSAGE
        echo "$(top -b -n 1 -o +%MEM | head -n 20)" >> $MESSAGE
        echo "" >> $MESSAGE
        echo "------------------------------------------------------------------" >> $MESSAGE
        echo "Top Memory Consuming Process Using ps command" >> $MESSAGE
        echo "------------------------------------------------------------------" >> $MESSAGE
        echo "$(ps -eo pid,ppid,%mem,%cpu,cmd --sort=-%mem | head)" >> $MESSAGE
        mail -s "$SUBJECT" "$TO" < $MESSAGE
else
        echo "Memory Utilization is ok on $(hostname) at $(date)"
fi

2. What to do when free -m shows the memory is full

Use shell script to monitor memory usage and send email alarm

yum -y install mailx
vim /etc/mail.rc # add configuration
set smtp-auth-password=dkgcnzaeqmhmbjdi
set smtp-auth=login

a="user@example.com"     # placeholder: set your own mailbox here
memin=$(grep "MemFree" /proc/meminfo | awk '{print $2}')     # free memory in kB
if [[ $memin -ge 0 ]];then     # for this test the threshold is 0, so the alert always fires
    echo "memory utilization exceeds 90%" | mail -s "memory" "$a"
    echo "ok"
fi
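The test script above compares the raw MemFree value, which does not match the "exceeds 90%" message. One way to get an actual percentage, sketched here as an assumption rather than the author's method, is to use MemTotal and MemAvailable from /proc/meminfo (MemAvailable is present on kernels 3.14 and later). The function name mem_used_pct is hypothetical.

```shell
# mem_used_pct
# Reads /proc/meminfo-formatted text on stdin and prints the percentage of
# memory in use, computed as (MemTotal - MemAvailable) * 100 / MemTotal and
# truncated to an integer. Exits non-zero if either field is missing.
mem_used_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0 && avail != "") printf "%d\n", (total - avail) * 100 / total
            else exit 1
        }
    '
}

# Example:
#   pct=$(mem_used_pct < /proc/meminfo) && [ "$pct" -ge 90 ] \
#       && echo "memory utilization exceeds 90%"
```

MemAvailable already accounts for cache that the kernel can reclaim, so this figure is less likely to raise a false alarm than one based on MemFree alone.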

  • Under Linux we generally do not need to free memory manually, because the kernel manages it well. There are exceptions, however: sometimes the cache takes up so much memory that the system starts using swap space, which hurts performance. In that case the cache can be cleaned up to free memory

  • Releasing the cache is controlled through the file /proc/sys/vm/drop_caches. Writing a value to this file tells the kernel which caches to release. It reads 0 until something is written to it, meaning no release has been requested. The values have the following meanings:

    0 – do not release
    1 – free page cache
    2 – release dentries and inodes
    3 – release all caches

  • With these values in mind, use the following commands as needed

  • First, run sync to write all unwritten system buffers to disk, including modified i-nodes, delayed block I/O and read-write mapped files. Only clean cache entries can be dropped, so flushing dirty data first lets the kernel release more

  • # sync

  • Next, write the required value into /proc/sys/vm/drop_caches. For example, to release all caches, enter the following command:

    # echo 3 > /proc/sys/vm/drop_caches

    The command takes effect immediately, and free -m shows noticeably more available memory afterwards.

    To see the value most recently written, enter the following command:

    # cat /proc/sys/vm/drop_caches

1. Memory usage before cleaning
free -m

2. Start cleaning
echo 1 > /proc/sys/vm/drop_caches

3. Memory usage after cleaning
free -m

4. Complete!
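The cleaning steps above can be collected into one function that validates the value before writing it. This is a sketch only: the function name drop_caches and its optional second argument are hypothetical, the latter added so the function can be tried against an ordinary file without root privileges.

```shell
# drop_caches MODE [TARGET]
# MODE must be 1 (page cache), 2 (dentries and inodes) or 3 (both).
# TARGET defaults to /proc/sys/vm/drop_caches, which requires root.
drop_caches() {
    case "$1" in
        1|2|3) ;;
        *) echo "usage: drop_caches 1|2|3 [target]" >&2; return 1 ;;
    esac
    sync                                            # flush dirty pages first
    echo "$1" > "${2:-/proc/sys/vm/drop_caches}"    # ask the kernel to drop caches
}

# Example (as root):
#   free -m; drop_caches 1; free -m
```

Rejecting anything other than 1, 2 or 3 up front avoids writing a mistyped value to the kernel interface.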

To view the number of memory modules:

dmidecode | grep -A16 "Memory Device$"

6, Common commands for viewing system hardware resources

Scenario: I use the following commands during routine inspections to check the system / server load and save the output to a document

1. Disk usage: df -hT
Memory information: free -m or cat /proc/meminfo
CPU information: cat /proc/cpuinfo
I/O information: iostat, iotop

2. ss: gets socket statistics, similar to netstat
It can display more, and more detailed, information about TCP and connection state, and it is faster and more efficient than netstat
netstat -nautp | grep [...]
ss -nautp | grep [...]     # can be used to check which process occupies a port

3. iotop: a top-like tool for monitoring disk I/O usage
It shows which programs are using disk I/O (yum -y install iotop)

4. lsof: lists the files opened by a process, the processes that have a file open, and the ports (TCP/UDP) opened by a process
It is a very convenient system monitoring tool. Because lsof needs to access kernel memory and various files, it should be run as root
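For the inspection scenario described above, the commands can be gathered into a single report. The following is a minimal sketch: the function names run_section and inspection_report and the output path in the example are hypothetical, and any tool that is not installed is noted in the report rather than stopping it.

```shell
# run_section TITLE COMMAND [ARGS...]
# Prints a heading, then the command output, or a note if the command is
# not installed or fails.
run_section() {
    title=$1
    shift
    echo "== $title =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "($1 exited with an error)"
    else
        echo "($1 is not installed)"
    fi
    echo
}

# inspection_report
# Collects disk, memory and socket information into one text report.
inspection_report() {
    echo "Inspection report for $(uname -n)"
    run_section "Disk usage" df -hT
    run_section "Memory" free -m
    run_section "Sockets" ss -nautp
}

# Example:
#   inspection_report > /tmp/inspection.txt
```

Further sections, such as iostat or lsof output, can be added with one more run_section line each.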

Tags: Linux

Posted on Sun, 24 Oct 2021 11:59:14 -0400 by nicandre