1. CPU
cat /proc/cpuinfo
# Number of physical CPUs
cat /proc/cpuinfo | grep 'physical id' | sort | uniq | wc -l
# Number of cores per CPU
cat /proc/cpuinfo | grep 'core id' | sort | uniq | wc -l
# Number of logical CPUs
cat /proc/cpuinfo | grep 'processor' | sort | uniq | wc -l
# mpstat
mpstat
mpstat 2 10
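The three counts above can be collected in one pass. The following is a minimal sketch, not a definitive tool: the function name `cpu_summary` is hypothetical, and it assumes the x86 layout of /proc/cpuinfo, where each `core id` line follows the `physical id` line of the same processor. It counts cores as unique (physical id, core id) pairs, so the result is the total number of cores across all sockets.

```shell
# cpu_summary [FILE]: summarize a cpuinfo-format file (default: /proc/cpuinfo).
# Hypothetical helper; assumes "physical id" appears before "core id" per processor.
cpu_summary() {
    awk -F': *' '
        /^processor[ \t]*:/   { logical++ }
        /^physical id[ \t]*:/ { pkg = $2; sockets[pkg] = 1 }
        /^core id[ \t]*:/     { cores[pkg "," $2] = 1 }
        END {
            ns = 0; for (k in sockets) ns++
            nc = 0; for (k in cores) nc++
            printf "sockets=%d cores=%d logical=%d\n", ns, nc, logical
        }
    ' "${1:-/proc/cpuinfo}"
}
```

Calling `cpu_summary` with no argument reads the live /proc/cpuinfo. On platforms whose cpuinfo has no `physical id` lines, the socket and core counts come out as 0.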
2. Memory
cat /proc/meminfo
free -gt
df -hT
du -csh ./*
Operating system IPC (shared memory, message queues, semaphores):
ipcs #(shmems, queues, semaphores)
Memory usage often needs to be monitored. Commonly used commands include free, vmstat, top, and dstat -m.
2.1 free
> free -h
             total       used       free     shared    buffers     cached
Mem:          7.7G       6.2G       1.5G        17M        33M       184M
-/+ buffers/cache:       6.0G       1.7G
Swap:          24G       581M        23G
Meaning of each line
First line, Mem:
- total: total physical memory, 7.7G. This is the actual memory of the machine
- used: memory in use, 6.2G. This value includes buffers, cached, and the memory actually used by applications
- free: unused memory, 1.5G
- shared: shared memory, 17M
- buffers: memory occupied by buffers, 33M
- cached: memory occupied by the cache, 184M
These values satisfy:
total = used + free
The second line, -/+ buffers/cache, shows memory usage from the application's point of view:
- The first value is used - buffers - cached, the memory actually used by applications (6.0G)
- The second value is free + buffers + cached, the memory that can be used in theory (1.7G)
These two values also add up to total
The third line, Swap, shows the usage of the swap partition: total, used, and free
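The arithmetic behind the -/+ buffers/cache line can be reproduced from /proc/meminfo. This is a rough sketch under stated assumptions: the function name `mem_breakdown` is hypothetical, it takes used as MemTotal - MemFree, and it treats the `Buffers` and `Cached` fields as the buffers and cached columns. Newer versions of free print a different layout, so the numbers may not match its output exactly.

```shell
# mem_breakdown [FILE]: derive the two "-/+ buffers/cache" values
# from a meminfo-format file (default: /proc/meminfo). Values are in kB.
# Hypothetical helper for illustration.
mem_breakdown() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }
        END {
            used = total - free                      # the "used" column
            printf "app_used_kb=%d reclaimable_free_kb=%d\n",
                   used - buffers - cached,          # first value of the line
                   free + buffers + cached           # second value of the line
        }
    ' "${1:-/proc/meminfo}"
}
```

With the sample output above, the same arithmetic gives roughly 6.2G - 33M - 184M ≈ 6.0G and 1.5G + 33M + 184M ≈ 1.7G.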
Cache
When the system reads a file, it must first load the data from the hard disk into memory. Because the hard disk is much slower than memory, this step is time-consuming.
To improve efficiency, Linux caches the files it has read in memory (principle of locality). The cache is not released automatically, even after the program exits. This is why memory utilization rises after a program reads a large number of files.
When other programs need memory, Linux reclaims these unused caches according to its own replacement strategy (such as LRU). The cache can also be released manually (as root):
echo 1 > /proc/sys/vm/drop_caches
Buffer
Consider writing data from memory to the hard disk. Because the hard disk is slow, a program that had to wait for each write to complete before continuing would run very inefficiently. This is why buffers exist.
When a program writes data to the hard disk, the data is first placed in a buffer. The write returns quickly and the program can continue with other work, while the buffered data is written to the hard disk in the background. This improves read and write efficiency.
For example, when copying large files to a USB flash drive, the copy may appear to have finished while the system still reports that the drive is in use. The buffer is the reason: the copy program has put the data into the buffer, but not all of it has been written to the USB flash drive yet.
The sync command flushes the contents of the buffer manually:
> sync --help
Usage: sync [OPTION] [FILE]...
Synchronize cached writes to persistent storage

If one or more files are specified, sync only them,
or their containing file systems.

  -d, --data             sync only file data, no unneeded metadata
  -f, --file-system      sync the file systems that contain the files
      --help     display this help and exit
      --version  output version information and exit

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Full documentation at: <http://www.gnu.org/software/coreutils/sync>
or available locally via: info '(coreutils) sync invocation'
Swap partition
The swap partition is a key part of how virtual memory is implemented: part of the space on the hard disk is used as memory. Running programs use physical memory. Moving memory that is not in use out to the hard disk is called swap out, and moving it from the swap partition back into physical memory is called swap in.
A swap partition logically expands the memory space, but it also slows the system down because the hard disk is much slower than memory. Linux moves memory that is rarely used into the swap partition.
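How much of the swap partition is in use can be read from the `SwapTotal` and `SwapFree` fields of /proc/meminfo. The helper below is a small sketch, and the name `swap_usage` is hypothetical.

```shell
# swap_usage [FILE]: report swap in use from a meminfo-format file
# (default: /proc/meminfo). Hypothetical helper for illustration.
swap_usage() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END {
            if (total == 0) { print "swap: none configured"; exit }
            used = total - free
            printf "swap_used_kb=%d swap_used_pct=%.1f\n", used, 100 * used / total
        }
    ' "${1:-/proc/meminfo}"
}
```

A high percentage on its own is not necessarily a problem. Whether swapping is actually happening shows up in the si and so columns of vmstat.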
The difference between cache and buffer
- Cache: memory used by the page cache, which is the cache of the file system. File-level data is cached in the page cache
- Buffer: memory used by the buffer cache, which is the cache of disk blocks. Data read or written directly on the disk is cached in the buffer cache
Simply put, the page cache caches file data and the buffer cache caches disk data. When a file is accessed through a file system, its data is cached in the page cache. When the disk is read or written directly with tools such as dd, the data is cached in the buffer cache.
2.2 vmstat
vmstat (virtual memory statistics) reports statistics on the overall state of the system, including processes, virtual memory, disk, interrupts, and CPU activity:
> vmstat --help

Usage:
 vmstat [options] [delay [count]]

Options:
 -a, --active           active/inactive memory
 -f, --forks            number of forks since boot
 -m, --slabs            slabinfo
 -n, --one-header       do not redisplay header
 -s, --stats            event counter statistics
 -d, --disk             disk statistics
 -D, --disk-sum         summarize disk statistics
 -p, --partition <dev>  partition specific statistics
 -S, --unit <char>      define display unit
 -w, --wide             wide output
 -t, --timestamp        show timestamp

 -h, --help     display this help and exit
 -V, --version  output version information and exit

For more details see vmstat(8).

> vmstat -SM 1 100 # 1 indicates refresh interval (seconds), 100 indicates printing times, in MB
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 1  0      0    470    188   1154    0    0     0     4    3    0  0  0  99  0  0
 0  0      0    470    188   1154    0    0     0     0  112  231  1  1  98  0  0
 0  0      0    470    188   1154    0    0     0     0   91  176  0  0 100  0  0
 0  0      0    470    188   1154    0    0     0     0  118  229  1  0  99  0  0
 0  0      0    470    188   1154    0    0     0     0   78  156  0  0 100  0  0
 0  0      0    470    188   1154    0    0     0    64   84  186  0  1  97  2  0
procs
- r column: the number of processes running or waiting for a CPU time slice. If this value stays above the number of CPUs for a long time, CPU resources are insufficient and adding CPUs is worth considering
- b column: the number of processes blocked waiting for resources, such as I/O or swapping
memory
- swpd column: the amount of swap space in use. If swpd is non-zero, or even relatively large, but si and so stay at 0 for a long time, system performance is not affected for the time being
- free column: the current free physical memory
- buff column: the size of the buffer cache. Reads and writes to block devices generally go through buffers
- cache column: the size of the page cache, generally used as the cache of the file system. Frequently accessed files are cached. A large cache value means that many files are cached. If bi in the io group is small at the same time, the file system is being served efficiently from the cache
swap
- si column: swap in, the amount of memory moved per second from the swap partition back into physical memory
- so column: swap out, the amount of unused memory moved per second to the swap partition on the hard disk
io
- bi column: the rate of data read from block devices (disk reads), in KB/s
- bo column: the rate of data written to block devices (disk writes), in KB/s
As a rule of thumb, the reference value for bi+bo is 1000. If bi+bo exceeds 1000 and the wa value is also relatively large, disk I/O is a bottleneck
system
- in column: the number of device interrupts per second observed during the interval
- cs column: the number of context switches per second
The larger these two values, the more CPU time the kernel consumes
cpu
- us column: the percentage of CPU time consumed by user processes. A high us value means user processes consume a lot of CPU time. If it stays above 50% for a long time, consider optimizing the program
- sy column: the percentage of CPU time consumed by the kernel. A high sy value means the kernel consumes a lot of CPU time. If us+sy exceeds 80%, CPU resources are insufficient
- id column: the percentage of time the CPU is idle
- wa column: the percentage of CPU time spent waiting for I/O. The higher the wa value, the more serious the I/O wait. A wa value above 20% indicates serious I/O wait
- st column: CPU steal time, relevant for virtual machines
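The rules of thumb above can be applied to vmstat output mechanically. The filter below is a sketch only: the function name `vmstat_check` and the labels it prints are hypothetical, the thresholds are the ones quoted in this section, and the column positions assume the 17-column layout shown in the sample output (other vmstat versions or options may add columns).

```shell
# vmstat_check [NCPU]: read vmstat output on stdin and flag samples that
# cross the rule-of-thumb thresholds from the text. Hypothetical helper.
# Assumed columns: r=1 si=7 so=8 bi=9 bo=10 us=13 sy=14 wa=16.
vmstat_check() {
    ncpu=${1:-1}
    awk -v ncpu="$ncpu" '
        $1 !~ /^[0-9]+$/ { next }                 # skip the header lines
        {
            n++
            msg = ""
            if ($1 + 0 > ncpu + 0)             msg = msg " run-queue"
            if ($7 + 0 > 0 || $8 + 0 > 0)      msg = msg " swapping"
            if ($9 + $10 > 1000 && $16 + 0 > 20) msg = msg " disk-io"
            if ($13 + $14 > 80)                msg = msg " cpu"
            if (msg != "") print "sample " n ":" msg
        }
    '
}
```

A possible invocation is `vmstat 2 10 | vmstat_check 4`, where 4 is the number of CPUs. Note that the first sample vmstat prints reports averages since boot, so flags on sample 1 deserve less weight.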
3. Network
3.1 interface
ifconfig
iftop
ethtool
3.2 ports
# port
netstat -ntlp # TCP
netstat -nulp # UDP
netstat -nxlp # UNIX
netstat -nalp # Show not only the listening ports, but also connections in other states
lsof -p <PID> -P
lsof -i :5900
sar -n DEV 1 # network traffic
ss
ss -s
3.3 tcpdump
sudo tcpdump -i any -XNnvvv
sudo tcpdump -i any -XNnvvv udp
sudo tcpdump -i any -XNnvvv udp port 20112
# Quote the filter so that the shell does not interpret the square brackets
sudo tcpdump -i any -XNnvvv 'udp port 20112 and ip[0x1f:02]=0x4e91'
3.4 nethogs
Monitor the network traffic of each process
nethogs
4. I/O performance
iotop
iostat
iostat -kx 2
vmstat -SM
vmstat 2 10
dstat
dstat --top-io --top-bio
5. Process
top
top -H
htop
ps auxf
ps -eLf # Show threads
ls /proc/<PID>/task
5.1 top
top is the most commonly used of these commands. Its interactive help screen (press h or ?):
Help for Interactive Commands - procps version 3.2.8
Window 1:Def: Cumulative mode Off.  System: Delay 3.0 secs; Secure mode Off.

  Z,B       Global: 'Z' change color mappings; 'B' disable/enable bold
  l,t,m     Toggle Summaries: 'l' load avg; 't' task/cpu stats; 'm' mem info
  1,I       Toggle SMP view: '1' single/separate states; 'I' Irix/Solaris mode

  f,o     . Fields/Columns: 'f' add or remove; 'o' change display order
  F or O  . Select sort field
  <,>     . Move sort field: '<' next col left; '>' next col right
  R,H     . Toggle: 'R' normal/reverse sort; 'H' show threads
  c,i,S   . Toggle: 'c' cmd name/line; 'i' idle tasks; 'S' cumulative time
  x,y     . Toggle highlights: 'x' sort field; 'y' running tasks
  z,b     . Toggle: 'z' color/mono; 'b' bold/reverse (only if 'x' or 'y')
  u       . Show specific user only
  n or #  . Set maximum tasks displayed

  k,r       Manipulate tasks: 'k' kill; 'r' renice
  d or s    Set update interval
  W         Write configuration file
  q         Quit
          ( commands shown with '.' require a visible task display window )
Press 'h' or '?' for help with Windows,
any other key to continue
- 1: Show the usage of each CPU
- c: Toggle between the command name and the full command line
- H: Show threads
- P: Sort by CPU usage
- M: Sort by memory usage
- R: Reverse the sort order
- Z: Change color mappings
- B: Disable/enable bold
- l: Toggle load avg
- t: Toggle task/cpu stats
- m: Toggle mem info
us - Time spent in user space
sy - Time spent in kernel space
ni - Time spent running niced user processes (user-defined priority)
id - Time spent idle
wa - Time spent waiting on I/O peripherals (e.g. disk)
hi - Time spent handling hardware interrupt routines (whenever a peripheral wants attention from the CPU, it literally pulls a line to signal the CPU to service it)
si - Time spent handling software interrupt routines (a piece of code calls an interrupt routine)
st - Time spent in involuntary waits by the virtual CPU while the hypervisor is servicing another processor (time stolen from the virtual machine)
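These percentages are derived from tick counters that the kernel exposes on the first line of /proc/stat, in the order user, nice, system, idle, iowait, irq, softirq, steal. The sketch below computes the same breakdown from two samples of that line. The function name `cpu_states` is hypothetical, and this is an illustration of the arithmetic rather than how top itself is implemented.

```shell
# cpu_states "SAMPLE1" "SAMPLE2": percentage of time per CPU state between
# two samples of the first line of /proc/stat. Hypothetical helper.
cpu_states() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
            if (total == 0) { print "no ticks elapsed"; exit }
            printf "us=%.1f ni=%.1f sy=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f st=%.1f\n",
                   100 * d[2] / total, 100 * d[3] / total, 100 * d[4] / total,
                   100 * d[5] / total, 100 * d[6] / total, 100 * d[7] / total,
                   100 * d[8] / total, 100 * d[9] / total
        }
    '
}
```

One way to use it: `a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat); cpu_states "$a" "$b"`.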
5.2 lsof
lsof -P -p 123
6. Performance test
stress --cpu 8 \
       --io 4 \
       --vm 2 \
       --vm-bytes 128M \
       --timeout 60s
time <command>
7. Users
w
whoami
8. System status
uptime
htop
vmstat
mpstat
dstat
9. Hardware equipment
lspci
lscpu
lsblk
lsblk -fm # Display file system, permissions
lshw -c display
dmidecode
10. File system
# mount
mount
umount
cat /etc/fstab
# LVM
pvdisplay
pvs
lvdisplay
lvs
vgdisplay
vgs
df -hT
lsof
11. Kernel and interrupt
cat /proc/modules
sysctl -a | grep ...
cat /proc/interrupts
12. System log and kernel log
dmesg
less /var/log/messages
less /var/log/secure
less /var/log/auth
13. cron scheduled tasks
crontab -l
crontab -l -u nobody
# View cron for all users (-type f skips the directories themselves)
sudo find /var/spool/cron/ -type f | sudo xargs cat
14. Commissioning tools
14.1 perf
14.2 strace
The strace command is used to print system calls and signals:
strace -p <PID>
strace -p 5191 -f
strace -e trace=signal -p 5191
# Available trace filters
-e trace=open
-e trace=file
-e trace=process
-e trace=network
-e trace=signal
-e trace=ipc
-e trace=desc
-e trace=memory
14.3 ltrace
The ltrace command is used to trace calls into dynamically linked libraries:
ltrace -p <PID>
ltrace -S # also show system calls
15. Scenario cases
Scenario 1: after connecting to the server
w       # Shows the currently logged-in users, login IP, running process, etc.
last    # Shows who logged in to the server recently and when the server restarted
uptime  # Uptime, number of logged-in users, load average
history # View command history
Scenario 2: what information the /proc directory contains
cat /proc/...
cgroups
cmdline
cpuinfo
crypto
devices
diskstats
filesystems
iomem
ioports
kallsyms
meminfo
modules
partitions
uptime
version
vmstat
Scenario 3: executing commands in the background
nohup <command> &>[some.log] &
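The `&>` redirection above is a bash shorthand for sending both stdout and stderr to the same file. A portable variant can be wrapped in a small function. This is a sketch, and the name `run_bg` is hypothetical: it starts a command under nohup, sends its output to a log file, detaches stdin, and prints the PID of the background job.

```shell
# run_bg LOGFILE COMMAND [ARGS...]: start COMMAND in the background so that
# it survives logout, with stdout and stderr appended to nothing else but
# LOGFILE. Prints the PID of the background job. Hypothetical helper.
run_bg() {
    log=$1
    shift
    nohup "$@" >"$log" 2>&1 </dev/null &
    echo "$!"
}
```

For example, `run_bg build.log make -j4` returns immediately, and `tail -f build.log` follows the output. The file name and command here are placeholders.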
Some commands
# comprehensive
top
htop
glances
dstat & sar
mpstat
# performance analysis
perf
# process
ps
pstree -p
pgrep
pkill
pidof
Ctrl+z & jobs & fg
# network
ip
ifconfig
dig
ping
traceroute
iftop
pingtop
nload
netstat
vnstat
slurm
scp
tcpdump
# Disk I/O
iotop
iostat
# virtual machine
virt-top
# user
w
whoami
# Running time
uptime
# disk
du
df
lsblk
# permissions
chown
chmod
# service
systemctl list-unit-files
# location
find
locate
# performance testing
time