Compress a 1G log file. Throughout the compression run, gzip and bzip2 each drive a single CPU core to 100%.
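A quick way to confirm the single-core behaviour while a run is in progress (a hedged sketch; pidstat comes from the sysstat package and may not be installed by default):

pidstat -u 1 -C gzip     # per-second CPU usage of the running gzip process; expect ~100% of one core
pidstat -u 1 -C bzip2    # same check for bzip2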
First, read the log file into the page cache; you can either cat the file or use vmtouch -t to do it.
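Either of the following pre-warms the cache (a minimal sketch, using the same file path as the runs below):

cat /serverInfo_2019-02-11_7.log > /dev/null
# or
vmtouch -t /serverInfo_2019-02-11_7.log    # -t touches every page of the file into memory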
Then verify that the file is fully resident in the page cache:
[root@er01 ~]# vmtouch /serverInfo_2019-02-11_7.log
           Files: 1
     Directories: 0
  Resident Pages: 262145/262145  1G/1G  100%
         Elapsed: 0.044465 seconds
Default compression level (-6)
[root@er01 ~]# time gzip -c -6 /serverInfo_2019-02-11_7.log > /tmp/1.gz

real    0m32.167s
user    0m31.267s
sys     0m0.691s

[root@er01 ~]# time bzip2 -c -6 /serverInfo_2019-02-11_7.log > /tmp/1.bz

real    2m11.389s
user    2m10.667s
sys     0m0.697s
Check the compressed file sizes
[root@er01 ~]$ ll -h /tmp/1*
-rw-rw-r-- 1 root root 190M Mar 11 11:27 /tmp/1.bz
-rw-rw-r-- 1 root root 244M Mar 11 11:29 /tmp/1.gz
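ll -h only shows rounded sizes; a small helper like the following (a sketch, assuming GNU stat and bc are available) turns the byte counts into an explicit ratio against the original file:

orig=$(stat -c %s /serverInfo_2019-02-11_7.log)
for f in /tmp/1.gz /tmp/1.bz; do
    comp=$(stat -c %s "$f")
    printf '%s: %.1f%% of original\n' "$f" "$(echo "100 * $comp / $orig" | bc -l)"
done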
Slowest compression (level -9, best ratio)
[root@er01 ~]# time gzip -c -9 /serverInfo_2019-02-11_7.log > /tmp/1.gz

real    1m7.961s
user    1m7.119s
sys     0m0.739s

[root@er01 ~]# time bzip2 -c -9 /serverInfo_2019-02-11_7.log > /tmp/1.bz

real    2m23.701s
user    2m23.016s
sys     0m0.675s
Check the compressed file sizes
[root@er01 ~]$ ll -h /tmp/1*
-rw-rw-r-- 1 root root 182M Mar 11 11:27 /tmp/1.bz
-rw-rw-r-- 1 root root 240M Mar 11 11:29 /tmp/1.gz
Fastest compression (level -1, lowest ratio)
[root@er01 ~]# time gzip -c -1 /serverInfo_2019-02-11_7.log > /tmp/1.gz

real    0m16.090s
user    0m15.403s
sys     0m0.672s

[root@er01 ~]# time bzip2 -c -1 /serverInfo_2019-02-11_7.log > /tmp/1.bz

real    1m48.986s
user    1m48.012s
sys     0m0.878s
Check the compressed file sizes
[root@er01 ~]$ ll -h /tmp/1*
-rw-rw-r-- 1 root root 234M Mar 11 11:27 /tmp/1.bz
-rw-rw-r-- 1 root root 297M Mar 11 11:29 /tmp/1.gz
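The three runs above all follow the same pattern, so the whole comparison can be scripted; this is a minimal sketch (same log file path as above; the /tmp/bench.* output names are made up for illustration):

LOG=/serverInfo_2019-02-11_7.log
for level in 1 6 9; do
    for tool in gzip bzip2; do
        out=/tmp/bench.$tool.$level
        start=$SECONDS
        $tool -c -$level "$LOG" > "$out"    # same invocation as the manual runs above
        echo "$tool -$level: $((SECONDS - start))s, $(du -h "$out" | cut -f1)"
    done
done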
Conclusion
- Between the default and the best compression levels (-6 vs -9): it is not worth significantly increasing (even doubling) the compression/decompression time just to gain a slightly better ratio; for the 1G original text file, the output shrinks by only a few megabytes.
- Between the default level and the lowest compression level (the fastest): if you can tolerate a somewhat lower compression ratio (for the 1G original file, the output ends up roughly 45 MB larger) and care mostly about speed (compression/decompression time can be cut dramatically, even in half), choose the fastest level (-1) to get the highest processing throughput (decompression can be timed separately; see the sketch after this list).
- At the same level, bzip2 achieves a somewhat better compression ratio than gzip, but its runtime is significantly longer.
- gzip's runtime is more sensitive to the compression level than bzip2's: moving between levels changes gzip's runtime by a larger factor than it changes bzip2's, although gzip's absolute runtime stays far below bzip2's at every level.
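The first two points also weigh decompression time, which was not measured above; a hedged way to check it on the files already produced is to decompress to /dev/null so the timing is not distorted by disk writes:

time gzip -dc /tmp/1.gz > /dev/null
time bzip2 -dc /tmp/1.bz > /dev/null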