Comparison of compression ratio and compression time between gzip and bzip2

The test compresses a 1G log file. gzip and bzip2 are both single-threaded, so each keeps one CPU core at 100% for the whole compression run.

First, read the log file into the page cache so that disk reads do not distort the timings. You can either cat the file or run vmtouch -t on it.

Then check that the file is fully resident in the page cache:

[root@er01 ~]# vmtouch /serverInfo_2019-02-11_7.log
           Files: 1
     Directories: 0
  Resident Pages: 262145/262145  1G/1G  100%
         Elapsed: 0.044465 seconds
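
The warm-up step can be wrapped in a small helper. This is only a minimal sketch, not necessarily the exact procedure used for this test: the function name warm_cache is made up for illustration, and it falls back to a plain cat when vmtouch is not installed.

```shell
#!/bin/sh
# warm_cache FILE: read FILE once so that its pages end up in the page cache.
# The function name is illustrative; it is not part of vmtouch.
warm_cache() {
    file=$1
    if command -v vmtouch >/dev/null 2>&1; then
        # vmtouch -t touches every page of the file, pulling it into memory
        vmtouch -t "$file" >/dev/null
    else
        # Fallback: a sequential read populates the page cache as well
        cat "$file" >/dev/null
    fi
}
```

After calling warm_cache on the log file, running vmtouch on it without options should report the resident pages, as in the output above.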

Default compression level (-6; this is gzip's default, while bzip2 defaults to -9)

[root@er01 ~]# time gzip -c -6 /serverInfo_2019-02-11_7.log > /tmp/1.gz

real    0m32.167s
user    0m31.267s
sys     0m0.691s
[root@er01 ~]# time bzip2 -c -6 /serverInfo_2019-02-11_7.log > /tmp/1.bz

real    2m11.389s
user    2m10.667s
sys     0m0.697s

Check compression ratio

[root@er01 ~]$ ll -h /tmp/1*
-rw-rw-r-- 1 root root 190M Mar 11 11:27 /tmp/1.bz
-rw-rw-r-- 1 root root 244M Mar 11 11:29 /tmp/1.gz

Slowest compression (compression level -9)

[root@er01 ~]# time gzip -c -9 /serverInfo_2019-02-11_7.log > /tmp/1.gz

real    1m7.961s
user    1m7.119s
sys     0m0.739s
[root@er01 ~]# time bzip2 -c -9 /serverInfo_2019-02-11_7.log > /tmp/1.bz

real    2m23.701s
user    2m23.016s
sys     0m0.675s

Check compression ratio

[root@er01 ~]$ ll -h /tmp/1*
-rw-rw-r-- 1 root root 182M Mar 11 11:27 /tmp/1.bz
-rw-rw-r-- 1 root root 240M Mar 11 11:29 /tmp/1.gz

Fastest compression (compression level -1)

[root@er01 ~]# time gzip -c -1 /serverInfo_2019-02-11_7.log > /tmp/1.gz

real    0m16.090s
user    0m15.403s
sys     0m0.672s
[root@er01 ~]# time bzip2 -c -1 /serverInfo_2019-02-11_7.log > /tmp/1.bz

real    1m48.986s
user    1m48.012s
sys     0m0.878s

Check compression ratio

[root@er01 ~]$ ll -h /tmp/1*
-rw-rw-r-- 1 root root 234M Mar 11 11:27 /tmp/1.bz
-rw-rw-r-- 1 root root 297M Mar 11 11:29 /tmp/1.gz
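
The three rounds above can also be repeated in a single loop. The following is a rough sketch under a few assumptions: gzip (and optionally bzip2) is on the PATH, the function name bench and the output file names are hypothetical, and date +%s only gives whole-second resolution, so for precise numbers the time builtin used above is the better tool.

```shell
#!/bin/sh
# bench FILE OUTDIR: compress FILE at levels 1, 6 and 9 with gzip and bzip2,
# printing tool, level, elapsed whole seconds and output size in bytes.
# The function name and the out.LEVEL.TOOL naming are illustrative only.
bench() {
    src=$1
    outdir=$2
    for tool in gzip bzip2; do
        # Skip a tool that is not installed instead of failing
        command -v "$tool" >/dev/null 2>&1 || continue
        for level in 1 6 9; do
            out="$outdir/out.$level.$tool"
            start=$(date +%s)
            "$tool" -c "-$level" "$src" > "$out"
            end=$(date +%s)
            # tr strips the padding some wc implementations add
            size=$(wc -c < "$out" | tr -d ' ')
            printf '%s -%s %ss %s bytes\n' "$tool" "$level" \
                "$((end - start))" "$size"
        done
    done
}
```

A call such as bench /serverInfo_2019-02-11_7.log /tmp would print one line per tool and level, which makes the runs easier to compare side by side.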

Conclusion

  1. Default level versus best level (-6 versus -9): the extra time is rarely worth it. gzip takes about twice as long (32 s versus 68 s) and bzip2 about 10% longer (131 s versus 144 s), yet the output for the 1G log file shrinks by only a few megabytes (4M for gzip, 8M for bzip2).
  2. Default level versus fastest level (-6 versus -1): if you can tolerate a lower compression ratio (the output for the 1G file grows by roughly 45 to 55M) and care more about speed, choose the fastest level (-1). For gzip this cuts the compression time in half (32 s versus 16 s); for bzip2 the saving is smaller (131 s versus 109 s).
  3. At the same level, bzip2 achieves a somewhat higher compression ratio than gzip (its output is roughly 20 to 25% smaller), but it takes far longer (about 2 to 7 times as long in these runs).
  4. gzip's run time is more sensitive to the compression level than bzip2's: from -1 to -9, gzip's time roughly quadruples (16 s to 68 s), while bzip2's grows by only about a third (109 s to 144 s). Even so, gzip at its slowest level is still much faster than bzip2 at its fastest.
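
The figures behind these conclusions can be recomputed from the measurements above. The sketch below assumes the original is 1024M (the 1G reported by vmtouch) and uses the real times and the ll -h sizes from the runs; the function name summarize is hypothetical.

```shell
#!/bin/sh
# summarize: read "tool level seconds size_mb" rows on stdin and print the
# compressed size as a percentage of the original plus throughput in MB/s.
# orig_mb=1024 assumes the 1G input file used in this test.
summarize() {
    awk -v orig_mb=1024 '{
        printf "%-5s -%s  %5.1f%% of original  %6.2f MB/s\n",
               $1, $2, $4 / orig_mb * 100, orig_mb / $3
    }'
}

# Real times (seconds) and sizes (M) copied from the runs in this post.
summarize <<'EOF'
gzip  1  16.090 297
gzip  6  32.167 244
gzip  9  67.961 240
bzip2 1 108.986 234
bzip2 6 131.389 190
bzip2 9 143.701 182
EOF
```

Laying the numbers out this way shows the trade-off directly: each step up in level buys a small drop in output size at the cost of a much larger drop in throughput, most visibly for gzip.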

Tags: Linux less

Posted on Mon, 02 Dec 2019 21:09:16 -0500 by Zepo.