On Unix systems, tar is a widely used tool for packaging and compressing files. tar usually spends most of its time on compression, because the compression programs it calls by default do not use multiple threads. However, tar lets us specify which program compresses the file(s), which means we can plug in a multi-threaded program and compress files much faster!
From the manual:
-I, --use-compress-program PROG
filter through PROG (must accept -d)
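To see what "must accept -d" means in practice, here is a minimal sketch, assuming GNU tar and a POSIX shell. It wraps gzip in a small logging script and passes that wrapper to -I. The wrapper name `logged-gzip`, the `LOG` variable, and the sample file are made up for illustration.

```shell
#!/bin/sh
# Sketch: log how tar invokes the program given to -I (assumes GNU tar).
work=$(mktemp -d)
LOG="$work/calls.log"
export LOG

# Hypothetical wrapper: record the arguments, then hand over to gzip.
cat > "$work/logged-gzip" <<'EOF'
#!/bin/sh
echo "args: $*" >> "$LOG"
exec gzip "$@"
EOF
chmod +x "$work/logged-gzip"

mkdir "$work/src" "$work/out"
echo data > "$work/src/file.txt"

tar -I "$work/logged-gzip" -cf "$work/t.tgz" -C "$work" src    # compress
tar -I "$work/logged-gzip" -xf "$work/t.tgz" -C "$work/out"    # decompress

# Show what the wrapper was called with in each run.
cat "$LOG"
```

The line logged for the extraction should contain -d, which is why any program passed to -I has to understand that flag.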
The three parallel compression tools I will use today:
- gz: pigz
- bz2: pbzip2
- xz: pxz
The original commands to tar with compression:
- gz: tar -czf tarball.tgz files
- bz2: tar -cjf tarball.tbz files
- xz: tar -cJf tarball.txz files
The commands to tar with parallel compression:
- gz: tar -I pigz -cf tarball.tgz files
- bz2: tar -I pbzip2 -cf tarball.tbz files
- xz: tar -I pxz -cf tarball.txz files
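The parallel commands only work when the parallel tool is installed. As a minimal sketch, assuming GNU tar and a POSIX shell, the script below uses pigz when it is on the PATH and falls back to plain gzip otherwise, then does a round trip on a throwaway directory. The `demo` directory and its files are illustrative names, not part of the benchmark.

```shell
#!/bin/sh
# Sketch: use pigz when available, otherwise fall back to gzip.
# Assumes GNU tar; "demo" and "tarball.tgz" are illustrative names.
work=$(mktemp -d)
mkdir -p "$work/demo" "$work/out"
printf 'hello\n' > "$work/demo/a.txt"
printf 'world\n' > "$work/demo/b.txt"

if command -v pigz >/dev/null 2>&1; then
    comp=pigz
else
    comp=gzip
fi

# Create the archive through the chosen compressor.
tar -I "$comp" -cf "$work/tarball.tgz" -C "$work" demo

# Extract it again; tar runs the same program with -d here.
tar -I "$comp" -xf "$work/tarball.tgz" -C "$work/out"

# The extracted tree should match the original.
diff -r "$work/demo" "$work/out/demo" && echo "round trip OK with $comp"
```

The same pattern should apply to pbzip2/bzip2 and pxz/xz by swapping the program names and the tarball suffix.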
I am going to use the Linux kernel v3.18.6 source as the sample: I put the whole directory on a ramdisk, compressed it with each tool, and compared the times!
(PS: the CPU is an Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 4 cores, 4 threads, with 16GB RAM)
|               | gzip    | bzip2   | xz        |
|---------------|---------|---------|-----------|
| Single-thread | 17.466s | 50.004s | 3m54.735s |
| Multi-thread  | 4.623s  | 13.818s | 1m10.181s |
| Speedup       | 3.78x   | 3.62x   | 3.34x     |
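The speedup figures are just the single-thread time divided by the multi-thread time. A small sketch that recomputes them from the measured times, with the xz times converted to seconds (the `speedup` helper is a name I made up for this example):

```shell
#!/bin/sh
# Sketch: recompute the speedup row from the measured times (in seconds).
# 3m54.735s = 234.735 s and 1m10.181s = 70.181 s.
speedup() {
    # $1 = single-thread seconds, $2 = multi-thread seconds
    awk -v s="$1" -v m="$2" 'BEGIN { printf "%.2fx\n", s / m }'
}

echo "gzip:  $(speedup 17.466 4.623)"     # 3.78x
echo "bzip2: $(speedup 50.004 13.818)"    # 3.62x
echo "xz:    $(speedup 234.735 70.181)"   # 3.34x
```

On a 4-core, 4-thread CPU the ideal speedup would be 4x, so all three tools get fairly close to it.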
Because I didn't specify any parameters and just let each program pick its default compression level, the resulting sizes may differ a little. We can still pass parameters to the compressor, such as -9 to raise the compression level. With -9 the result is 81020940 bytes instead of 84479960 bytes, so we save about 3.3 more megabytes! (It also took 40 more secs.)
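One way to pass a level through -I is to quote the program together with its option. This is a sketch under assumptions: it relies on a GNU tar recent enough to accept arguments inside the -I string (1.27 or later, as far as I know), and it uses pigz or gzip on generated sample data rather than the kernel tree, so the sizes it prints are not the ones quoted above.

```shell
#!/bin/sh
# Sketch: pass a compression level through -I (assumes a recent GNU tar).
work=$(mktemp -d)
mkdir "$work/demo"
# Generated sample data so the level has something to work on.
seq 1 200000 > "$work/demo/numbers.txt"

if command -v pigz >/dev/null 2>&1; then comp=pigz; else comp=gzip; fi

# The quotes keep the program and its option together as one -I argument.
tar -I "$comp -1" -cf "$work/fast.tgz" -C "$work" demo
tar -I "$comp -9" -cf "$work/best.tgz" -C "$work" demo

fast=$(wc -c < "$work/fast.tgz")
best=$(wc -c < "$work/best.tgz")
echo "level 1: $fast bytes, level 9: $best bytes, saved: $((fast - best)) bytes"

# The saving quoted in the text: 84479960 - 81020940 bytes.
saved_in_article=$((84479960 - 81020940))
echo "saving in the text: $saved_in_article bytes"    # 3459020, about 3.3 MiB
```

If the installed tar does not accept arguments in -I, piping should work as well, for example `tar -cf - files | pigz -9 > tarball.tgz`.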
Very useful for me!