Use multiple threads to compress files when tar-ing something

On Unix systems, tar is a widely used tool for packaging and compressing files. The compression step usually takes most of the time, because the programs tar calls by default (gzip, bzip2, xz) run on a single thread. However, tar lets us specify which program does the compression, which means we can pick one that supports multi-threading and compress files much faster!

From the manual:

-I, --use-compress-program PROG
filter through PROG (must accept -d)
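
The "must accept -d" note matters because tar runs PROG to compress and PROG -d to decompress, so the same -I option works in both directions. A minimal sketch of a round trip, assuming GNU tar; the roundtrip helper and its file names are made up for illustration, and gzip is used only because it is installed almost everywhere:

```shell
# roundtrip PROG: pack a small directory through PROG, then unpack it again.
# tar runs "PROG" when creating and "PROG -d" when extracting.
roundtrip() {
    prog="$1"
    work="$(mktemp -d)" || return 1
    mkdir "$work/src" "$work/out"
    printf 'hello tar\n' > "$work/src/a.txt"
    tar -I "$prog" -cf "$work/demo.tar.z" -C "$work" src &&
    tar -I "$prog" -xf "$work/demo.tar.z" -C "$work/out" &&
    cmp -s "$work/src/a.txt" "$work/out/src/a.txt"
    status=$?
    rm -rf "$work"
    return "$status"
}

roundtrip gzip && echo "gzip round trip ok"
```

Replacing gzip with pigz in the last line should behave the same way, provided pigz is installed.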

The three parallel compression tools I will use today:

  • gz:   pigz
  • bz2: pbzip2
  • xz:   pxz

(All of them can be easily installed via apt-get on Debian/Ubuntu-based Linux.)

The original commands to tar with compression:

  • gz:   tar -czf tarball.tgz files
  • bz2: tar -cjf tarball.tbz files
  • xz:   tar -cJf tarball.txz files

Parallel version:

  • gz:   tar -I pigz -cf tarball.tgz files
  • bz2: tar -I pbzip2 -cf tarball.tbz files
  • xz:   tar -I pxz -cf tarball.txz files
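
Since the parallel tools may not be installed everywhere, a script can fall back to the single-threaded program for the same format. This is only a sketch: pick_compressor is a hypothetical helper, not part of tar or of any of the tools above.

```shell
# pick_compressor FORMAT: print the parallel tool for gz, bz2 or xz if it is
# on PATH, otherwise the single-threaded tool for the same format.
pick_compressor() {
    case "$1" in
        gz)  parallel=pigz;   serial=gzip  ;;
        bz2) parallel=pbzip2; serial=bzip2 ;;
        xz)  parallel=pxz;    serial=xz    ;;
        *)   echo "unknown format: $1" >&2; return 1 ;;
    esac
    if command -v "$parallel" >/dev/null 2>&1; then
        echo "$parallel"
    else
        echo "$serial"
    fi
}

# Usage (tarball.tgz and files are the placeholder names from the lists above):
#   tar -I "$(pick_compressor gz)" -cf tarball.tgz files
pick_compressor gz
```

The tarball is readable by the ordinary tools either way, because the parallel programs write the same gz, bz2 and xz formats.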

I am going to use the Linux kernel v3.18.6 source as a sample: I put the whole directory on a ramdisk, compressed it with each tool, and compared the difference!
(PS: the CPU is an Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 4 cores, 4 threads, with 16GB RAM)

Result comparison:


Time spent:
                 gzip       bzip2      xz
Single-thread    17.466s    50.004s    3m54.735s
Multi-thread     4.623s     13.818s    1m10.181s
Speedup          3.78x      3.62x      3.34x
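
The last row is simply the single-thread time divided by the multi-thread time. A small sketch that reproduces it from the measured times; speedup is a made-up helper name, and the xz times are converted to seconds by hand:

```shell
# speedup SINGLE MULTI: ratio of two wall-clock times given in seconds,
# rounded to two decimals.
speedup() {
    awk -v s="$1" -v m="$2" 'BEGIN { printf "%.2f\n", s / m }'
}

speedup 17.466 4.623     # gzip  vs pigz   -> 3.78
speedup 50.004 13.818    # bzip2 vs pbzip2 -> 3.62
speedup 234.735 70.181   # xz    vs pxz    -> 3.34 (3m54.735s and 1m10.181s)
```

On this 4-core machine all three ratios stay below 4, which is the ceiling one would expect when only the compression step runs in parallel.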

Because I didn't specify any parameters and just let each tool use its default compression level, the resulting sizes may differ a little, but we can still pass parameters such as a compression level to the compression program.


With the parameter -9 to raise the compression level, the result is 81020940 bytes instead of 84479960 bytes, so we save another 3459020 bytes, about 3.3 MiB! (It also took 40 more seconds.)
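
One way to pass -9 is to quote the program together with its options as the -I argument. This is a sketch under assumptions: it needs a GNU tar recent enough to accept a full command line after -I, compress_max is a hypothetical helper, and pigz in the usage comment is only an illustration, since the post does not say which tool produced these sizes. The arithmetic at the end just checks the saving quoted above.

```shell
# compress_max PROG TARBALL FILES...: hypothetical helper, not from the post.
# Relies on GNU tar splitting the quoted -I argument into a command line.
compress_max() {
    prog="$1"
    tarball="$2"
    shift 2
    tar -I "$prog -9" -cf "$tarball" "$@"
}
# Example: compress_max pigz tarball.tgz files

# Size difference between the two runs reported above.
default_size=84479960
level9_size=81020940
saved=$((default_size - level9_size))
echo "$saved bytes saved"
awk -v b="$saved" 'BEGIN { printf "%.2f MiB\n", b / 1048576 }'
```

If the installed tar rejects the quoted form, a small wrapper script that calls the compressor with -9 and forwards its other arguments can be given to -I instead.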

Very useful for me!