Use multiple threads to compress files when creating a tarball

On Unix systems, tar is a widely used tool for packaging and compressing files. tar usually spends most of its time on compression, because the compression programs it uses by default do not use multiple threads. tar does let us specify which program does the compression, though, so we can plug in a multi-threaded compressor and compress files much faster!

From the manual:

-I, --use-compress-program PROG
filter through PROG (must accept -d)
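
The "(must accept -d)" part matters because tar calls the same program with -d when it extracts. A minimal round-trip sketch, assuming GNU tar; the temporary directory, the file names and the fallback to gzip when pigz is missing are all made up for illustration:

```shell
# Assumption: fall back to gzip when pigz is not installed (both accept -d).
if command -v pigz >/dev/null 2>&1; then comp=pigz; else comp=gzip; fi

workdir=$(mktemp -d)
mkdir -p "$workdir/src" "$workdir/out"
echo "hello" > "$workdir/src/a.txt"

# Create: tar pipes the archive through "$comp".
tar -I "$comp" -cf "$workdir/demo.tgz" -C "$workdir/src" .
# Extract: tar pipes the archive through "$comp -d".
tar -I "$comp" -xf "$workdir/demo.tgz" -C "$workdir/out"

cat "$workdir/out/a.txt"
```

So the same -I option works in both directions, as long as the program understands -d.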

The three tools for parallel compression that I will use today:

  • gz:   pigz
  • bz2: pbzip2
  • xz:   pxz

(All three can be installed easily via apt-get on Debian/Ubuntu-based Linux.)
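
Before installing anything, it can help to see which of the three tools are already on the PATH. A small sketch; the check_tools helper name is hypothetical:

```shell
# Hypothetical helper: report whether each parallel compressor is installed.
check_tools() {
    for tool in pigz pbzip2 pxz; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: found at $(command -v "$tool")"
        else
            echo "$tool: not installed"
        fi
    done
}

check_tools
```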

The original commands for tar with compression:

  • gz:   tar -czf tarball.tgz files
  • bz2: tar -cjf tarball.tbz files
  • xz:   tar -cJf tarball.txz files

Parallel version:

  • gz:   tar -I pigz -cf tarball.tgz files
  • bz2: tar -I pbzip2 -cf tarball.tbz files
  • xz:   tar -I pxz -cf tarball.txz files
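
The -I form does roughly the same thing as piping tar's output through the compressor by hand, which may be useful on a tar that lacks -I. A sketch of the pipe form; the directory layout is invented and gzip stands in when pigz is not installed:

```shell
# Assumption: fall back to gzip when pigz is not installed, so the sketch still runs.
if command -v pigz >/dev/null 2>&1; then pipe_comp=pigz; else pipe_comp=gzip; fi

demo=$(mktemp -d)
mkdir "$demo/files"
echo "sample" > "$demo/files/readme.txt"

# tar writes the uncompressed archive to stdout ("-f -"); the compressor
# reads stdin and writes the compressed stream into the tarball.
tar -cf - -C "$demo" files | "$pipe_comp" > "$demo/tarball.tgz"

# List the contents to confirm the result is a gzip-compressed tar archive.
tar -tzf "$demo/tarball.tgz"
```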

I used the Linux kernel v3.18.6 source as the sample and put the whole directory on a ramdisk, then compressed it each way to compare the difference!
(PS: the CPU is an Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz, 4 cores, 4 threads, with 16GB RAM)
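
A rough sketch of how such a comparison could be timed. The elapsed helper and the stand-in data are assumptions for illustration, not the setup used for the numbers in this post, and date +%s only gives whole seconds, so the shell's time keyword is a better choice where it is available:

```shell
# Hypothetical helper: run a command and print the elapsed whole seconds.
elapsed() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

bench=$(mktemp -d)
mkdir "$bench/files"
# Stand-in data; replace with a real tree such as a kernel source directory.
seq 1 200000 > "$bench/files/numbers.txt"

single=$(elapsed tar -czf "$bench/single.tgz" -C "$bench" files)
if command -v pigz >/dev/null 2>&1; then
    multi=$(elapsed tar -I pigz -cf "$bench/multi.tgz" -C "$bench" files)
    echo "gzip: ${single}s, pigz: ${multi}s"
else
    echo "gzip: ${single}s (pigz not installed, skipping the parallel run)"
fi
```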

Result comparison:

[Image: tarCompressComparison1]

Time spent:
                 gzip       bzip2      xz
Single-thread    17.466s    50.004s    3m54.735s
Multi-thread      4.623s    13.818s    1m10.181s
Speedup           3.78x      3.62x      3.34x
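
The last row is simply the single-thread time divided by the multi-thread time. A small sketch that reproduces it with awk; the speedup helper name is made up, and the xz times are converted to seconds first:

```shell
# Hypothetical helper: $1 = single-thread seconds, $2 = multi-thread seconds.
speedup() {
    awk -v s="$1" -v m="$2" 'BEGIN { printf "%.2fx\n", s / m }'
}

speedup 17.466 4.623      # gzip
speedup 50.004 13.818     # bzip2
speedup 234.735 70.181    # xz: 3m54.735s and 1m10.181s expressed in seconds
```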

I did not pass any options and just let each tool pick its default compression level, so the resulting sizes may differ a little. We can still add option(s) like this:

[Image: tarCompressComparison2]

With the -9 option to raise the compression level, the result is 81020940 bytes instead of 84479960 bytes, so we save about 3.3 MiB more! (It also took 40 more seconds.)
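
One way to pass a compression level is to quote the program together with its option. This is a sketch under the assumption of GNU tar 1.27 or newer, which accepts a full command line for -I; the file names are invented and gzip stands in when pigz is missing:

```shell
# Assumption: fall back to gzip when pigz is not installed (both accept -9).
if command -v pigz >/dev/null 2>&1; then lvl_comp=pigz; else lvl_comp=gzip; fi

lvl_dir=$(mktemp -d)
mkdir "$lvl_dir/files"
seq 1 50000 > "$lvl_dir/files/numbers.txt"

# Default compression level.
tar -I "$lvl_comp" -cf "$lvl_dir/default.tgz" -C "$lvl_dir" files
# Level 9: the whole quoted string is the command line that tar runs.
tar -I "$lvl_comp -9" -cf "$lvl_dir/best.tgz" -C "$lvl_dir" files

# Compare the two sizes by eye.
ls -l "$lvl_dir"/*.tgz
```

On an older tar, the pipe form (tar -cf - files | pigz -9 > tarball.tgz) should give the same result.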

Very useful for me!

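Extract every tarball (*.tar.gz) in a directory in one go

If you want to extract all the .tar.gz or .tar.?z tarballs in a directory at once, doing it directly like this goes wrong:
tar -xvf *.tar.gz

The shell expands the glob, tar takes the first name as the archive and looks for the remaining names as members inside it, so the screen starts filling up with: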

tar: a.tar.gz: Not found in archive
tar: b.tar.gz: Not found in archive
tar: c.tar.gz: Not found in archive
tar: d.tar.gz: Not found in archive
tar: e.tar.gz: Not found in archive
tar: Error exit delayed from previous errors.

It turns out it has to be done like this:

for a in *.tar.gz; do tar -xvf "$a"; done
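
The same idea can be wrapped in a small function that also covers the .tar.?z names and copes with an empty directory. A sketch only; the extract_all name and the throwaway demo archives are made up, and it relies on tar detecting the compression type by itself when extracting:

```shell
# Hypothetical helper: extract every *.tar.?z archive found in directory $1.
extract_all() {
    for archive in "$1"/*.tar.?z; do
        [ -e "$archive" ] || continue     # the glob matched nothing
        tar -xf "$archive" -C "$1"        # one tar call per archive
    done
}

# Demo with two throwaway archives.
xdir=$(mktemp -d)
echo one > "$xdir/one.txt"
echo two > "$xdir/two.txt"
tar -czf "$xdir/a.tar.gz" -C "$xdir" one.txt
tar -czf "$xdir/b.tar.gz" -C "$xdir" two.txt
rm "$xdir/one.txt" "$xdir/two.txt"

extract_all "$xdir"
ls "$xdir"
```

Quoting "$archive" keeps file names with spaces in one piece.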