How to Compress and Decompress Files Using tar in Linux

Tar is more than just an archiving utility: it comes with some great built-in features that let you compress and decompress files at the same time as archiving them. Learn all about it in this article.

What is tar and How Do I Install it?

As per the tar manual (which you can access by typing man tar once it is installed), tar is an archiving utility. It supports many features, including compressing and decompressing files on the fly when archiving them. Let’s get started by installing tar:

To install tar on your Debian/Apt based Linux distribution (like Ubuntu and Mint), execute the following command in your terminal:

sudo apt install tar

To install tar on your RedHat/Yum based Linux distribution (like RHEL, CentOS and Fedora), execute the following command in your terminal:

sudo yum install tar

Next, we will create some sample data:

mkdir test; cd test
touch a b c d e f 
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

Setting up sample data to compress

Here we created a directory test, and created six empty files in it by using the touch command. We also added some numbers to files a, e, and b, though notably file b has repetitive data, which will compress well.

If you would like to learn more about how compression works, you can check out our How Does File Compression Work? article.

Creating an Uncompressed Archive

Simple uncompressed tar archive creation

tar -hcf all_files.tar *
ls -l | grep -v total | awk '{print $5"\tbytes for: "$9}' | sort -n

Here we created an uncompressed archive using the tar -hcf all_files.tar * command. Let’s have a look at the options used in this command.

First, we have -h. Although it is not required in this particular case, I highly recommend always including it in your tar commands. This option stands for dereference: tar follows symlinks and archives the files they point to, rather than the links themselves.
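To see what -h changes in practice, here is a minimal sketch. It assumes GNU tar and works in a scratch directory created with mktemp; the names target.txt and link.txt are made up for the illustration.

```shell
# Scratch directory so the demo does not touch the test directory
demo_dir=$(mktemp -d)
cd "$demo_dir"

echo 'hello' > target.txt
ln -s target.txt link.txt        # link.txt is a symlink to target.txt

# Without -h the symlink itself is stored in the archive
tar -cf without_h.tar link.txt
# With -h the file the symlink points to is stored under the link's name
tar -hcf with_h.tar link.txt

# In the verbose listing a symlink entry starts with 'l', a regular file with '-'
tar -tvf without_h.tar
tar -tvf with_h.tar
```

If the first listing shows an entry starting with l and the second one starting with -, the option did what the description above says.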

Next we have the -c and -f options. Note that they are simply written together after the - in -h: instead of specifying another -, we tag them onto the other shorthand options. Quick and easy.

The -c option stands for create a new archive. Note that by default directories are archived recursively, unless the --no-recursion option is also used. The -f option allows us to specify the name of the archive. It thus has to come last in our option chain (as it requires an argument) so we can add the archive file name directly behind it. Using tar -fch test.tar * will not work, because tar then takes ch as the archive name and is left without an operation such as -c:

Shorthand options that require an argument cannot be placed at the front
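The following sketch shows three equivalent spellings of the same command, which may help make the option bundling clearer. It assumes GNU tar (for the long option names) and uses a scratch directory with three placeholder files.

```shell
work_dir=$(mktemp -d)
cd "$work_dir"
touch a b c

# Bundled short options: -f comes last so the archive name follows it directly
tar -hcf bundled.tar a b c

# The same command with each option written separately
tar -h -c -f separate.tar a b c

# Long-option spelling of the same command
tar --dereference --create --file=long.tar a b c

# All three archives list the same members
tar -tf bundled.tar
tar -tf separate.tar
tar -tf long.tar
```

Whichever spelling you choose, the archive name always belongs to -f (or --file), so it has to follow that option immediately.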

After the tar is generated, we use a modified ls output which clearly shows us the number of bytes per file. As you can see, the tar file is much larger than all of our files combined. The files are simply being archived, and some overall overhead for tar is being added.
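To put numbers on that overhead, here is a small sketch that recreates the sample data in a scratch directory and compares the combined size of the six files with the size of the archive. It assumes GNU tar, which writes a 512-byte header per file, pads each file's data to a multiple of 512 bytes, and by default pads the whole archive to a multiple of 10240 bytes.

```shell
size_dir=$(mktemp -d)
cd "$size_dir"
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

tar -hcf all_files.tar [a-f]

# Total bytes of the six input files
files_total=$(cat a b c d e f | wc -c)
# Size of the archive in bytes
archive_size=$(wc -c < all_files.tar)

echo "files:   $files_total bytes"
echo "archive: $archive_size bytes"
```

The six files add up to only 25 bytes, so nearly all of the archive consists of headers and padding.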

As an interesting sidenote, we can also see what types of files we are working with by simply using the file command at the command prompt:

file c
file b
file all_files.tar

Using file to see the file type

Creating a Compressed Archive

A very common compression algorithm is GZIP. Let’s add the option for it (-z) to our chain of shorthand command line options and see how this affects the file size:

tar -zhcf all_files.tar.gz [a-f]
ls -l | grep -v total | awk '{print $5"\tbytes for: "$9}' | sort -n

Looking at the size of a compressed archive vs an uncompressed one

This time we specified a shell glob pattern ([a-f], a bracket expression much like a regular expression character class) to use only the files named a to f, preventing the tar command from including the all_files.tar file inside the new all_files.tar.gz file!

See How Do You Actually Use Regex? and Modify Text Using Regular Expressions using sed if you would like to learn more about regular expressions.

We also included the -z option, which uses GZIP compression to compress the resulting .tar archive. It is great to see that we end up with a 186-byte file, which tells us that, in this case, the tar header and padding overhead of about 10 KB compresses very well.

The compressed archive is still 7.44 times larger than the combined size of the original files (186 bytes versus 25 bytes), but that matters little: this contrived example is not representative of compressing large files, where gains rather than losses are almost always seen, unless the data was already compressed or is in a format that does not condense easily. Still, one algorithm (like GZIP) may do better than another (like BZIP2), and vice versa, for different data sets.
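To see the gain on something larger, the sketch below archives a made-up, highly repetitive text file once without and once with -z. The file name big.txt and its contents are purely illustrative, and real data will usually compress less dramatically.

```shell
big_dir=$(mktemp -d)
cd "$big_dir"

# Hypothetical sample: 100000 identical lines of text (highly repetitive)
awk 'BEGIN { for (i = 0; i < 100000; i++) print "the quick brown fox jumps over the lazy dog" }' > big.txt

tar -hcf big.tar big.txt         # uncompressed archive
tar -zhcf big.tar.gz big.txt     # gzip-compressed archive

plain_size=$(wc -c < big.tar)
gzip_size=$(wc -c < big.tar.gz)
echo "uncompressed archive: $plain_size bytes"
echo "gzip archive:         $gzip_size bytes"
```

Here the fixed tar overhead is negligible compared to the data, so the compressed archive ends up far smaller than the input.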

Gaining More Bytes Using High Level Compression

Can we make the file even smaller? Yes. We can set the maximum compression option of GZIP by using the -I option to tar, which lets us specify a compression program to use (with thanks to Stack Overflow user ideasman42):

tar -I 'gzip -9' -hcf all_files.tar.gz [a-f]
ls -l | grep -v total | awk '{print $5"\tbytes for: "$9}' | sort -n

Using the -I option to tar to specify a compression program

Here we specified -I 'gzip -9' as the compression program to use, and we dropped the -z option (as we are now specifying a custom program to use instead of using the built-in tar GZIP configuration). The result is that we save 12 bytes thanks to a better (but generally slower) compression attempt (at level -9) by GZIP.

Generally speaking, the faster the compression (lower level of compression attempts, i.e. -1), the larger the file. And the slower the compression (higher level of compression attempts, i.e. -9), the smaller the file. You can set your own preference by varying the compression level from -1 (fast) to -9 (slow).
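A quick way to find a level that suits your data is to try a few and compare the sizes. The sketch below does that for levels 1, 6 and 9. It assumes GNU tar (for -I with arguments) and uses a generated sample.txt that is only a stand-in for your own files.

```shell
level_dir=$(mktemp -d)
cd "$level_dir"

# Hypothetical sample data: 50000 numbered lines
awk 'BEGIN { for (i = 0; i < 50000; i++) print "line " i " of sample data" }' > sample.txt

# Create one archive per compression level and print its size
for level in 1 6 9; do
    tar -I "gzip -$level" -hcf "sample_$level.tar.gz" sample.txt
    echo "gzip -$level: $(wc -c < "sample_$level.tar.gz") bytes"
done
```

You can prefix the tar command with time to see the speed side of the trade-off as well.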

Other Compression Programs

There are two other common compression programs which you may want to explore and test: bzip2, which can be used by specifying the -j option to tar, and XZ, which can be used by specifying the -J option. Different algorithms give different sizing outcomes, and each may have additional compression options.

Alternatively, you can use the -I option to set maximum compression options for bzip2 (-9):

bzip2 -9 compression program example

And -9e for xz:

xz -9e compression program example

As you can see, the results in this case are worse than with the somewhat standard GZIP algorithm. Still, the bzip2 and xz algorithms may show improvements with other data sets.
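The commands for these two programs follow the same pattern as the GZIP ones. The sketch below is our reconstruction rather than a copy of the original screenshots: it recreates the sample data in a scratch directory, assumes GNU tar, and skips any program that is not installed.

```shell
alg_dir=$(mktemp -d)
cd "$alg_dir"
touch a b c d e f
echo 1 > a; echo 5 > e; echo '22222222222222222222' > b

# gzip as the baseline (-z)
tar -zhcf all_files.tar.gz [a-f]

# bzip2 via -j, then with maximum compression via -I
if command -v bzip2 >/dev/null 2>&1; then
    tar -jhcf all_files.tar.bz2 [a-f]
    tar -I 'bzip2 -9' -hcf all_files_max.tar.bz2 [a-f]
fi

# xz via -J, then with maximum (extreme) compression via -I
if command -v xz >/dev/null 2>&1; then
    tar -Jhcf all_files.tar.xz [a-f]
    tar -I 'xz -9e' -hcf all_files_max.tar.xz [a-f]
fi

# Same size listing as used earlier in the article
ls -l | grep -v total | awk '{print $5"\tbytes for: "$9}' | sort -n
```

On data this small the differences mostly reflect each format's fixed overhead, so it is worth repeating the comparison on your own files.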

Decompressing a File

Decompressing a file is super easy, whatever the original method was to compress it, provided that the matching compression program is present on your computer. For example, if the original compression algorithm was bzip2 (indicated by a .bz2 extension to the tar filename), then you will need bzip2 installed (sudo apt install bzip2 or sudo yum install bzip2) on the computer that is to decompress the file.

rm a b c d e f
tar -xf all_files.tar.gz
ls

Decompressing a compressed (or uncompressed) tar archive

We simply specify -x to extract (and decompress) our all_files.tar.gz file, and indicate what the filename is by again using the -f shorthand option as before. GNU tar detects the compression format automatically on extraction, so no -z, -j or -J is needed.
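Before extracting an archive, it can be handy to look inside it first, and to extract into a separate directory so nothing in the current one is overwritten. A minimal sketch, using the standard -t (list) and -C (change directory) options; the directory name restored is just an example.

```shell
extract_dir=$(mktemp -d)
cd "$extract_dir"
touch a b c
echo 1 > a
tar -zhcf all_files.tar.gz a b c

# List the members without extracting anything
tar -tf all_files.tar.gz

# Extract into a separate directory (restored is a hypothetical name)
mkdir restored
tar -xf all_files.tar.gz -C restored
ls restored
```

The directory given to -C has to exist already, which is why it is created with mkdir first.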

Compressing files can help you save a lot of room on your storage devices, and knowing how to use tar in combination with available compression options will help you to do so. Once the archive needs to be extracted again, it is easy to do so provided the correct decompression software is available on the computer used to decompress or extract the data from your archive. Enjoy!

Roel Van de Paar
Roel has 25 years of experience in IT & business, 9 years of leading teams, and 5 years in hiring & building teams. He worked for companies like Oracle, Volvo, Sun, Percona, Siemens, Karat, and now MariaDB in various senior, principal, lead, and managerial roles.