Working with Compressed Files

As previously mentioned, genomics data files tend to be large. Since larger files are slower and more costly to move around, you will often encounter files that have been compressed to save time, space and money. The three most commonly encountered types are Zip archives (e.g. filename.zip), Gzip archives (e.g. filename.gz) and Tarballs (e.g. filename.tar or filename.tar.gz).

Once you've convinced yourself that the file you have is the file that you ought to have, the next thing that you'll want to do is unzip it (a.k.a. uncompress or decompress or extract). You can unzip your .zip archive using the unzip program:

unzip <filename.zip>

If you don't want to extract everything, but rather check the contents, you can view what a zip contains using the -l flag ('list'):

unzip -l <filename.zip>

When you want to go in the other direction and make your own archive, the command is simply zip. It takes the name of the new archive first, followed by the files to put in it:

zip <mynewarchive.zip> <myfirstfile.txt> <mysecondfile.sam>

Note that you can also use the -r flag (recursive) to zip up a folder and all its contents, including subfolders, like so:

zip -r <myproject.zip> myproject/

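If you have been sent a big bundle of data as a gzip archive, then happily much the same procedure applies as with zip archives, but with the gunzip program. There are two differences to be aware of. A .gz file holds a single compressed file, so the -l flag reports its compressed and uncompressed sizes rather than a list of files. Also, by default gunzip replaces the .gz file with the decompressed file instead of leaving both on disk: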

gunzip -l <bundle.gz>
gunzip <bundle.gz>
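If you would rather keep the compressed copy, one common approach is to have gunzip write to standard output with the -c flag and redirect that wherever you like. The following is a minimal sketch in a throwaway directory; the file names (example.bed, example_copy.bed) are made up for illustration.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a small example file (hypothetical name) and compress it.
printf 'chr1\t100\t200\n' > example.bed
gzip example.bed                  # example.bed is replaced by example.bed.gz

# Peek at the first lines without writing a decompressed file to disk.
gunzip -c example.bed.gz | head -n 5

# Decompress to a new file while leaving example.bed.gz in place.
gunzip -c example.bed.gz > example_copy.bed
```

After this runs, both example.bed.gz and example_copy.bed exist side by side.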

Things are slightly different (read 'complex') if you encounter a tarball: thisfile.tar or thatfile.tar.gz or tacofile.tgz.

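You can view the contents of tarballs using the tar program. The -t flag lists the contents and -f names the archive file to read; add -z when the tarball is gzip-compressed (.tar.gz or .tgz) and -v for a more detailed (verbose) listing: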

tar -tf <thisfile.tar>
tar -ztvf <thatfile.tar.gz>
tar -ztvf <tacofile.tgz>

...and extract them by swapping -t for -x (extract), like this:

tar -xf <thisfile.tar>
tar -zxvf <thatfile.tar.gz>
tar -zxvf <tacofile.tgz>
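To see the whole round trip in one place, here is a small sketch that builds a gzip-compressed tarball, lists it, and extracts it into a separate folder. It goes slightly beyond the commands above: the -c flag (create) and the -C flag (change to a directory before extracting) are standard tar options, and the names myproject, samples.txt and extracted are invented for the example.

```shell
# Work in a temporary directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny example project (hypothetical names).
mkdir -p myproject/data
echo "sample_1" > myproject/data/samples.txt

# -c create, -z compress with gzip, -f name of the archive to write.
tar -czf myproject.tar.gz myproject/

# List the contents without extracting anything.
tar -tzf myproject.tar.gz

# Extract into a separate folder so existing files are not overwritten.
mkdir extracted
tar -xzf myproject.tar.gz -C extracted
```

Listing a tarball before extracting it is a useful habit, since it shows whether the files will land in their own folder or spill into the current one.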

Other types of compressed files and archives do exist, but these are the most common.

:ref:`03_the_filesystem`
