Tar is a utility for archiving files in Linux and UNIX-like operating systems. It bundles multiple files and directories into a single .tar archive file while preserving permissions and directory structures, and it can pass the result through a compressor such as gzip or bzip2.
Tar is extremely useful for backing up data, moving groups of files between systems, and preparing source code distributions. In this comprehensive guide, we will cover the basics of using tar, as well as some more advanced features and examples.
An Overview of Tar
Tar stands for Tape ARchive. It was originally used to archive data onto tape drives for long-term storage. While tar can still write archives to tape drives, it is more commonly used with regular files and pipes today.
Some key facts about tar:
- Tar archives bundle multiple files and directories into a single .tar file.
- Archives can be compressed using gzip or bzip2 to save space.
- Tar preserves permissions, ownership, file modification times, etc.
- Archives can span multiple tapes/volumes (for backing up to tape drives).
- The archive format is standardized (POSIX defines the ustar and pax formats), so archives created on one UNIX system can generally be extracted on any other.
- Tar is a very old utility that exists on every Linux and UNIX platform. It has decades of widespread use which makes it very stable and reliable.
Let’s go over some common tar terminology:
- Archive – The .tar file created by bundling files using tar. This file contains the archived content. Archives can be compressed using gzip/bzip2 which are signified by .tar.gz or .tar.bz2 file extensions.
- Bundle – Synonym for archive.
- Tarball – Informal name for a tar archive, most often a compressed .tar.gz or .tar.bz2 file.
- Extract – The process of unbundling an archive and writing the extracted files to disk.
- Compression – Tar supports optional compression with gzip or bzip2 to save space. The compressed archives have .tar.gz or .tar.bz2 extensions.
- Append – Adding files to an existing archive. Does not affect existing content.
- Concatenation – Combining two archives end-to-end.
Now that we understand tar terminology and basics, let’s move on to some usage examples.
Creating Archives
To create a new tar archive, use the tar -cvf command. This breaks down as:
- -c – Creates a new archive.
- -v – Verbose output. Lists files processed.
- -f <archive-name> – Output filename.
For example:
$ tar -cvf archive.tar /path/to/folder
This will archive the given folder recursively into archive.tar. You can give multiple file/folder paths to add multiple entries:
$ tar -cvf archive.tar /path/one /path/two /path/three
To compress the archive using gzip, use -czvf instead of just -cvf:
$ tar -czvf archive.tar.gz /path/to/folder
For bzip2 compression, use -cjvf:
$ tar -cjvf archive.tar.bz2 /path/to/folder
You can control the verbose output using -v. Omit it to hide the file listings:
$ tar -cf archive.tar /path/to/folder
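The create commands above can be tried safely in a scratch directory. The following is a minimal sketch, not a recommended layout: the directory and file names are invented for the demo, and -C is used so that member names are stored relative to the scratch directory.

```shell
#!/bin/sh
# Sketch only: all paths below are hypothetical demo names.
set -eu
work=$(mktemp -d)                     # scratch area so nothing real is touched
mkdir -p "$work/project/docs"
echo "hello" > "$work/project/readme.txt"
echo "notes" > "$work/project/docs/notes.txt"

# -C changes into $work first, so members are stored as project/...
# rather than with the full temporary path.
tar -cf "$work/archive.tar" -C "$work" project

# Capture the member list to confirm what was stored.
members=$(tar -tf "$work/archive.tar")
echo "$members"
```

Storing relative member names this way tends to make the archive easier to extract somewhere else later.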
Viewing Archive Contents
To view files contained within a tar archive without extracting it, use:
$ tar -tf archive.tar
The -t flag lists the contents.
For compressed archives, you need to add the compression flag:
$ tar -tzf archive.tar.gz
$ tar -tjf archive.tar.bz2
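Listing can also be narrowed or made more detailed. As a small sketch with invented file names, adding -v shows permissions, owner, size, and timestamp, and naming a member after the archive restricts the listing to that entry.

```shell
#!/bin/sh
# Sketch only: the "site" directory and its files are hypothetical.
set -eu
work=$(mktemp -d)
mkdir -p "$work/site/css"
echo "body {}" > "$work/site/css/main.css"
echo "<html>"  > "$work/site/index.html"
tar -czf "$work/site.tar.gz" -C "$work" site

# -tv prints one detailed line per member (mode, owner, size, time, name).
tar -tzvf "$work/site.tar.gz"

# A member name after the archive limits the listing to that entry.
only_css=$(tar -tzf "$work/site.tar.gz" site/css/main.css)
echo "$only_css"
```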
Extracting Archives
To extract an archive, use -xf:
$ tar -xf archive.tar
This will extract the contents of archive.tar in the current directory while preserving permissions and attributes.
For compressed archives:
$ tar -xzf archive.tar.gz
$ tar -xjf archive.tar.bz2
You can extract to a specific directory using -C:
$ tar -xf archive.tar -C /tmp/extract-here
This will extract the archive into /tmp/extract-here.
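Extraction does not have to be all or nothing. The sketch below, using made-up file names, pulls a single member out of an archive into a target directory; note that the directory given to -C has to exist beforehand.

```shell
#!/bin/sh
# Sketch only: the "app" tree is a hypothetical example.
set -eu
work=$(mktemp -d)
mkdir -p "$work/src/app/conf" "$work/dest"
echo "port=8080" > "$work/src/app/conf/app.ini"
echo "log line"  > "$work/src/app/app.log"
tar -cf "$work/app.tar" -C "$work/src" app

# Extract only app/conf/app.ini; tar does not create the -C directory itself.
tar -xf "$work/app.tar" -C "$work/dest" app/conf/app.ini

restored=$(cat "$work/dest/app/conf/app.ini")
echo "$restored"
```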
Appending to Archives
You can append files/directories to an existing tar archive using -rvf instead of -cvf:
$ tar -rvf archive.tar /new/folder
This will add /new/folder recursively to archive.tar without affecting existing contents. Appending only works on uncompressed archives.
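A quick way to see the effect of appending is to list the archive afterwards. This is a minimal sketch with throwaway file names:

```shell
#!/bin/sh
# Sketch only: first.txt and second.txt are hypothetical demo files.
set -eu
work=$(mktemp -d)
cd "$work"
echo "one" > first.txt
echo "two" > second.txt
tar -cf archive.tar first.txt

# -r adds second.txt at the end of the existing, uncompressed archive.
tar -rf archive.tar second.txt

after=$(tar -tf archive.tar)
echo "$after"
```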
Updating Archives
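To add files that are new or have changed since they were archived, use -uvf:
$ tar -uvf archive.tar /path/to/update
This will append any files under /path/to/update that are not yet in the archive or that are newer than the archived copy. Older copies are not removed: the newer version is added at the end of the archive and takes precedence on extraction. Like -r, this only works on uncompressed archives.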
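It can help to see what an update does to the member list. The sketch below uses a hypothetical config.txt and shows that the archive ends up holding both the old and the new copy, with the newer one winning on extraction because it is read last.

```shell
#!/bin/sh
# Sketch only: config.txt is a hypothetical demo file.
set -eu
work=$(mktemp -d)
cd "$work"
echo "v1" > config.txt
tar -cf archive.tar config.txt

# Wait so the modified file is clearly newer than the archived copy.
sleep 2
echo "v2" > config.txt
tar -uf archive.tar config.txt

# Count how many entries named config.txt the archive now holds.
copies=$(tar -tf archive.tar | grep -c '^config.txt$')
echo "copies of config.txt in archive: $copies"

# On extraction the later entry overwrites the earlier one.
mkdir out
tar -xf archive.tar -C out
extracted=$(cat out/config.txt)
echo "extracted content: $extracted"
```

Because old copies accumulate, an archive that is updated often keeps growing; recreating it from scratch now and then avoids that.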
Deleting from Archives
Tar archives are written sequentially, so the portable way to delete a file is to create a new archive without it. (GNU tar also provides a --delete option that works on uncompressed archives.)
First, extract the archive contents to a temporary location:
$ mkdir /tmp/archive-temp
$ tar -xf archive.tar -C /tmp/archive-temp
Then delete the file you want removed from the temporary folder.
Finally, create a new archive from the contents of the temporary folder. Using -C keeps the temporary path out of the member names:
$ tar -cf new-archive.tar -C /tmp/archive-temp .
new-archive.tar will now contain the archive contents minus the deleted file.
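The extract, delete, and re-create steps can be sketched end to end as follows. File names are invented for the demo, and the final archive is built with -C and "." so that member names stay relative to the unpacked directory.

```shell
#!/bin/sh
# Sketch only: keep.txt and secret.txt are hypothetical demo files.
set -eu
work=$(mktemp -d)
mkdir "$work/src" "$work/unpacked"
echo "keep" > "$work/src/keep.txt"
echo "drop" > "$work/src/secret.txt"
tar -cf "$work/archive.tar" -C "$work/src" keep.txt secret.txt

# 1. Unpack, 2. remove the unwanted file, 3. build a fresh archive.
tar -xf "$work/archive.tar" -C "$work/unpacked"
rm "$work/unpacked/secret.txt"
tar -cf "$work/new-archive.tar" -C "$work/unpacked" .

remaining=$(tar -tf "$work/new-archive.tar")
echo "$remaining"
```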
Excluding Files/Paths
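To exclude certain files/paths when creating an archive, use --exclude. Place it before the paths it applies to, since recent GNU tar versions treat it as a positional option:
$ tar -cvf archive.tar --exclude=/path/to/exclude /path
This will prevent /path/to/exclude from being added to archive.tar.
You can have multiple --exclude options, and patterns may contain wildcards. Quote them so the shell does not expand them first. For example, to exclude all .log files:
$ tar -cvf archive.tar --exclude='*.log' /path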
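As an illustration, the sketch below builds a small hypothetical tree containing two .log files and archives it with a quoted wildcard pattern. The --exclude option is written before the path, which is the ordering that behaves consistently across tar versions.

```shell
#!/bin/sh
# Sketch only: the "app" tree and its files are hypothetical.
set -eu
work=$(mktemp -d)
mkdir -p "$work/app/logs"
echo "code"  > "$work/app/main.py"
echo "noise" > "$work/app/logs/run.log"
echo "noise" > "$work/app/debug.log"

# Quoting '*.log' hands the pattern to tar instead of letting the shell expand it.
tar -cf "$work/app.tar" --exclude='*.log' -C "$work" app

members=$(tar -tf "$work/app.tar")
echo "$members"
```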
Including Only Matched Paths
Instead of excluding certain paths, you can archive only the files named in a list using -T:
$ tar -cvf archive.tar -T include-list.txt
Where include-list.txt contains one path per line. When creating an archive the entries are taken literally, so to select by a pattern like *.py, generate the list with a tool such as find.
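One way to build such a list is with find. The sketch below assumes a hypothetical src tree and archives only its .py files; each line written to include-list.txt is a plain path.

```shell
#!/bin/sh
# Sketch only: the src tree and its files are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir -p src/pkg
echo "print('a')" > src/a.py
echo "print('b')" > src/pkg/b.py
echo "text"       > src/notes.txt

# find writes one matching path per line; tar reads them with -T.
find src -name '*.py' > include-list.txt
tar -cf python-only.tar -T include-list.txt

members=$(tar -tf python-only.tar)
echo "$members"
```

File names containing newlines would break a line-based list like this one; for such trees a NUL-separated list is the safer choice.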
Compression Options
Tar does not compress by default. In GNU tar you select a compressor with one of these flags:
- For gz (gzip): -z
- For bz2 (bzip2): -j
- For xz: -J
- For lzma: --lzma
- For lzop: --lzop
- For compress (.Z): -Z
For example:
$ tar -cjf archive.tar.bz2 /path # bzip2 compression
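Tar has no portable flag for the compression level, which for gzip ranges from 1 to 9 (higher = better compression but slower). One way to set it is to write the archive to stdout and run the compressor yourself:
$ tar -cf - /path | gzip -9 > archive.tar.gz # gzip level 9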
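A portable way to choose a compression level is to let tar write to stdout and run gzip yourself. The sketch below does this at levels 1 and 9 on generated filler text (an assumption made purely for the demo) and prints both sizes so they can be compared.

```shell
#!/bin/sh
# Sketch only: the sample data is generated filler text.
set -eu
work=$(mktemp -d)
mkdir "$work/data"
i=0
while [ "$i" -lt 2000 ]; do
  echo "line $i: the quick brown fox jumps over the lazy dog"
  i=$((i + 1))
done > "$work/data/sample.txt"

# "-f -" sends the archive to stdout; gzip picks the level.
tar -cf - -C "$work" data | gzip -1 > "$work/fast.tar.gz"
tar -cf - -C "$work" data | gzip -9 > "$work/best.tar.gz"

fast_size=$(wc -c < "$work/fast.tar.gz")
best_size=$(wc -c < "$work/best.tar.gz")
echo "gzip -1: $fast_size bytes, gzip -9: $best_size bytes"

# Either result is an ordinary .tar.gz that tar can read back.
members=$(tar -tzf "$work/best.tar.gz")
echo "$members"
```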
Archive Verification
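In GNU tar, -W (--verify) re-reads an archive immediately after writing it and compares it with the files on disk. It is used together with -c and does not work with compressed archives:
$ tar -cWvf archive.tar /path/to/folder
To check an existing archive for corruption, read it from start to finish by listing it:
$ tar -tf archive.tar > /dev/null
For compressed archives, add the compression flag as usual, or test the compressed stream directly:
$ tar -tzf archive.tar.gz > /dev/null
$ gzip -t archive.tar.gz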
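For an archive that already exists, one common check is simply to read it from start to finish. The sketch below wraps that in a hypothetical helper, check_archive, and runs it against a good archive and a deliberately truncated copy. It only detects unreadable or truncated data, not deliberate tampering.

```shell
#!/bin/sh
# Sketch only: check_archive is a hypothetical helper, not a tar feature.
set -eu
work=$(mktemp -d)
mkdir "$work/data"
echo "payload" > "$work/data/file.txt"
tar -czf "$work/good.tar.gz" -C "$work" data

# Keep only the first 40 bytes to simulate an interrupted download.
head -c 40 "$work/good.tar.gz" > "$work/bad.tar.gz"

check_archive() {
  # gzip -t tests the compressed stream; tar -tz walks every member header.
  if gzip -t "$1" 2>/dev/null && tar -tzf "$1" >/dev/null 2>&1; then
    echo "ok"
  else
    echo "damaged"
  fi
}

good_status=$(check_archive "$work/good.tar.gz")
bad_status=$(check_archive "$work/bad.tar.gz")
echo "good: $good_status, bad: $bad_status"
```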
Tar Over Pipes and Remote Access
Tar can read/write archives locally or remotely via stdin/stdout pipes.
For example, to archive a remote directory over SSH and extract it locally:
$ ssh user@host 'tar -cf - /path/to/archive' | tar -xvf -
This pipes the tar output over SSH to extract locally.
You can also create an archive locally and pipe it over SSH for remote extraction:
$ tar -cf - /path/to/archive | ssh user@host 'tar -xvf -'
Compressing the stream before it crosses SSH and decompressing it on the other side can significantly speed up transfers on slow links:
$ tar czf - /path/to/archive | ssh user@host 'tar xvzf -'
These are just some examples – tar gives you a lot of flexibility with pipes.
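The same pipe pattern works without any network, which makes it easy to try. In this sketch, with made-up directory names, one tar writes to stdout and a second tar reads from stdin to copy a tree. Replacing the right-hand side with an ssh command, as in the examples above, would send the stream to another machine instead.

```shell
#!/bin/sh
# Sketch only: src and dst are hypothetical scratch directories.
set -eu
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/dst"
echo "alpha" > "$work/src/a.txt"
echo "beta"  > "$work/src/sub/b.txt"

# "-f -" means stdout for the writer and stdin for the reader.
tar -cf - -C "$work/src" . | tar -xf - -C "$work/dst"

copied=$(cat "$work/dst/sub/b.txt")
echo "$copied"
```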
Splitting/Spanning Archives
If your archive does not fit onto a single volume like a tape drive or disk, you can split tar archives into multiple chunks.
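To split by size, write the archive to stdout and pipe it through split:
$ tar -cvf - /path | split -b 1G -d - archive.tar.
This will split the archive into 1GB chunks named archive.tar.00, archive.tar.01, etc. The -d flag (GNU split) requests numeric suffixes; without it the chunks are named archive.tar.aa, archive.tar.ab, and so on.
You can also control the length of the suffix with -a:
$ tar -cvf - /path | split -b 100m -d -a 5 - archive.tar.
This splits the archive into 100MB parts with five-digit suffixes (-a 5) named archive.tar.00000, archive.tar.00001, etc.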
To reconstruct the archive from the chunks, use cat to concatenate them back in order:
$ cat archive.tar.[0-9]* > archive.tar
Then extract as usual with tar -xf archive.tar.
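The split and reassemble round trip can be checked on a small scale. The sketch below uses generated filler data and 100 KB chunks (both chosen only for the demo), relies on split's default alphabetic suffixes so it does not depend on GNU-only flags, and compares the restored file with the original.

```shell
#!/bin/sh
# Sketch only: the data file and the part.tar. prefix are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir data restored
i=0
while [ "$i" -lt 5000 ]; do
  echo "record $i: some filler text to give the archive a bit of size"
  i=$((i + 1))
done > data/records.txt

# Stream the archive into 100 KB pieces: part.tar.aa, part.tar.ab, ...
tar -cf - data | split -b 100k - part.tar.
ls part.tar.*

# The shell expands the glob in sorted order, which is the order needed.
cat part.tar.* > joined.tar
tar -xf joined.tar -C restored

if cmp -s data/records.txt restored/data/records.txt; then
  status=identical
else
  status=different
fi
echo "round trip: $status"
```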
Archiving Special Files
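- Device files such as /dev/sdb are archived as device nodes by default. Recreating them on extraction normally requires root privileges.
- Hard links between files in the same archive are preserved by default. In GNU tar, --hard-dereference turns this off and stores each link as a separate copy of the file.
- For archiving extended attributes, use --xattrs. GNU tar has separate --acls and --selinux options for ACLs and SELinux contexts.
- Empty directories are kept in the archive without any extra option. (--keep-directory-symlinks is a GNU tar extraction option that stops tar from replacing existing symlinks to directories.)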
Refer to the tar documentation for details on these and other specialty archiving options.
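How tar treats links with no extra options can be observed directly. The following sketch, with invented file names, archives a file together with a hard link and a symbolic link to it, extracts everything elsewhere, and checks what came back. It assumes the scratch filesystem supports hard links and a shell whose test command understands -ef.

```shell
#!/bin/sh
# Sketch only: file names are hypothetical; assumes "test -ef" is available.
set -eu
work=$(mktemp -d)
mkdir "$work/src" "$work/out"
echo "shared" > "$work/src/original.txt"
ln    "$work/src/original.txt" "$work/src/hardlink.txt"   # second name, same inode
ln -s original.txt             "$work/src/symlink.txt"    # symbolic link

tar -cf "$work/links.tar" -C "$work/src" .
tar -xf "$work/links.tar" -C "$work/out"

# -ef is true when both names refer to the same file (same inode).
if [ "$work/out/original.txt" -ef "$work/out/hardlink.txt" ]; then
  hard=preserved
else
  hard=separate
fi

# -L is true when the path is itself a symbolic link.
if [ -L "$work/out/symlink.txt" ]; then
  sym=preserved
else
  sym=followed
fi
echo "hard link: $hard, symlink: $sym"
```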
Useful Tar Flags/Examples
Here is a quick reference of some useful tar flags and operations:
# Create archive
$ tar -cf archive.tar /path/to/files
# Compressed archive
$ tar -czf archive.tar.gz /path/to/files
# View archive contents
$ tar -tf archive.tar
# Extract archive
$ tar -xf archive.tar
# Extract to specific folder
$ tar -xf archive.tar -C /tmp
# Append files to archive
$ tar -rvf archive.tar file1 file2
# Update files in archive
$ tar -uvf archive.tar file1
# Delete file from archive (GNU tar, uncompressed archives only)
$ tar --delete -f archive.tar file_to_delete
# Archive a remote folder over SSH
$ ssh user@host 'tar -cf - /path/to/archive' | tar -xvf -
# Check that an archive is readable end to end
$ tar -tf archive.tar > /dev/null
# Compression levels 1-9
$ tar -cf - /path | gzip -9 > archive.tar.gz
# Split archive into chunks
$ tar -cf - /path | split -b 100m -d -a 5 - archive.tar.
This covers a wide range of tar usage examples. Be sure to refer to the man pages for your specific tar implementation for more details and supported flags.
Conclusion
Tar is an essential tool for working with groups of files on Linux/UNIX systems. It allows you to bundle any number of files, directories, and special files into a single portable archive that preserves permissions and attributes, and has many everyday uses for file backups, transfers, Docker and CI/CD workflows, and software distribution. It is installed by default on virtually every Linux and macOS system.
Hopefully this guide gave you a broad overview of tar and how to use it effectively for managing archives on a Linux system like Debian, Ubuntu 18.04 / 20.04 / 22.04, CentOS 7 / 8 or Red Hat 7.
If you have any questions or would like to know more about this article, please post your question in the comments.