Tape Archive
TAR (Tape Archive) bundles multiple files into a single archive preserving Unix permissions, ownership, and symlinks — without applying any compression. Pair with gzip, bzip2, or xz for compressed tarballs (.tar.gz, .tar.bz2, .tar.xz).
Archiving to TAR without recompression is possible, but cross-format archive conversion is not available in the browser (WASM).
Common questions
What is the difference between .tar and .tar.gz?
A .tar file is an uncompressed archive that bundles files while preserving Unix permissions. A .tar.gz file is a tar archive compressed with gzip, often 60-80% smaller for text-heavy content such as source code; data that is already compressed (JPEG, video) shrinks very little. TAR handles archiving; gzip handles compression. The two-step design lets you choose any compression algorithm independently.
Can Windows open TAR files natively?
Windows 10 (build 17063+) and Windows 11 include tar.exe in the command line. For GUI extraction, use 7-Zip (free) or WinRAR. Native File Explorer support for TAR was added alongside 7z and gz in the Windows 11 23H2 update.
Why does TAR not compress files?
TAR was designed in 1979 for sequential magnetic tape storage. Its job is only to serialize files and their metadata into one stream; compression is left to a separate program. Keeping the tar stream uncompressed also keeps it appendable: new entries can be added to the end of a plain archive, which is not possible once the whole stream has been compressed. The separation also allows choosing the best compression algorithm (gzip, bzip2, xz, zstd) for the data type.
How do I extract a single file from a large tar.gz archive?
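Run tar xzf archive.tar.gz path/to/specific/file.txt with the exact path as listed by tar tzf archive.tar.gz. Gzip archives must decompress sequentially from the start, so extracting one file still reads through the compressed stream at least up to that point. GNU tar keeps scanning to the end of the archive in case a later entry has the same name; pass --occurrence to stop after the first match.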
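The same single-member extraction can be sketched with Python's standard tarfile module. This is only an illustration: the sample archive is built in memory, and the member names and the helper names build_sample_targz and read_member are made up for the example.

```python
import io
import tarfile


def build_sample_targz() -> bytes:
    """Build a small .tar.gz in memory with two hypothetical members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in [("docs/readme.txt", b"hello\n"),
                              ("path/to/specific/file.txt", b"wanted\n")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_member(archive_bytes: bytes, member_name: str) -> bytes:
    """Return one member's contents; the name must match the archive listing exactly."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        extracted = tar.extractfile(member_name)  # KeyError if the name is absent
        if extracted is None:
            raise ValueError(f"{member_name} is not a regular file")
        return extracted.read()


archive = build_sample_targz()
print(read_member(archive, "path/to/specific/file.txt"))
```

As with the command-line tool, the gzip stream is still decompressed from the beginning; only the requested member is returned.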
What makes .TAR special
What is a TAR file?
TAR (Tape Archive) is a Unix/Linux file archival format that combines multiple files and directories into a single file, preserving file permissions, ownership, timestamps, and symlinks. Originally designed for sequential magnetic tape storage at AT&T Bell Labs in the 1970s, TAR has become the standard packaging format for Unix software distribution and Linux system administration.
TAR itself performs no compression — it only bundles files. Compression is applied separately by piping through gzip, bzip2, or xz, producing the familiar .tar.gz, .tar.bz2, and .tar.xz formats (also called "tarballs"). The two-step design gives flexibility: you can choose the compression algorithm independently of the archive structure.
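To make the two-step design concrete, here is a minimal sketch using Python's standard tarfile and gzip modules. The file name, its contents, and the helper make_tar are invented for the example; the point is only that archiving and compression are separate, reversible layers.

```python
import gzip
import io
import tarfile


def make_tar(files: dict[str, bytes]) -> bytes:
    """Step 1: bundle files into an uncompressed tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


files = {"notes.txt": b"tar bundles, gzip compresses\n" * 200}
plain_tar = make_tar(files)
tarball = gzip.compress(plain_tar)  # Step 2: compress the whole archive as one stream

# Undoing step 2 returns the exact tar bytes produced in step 1.
restored = gzip.decompress(tarball)
print(len(plain_tar), "bytes as .tar;", len(tarball), "bytes as .tar.gz")
```

Swapping gzip for bz2 or lzma in step 2 would change the compression layer without touching the archive layer.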
How to open TAR files
- 7-Zip (Windows): Free, supports all TAR variants
- WinRAR (Windows): Built-in TAR support
- tar (macOS, Linux): Built-in CLI tool: tar -xf archive.tar
- The Unarchiver (macOS): Free, handles .tar, .tar.gz, .tar.bz2
- PeaZip (Windows, Linux): Free, open-source
Technical specifications
| Property | Value |
|---|---|
| Compression | None (archiving only — use with gzip/bzip2/xz) |
| Permissions | Preserves Unix file permissions (chmod bits) |
| Ownership | Preserves user/group ownership (UID/GID) |
| Symlinks | Fully supported |
| Max path length | 256 characters with the UStar prefix field (100 bytes without it); longer paths need GNU long-name or PAX extended headers |
| Magic bytes | 75 73 74 61 72 (ustar at offset 257) |
| Common pairs | .tar.gz / .tgz, .tar.bz2 / .tbz2, .tar.xz / .txz |
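
The magic-bytes row can be turned into a quick format check. The sketch below is an illustration, not a full validator: looks_like_ustar is a name made up here, and the sample header is generated with Python's tarfile module rather than read from a real file.

```python
import tarfile


def looks_like_ustar(first_block: bytes) -> bool:
    """Return True if the five bytes at offset 257 spell 'ustar'."""
    return len(first_block) >= 262 and first_block[257:262] == b"ustar"


# Generate a 512-byte header for a hypothetical member to test against.
header = tarfile.TarInfo(name="example.txt").tobuf(format=tarfile.USTAR_FORMAT)
print(looks_like_ustar(header))
print(looks_like_ustar(b"PK\x03\x04" + bytes(508)))  # a ZIP-style prefix
```

A match only says the first header claims to be UStar; very old V7 archives have no magic at all and would fail this check while still being valid tar.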
Common use cases
- Linux software distribution: Source code releases as .tar.gz tarballs
- System backup: Preserves file permissions and ownership, unlike ZIP
- Docker images: Container filesystem layers are stored as TAR archives
- Deployment packages: Application code bundled for Linux server deployment
- Data transfer between Unix systems: Reliable, lossless file bundling with metadata
Essential TAR commands
# Create a .tar.gz archive (compress with gzip)
tar -czf archive.tar.gz /path/to/folder/
# Create a .tar.bz2 archive (compress with bzip2)
tar -cjf archive.tar.bz2 /path/to/folder/
# Create a .tar.xz archive (compress with xz — best ratio)
tar -cJf archive.tar.xz /path/to/folder/
# List contents without extracting
tar -tzf archive.tar.gz
# Extract to current directory
tar -xzf archive.tar.gz
# Extract to specific directory
tar -xzf archive.tar.gz -C /target/directory/
# Extract a single file
tar -xzf archive.tar.gz path/to/specific/file.txt
The flags: -c create, -x extract, -t list, -z gzip, -j bzip2, -J xz, -v verbose, -f file (when flags are bundled, -f must come last because the archive filename follows it).
TAR vs ZIP
| Feature | TAR | ZIP |
|---|---|---|
| Compression | External (separate step) | Built-in per file |
| Unix permissions | ✅ Preserved | ❌ Mostly lost |
| Symlinks | ✅ | Limited |
| Native Windows support | Partial (tar.exe since Windows 10 build 17063; File Explorer since Windows 11 23H2) | ✅ Built-in |
| Random access | ❌ (sequential) | ✅ (central directory) |
| Streaming | ✅ (pipe-friendly) | Limited |
TAR is the right choice for Unix/Linux systems and any scenario where file permissions and ownership matter. ZIP is the better choice for cross-platform sharing with Windows users.
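As a rough illustration of the metadata TAR carries, the sketch below writes a file entry and a symlink entry with Python's tarfile module and reads them back. The paths, mode, and UID/GID values are arbitrary examples.

```python
import io
import tarfile

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    payload = b"#!/bin/sh\necho ok\n"
    script = tarfile.TarInfo(name="bin/run.sh")
    script.size = len(payload)
    script.mode = 0o750                  # rwxr-x---
    script.uid, script.gid = 1000, 1000  # example owner and group IDs
    tar.addfile(script, io.BytesIO(payload))

    link = tarfile.TarInfo(name="bin/run")
    link.type = tarfile.SYMTYPE          # symlink entry: no data, only a target
    link.linkname = "run.sh"
    tar.addfile(link)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    members = {m.name: m for m in tar.getmembers()}

print(oct(members["bin/run.sh"].mode), members["bin/run.sh"].uid)
print(members["bin/run"].issym(), members["bin/run"].linkname)
```

Whether the recorded owner is actually applied on extraction depends on the extracting tool and on privileges; the archive only stores the values.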
Incremental backups with TAR
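GNU TAR supports incremental (snapshot) backups through --listed-incremental. If the snapshot file does not exist yet, tar performs a full backup and creates it; on later runs with the same snapshot file, tar archives only what changed and updates the snapshot: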
# Full backup with snapshot file
tar -czf backup-full.tar.gz --listed-incremental=snapshot.file /data/
# Incremental backup (only changed files since last snapshot)
tar -czf backup-incr.tar.gz --listed-incremental=snapshot.file /data/
This is widely used in cron-based backup scripts as a lightweight alternative to dedicated backup software.
TAR and Docker
Docker stores image layers as TAR archives internally. The docker save and docker export commands produce .tar files. Understanding TAR helps when working with container registries, inspecting image contents, or building minimal base images from scratch using tar -c . | docker import - myimage.
.TAR compared to alternatives
| Formats | Criteria | Winner |
|---|---|---|
| .TAR vs .ZIP | Unix metadata preservation: TAR preserves Unix file permissions (chmod bits), owner/group UID/GID, symlinks, and device nodes. ZIP discards most Unix-specific metadata. | TAR wins |
| .TAR vs .ZIP | Random access: ZIP has a central directory at the end of the file, allowing extraction of individual files without scanning the entire archive. TAR is sequential; extracting the last file requires reading through all preceding entries. | ZIP wins |
| .TAR vs .ZIP | Streaming / pipe support: TAR can be created and consumed through Unix pipes (e.g., tar cf - . \| ssh remote tar xf -). ZIP requires random access to the central directory, making pipe-based creation unreliable. | TAR wins |
| .TAR.GZ vs .TAR.XZ | Compression ratio: XZ (LZMA2) typically produces 10-30% smaller archives than gzip (DEFLATE) on source code and text. Gzip is typically several times faster (roughly 3-5x), so the trade-off is size against speed. | TAR.XZ wins |
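
The streaming row can be illustrated with tarfile's stream modes ("w|" and "r|"), which never seek. In this sketch PipeWriter and PipeReader are hypothetical stand-ins for a real pipe or socket, and the member names are invented.

```python
import io
import tarfile


class PipeWriter:
    """Stand-in for a pipe or socket: accepts write() only, no seek() or tell()."""

    def __init__(self) -> None:
        self.chunks: list[bytes] = []

    def write(self, data: bytes) -> int:
        self.chunks.append(bytes(data))
        return len(data)


class PipeReader:
    """Stand-in for the receiving end: offers read() only."""

    def __init__(self, data: bytes) -> None:
        self._source = io.BytesIO(data)

    def read(self, size: int = -1) -> bytes:
        return self._source.read(size)


# Sender: "w|" writes the archive as a forward-only stream of blocks.
sender = PipeWriter()
with tarfile.open(fileobj=sender, mode="w|") as tar:
    for name, payload in [("a.txt", b"first\n"), ("b.txt", b"second\n")]:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
wire_bytes = b"".join(sender.chunks)

# Receiver: "r|" consumes members strictly in order, as they arrive.
received = {}
with tarfile.open(fileobj=PipeReader(wire_bytes), mode="r|") as tar:
    for member in tar:
        received[member.name] = tar.extractfile(member).read()

print(received)
```

Because each header precedes its data, the receiver can start unpacking before the sender has finished, which is what makes the ssh pipeline in the table work.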
Technical reference
- MIME Type: application/x-tar
- Magic Bytes: 75 73 74 61 72 (ustar magic at offset 257 in POSIX tar)
- Developer: AT&T Unix
- Year Introduced: 1979
- Open Standard: Yes
Binary Structure
A TAR file is a sequence of 512-byte blocks. Each archived file begins with a 512-byte header block followed by the file data padded to a 512-byte boundary. The header contains the filename (bytes 0-99, null-terminated), file mode (bytes 100-107, octal ASCII), owner/group UID/GID (bytes 108-115 and 116-123, octal ASCII), file size (bytes 124-135, octal ASCII), modification time (bytes 136-147, octal ASCII Unix timestamp), header checksum (bytes 148-155, octal ASCII), type flag (byte 156: '0'=regular file, '5'=directory, '2'=symlink), and link name (bytes 157-256). The POSIX/UStar magic signature 'ustar' appears at offset 257-261 (hex: 75 73 74 61 72), followed by a version field at 263-264. The extended UStar fields include owner name (bytes 265-296), group name (bytes 297-328), device major/minor (bytes 329-336 and 337-344), and a filename prefix (bytes 345-499) that extends the 100-byte filename limit to 256 characters. Two consecutive 512-byte blocks of all zeros mark the end of the archive. There is NO magic number at offset 0 — bytes 0-99 contain the first file's pathname.
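The field offsets above translate directly into byte slices. The following is a minimal parsing sketch under simplifying assumptions: it handles only plain octal fields (not the binary encoding GNU tar uses for very large values), ignores the prefix field, and the helper names parse_octal and parse_header are invented here. The sample header is generated with Python's tarfile module.

```python
import tarfile


def parse_octal(field: bytes) -> int:
    """Decode a NUL- or space-terminated octal ASCII field."""
    text = field.split(b"\0", 1)[0].strip(b" ")
    return int(text.decode("ascii"), 8) if text else 0


def parse_header(block: bytes) -> dict:
    """Slice a 512-byte header block using the offsets described above."""
    if len(block) != 512:
        raise ValueError("a tar header is exactly 512 bytes")
    return {
        "name": block[0:100].split(b"\0", 1)[0].decode("utf-8"),
        "mode": parse_octal(block[100:108]),
        "uid": parse_octal(block[108:116]),
        "gid": parse_octal(block[116:124]),
        "size": parse_octal(block[124:136]),
        "mtime": parse_octal(block[136:148]),
        "typeflag": block[156:157].decode("ascii"),
        "linkname": block[157:257].split(b"\0", 1)[0].decode("utf-8"),
        "magic": block[257:262].decode("ascii"),
    }


# Build a sample header for a hypothetical member and parse it back.
info = tarfile.TarInfo(name="src/main.c")
info.size, info.mode, info.uid, info.gid = 0o4264, 0o644, 0o1000, 0o1000
info.mtime = 1730960848
fields = parse_header(info.tobuf(format=tarfile.USTAR_FORMAT))
print(fields)
```

A real reader would also verify the checksum and follow the size field to skip over the padded file data before parsing the next header.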
| Offset | Length | Field | Example | Description |
|---|---|---|---|---|
| 0x00 | 100 bytes | File Name | src/main.c (null-padded) | Null-terminated ASCII filename. Limited to 100 bytes in original tar; UStar prefix field extends this to 256. |
| 0x64 | 8 bytes | File Mode | 30 30 30 30 36 34 34 00 (0000644) | Octal ASCII permission bits, e.g. 0000644 = owner rw, group/other r. The file type is carried by the type flag; some archives also include the file type bits here (0100644 for a regular file). |
| 0x6C | 8 bytes | Owner UID | 30 30 30 31 30 30 30 00 (0001000) | Octal ASCII numeric user ID of the file owner. |
| 0x74 | 8 bytes | Group GID | 30 30 30 31 30 30 30 00 (0001000) | Octal ASCII numeric group ID. |
| 0x7C | 12 bytes | File Size | 30 30 30 30 30 30 30 34 32 36 34 00 (00000004264) | Octal ASCII file size in bytes. Directories and symlinks have size 0. |
| 0x88 | 12 bytes | Modification Time | 31 34 37 31 33 30 35 36 37 32 30 00 (14713056720) | Octal ASCII Unix timestamp (seconds since 1970-01-01). |
| 0x94 | 8 bytes | Header Checksum | 30 31 32 33 34 35 00 20 (012345) | Octal ASCII sum of all header bytes (checksum field treated as spaces during calculation). |
| 0x9C | 1 byte | Type Flag | 30 (ASCII '0' = regular file) | '0'=regular file, '2'=symlink, '5'=directory, 'L'=GNU long name, 'x'=PAX extended header. |
| 0x101 | 5 bytes | UStar Magic | 75 73 74 61 72 (ustar) | POSIX UStar format indicator at offset 257, followed by a NUL at offset 262. The older GNU tar format instead follows 'ustar' with two spaces and a NUL, overwriting the version field. Absent in very old V7 tar archives. |
| 0x107 | 2 bytes | UStar Version | 30 30 (00) | UStar version number. POSIX.1-2001 uses '00'. |
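
The checksum rule in the table (sum every header byte, counting the checksum field itself as eight spaces) can be sketched as follows. The function names are invented for the example, and the sample header comes from Python's tarfile module.

```python
import tarfile


def header_checksum(block: bytes) -> int:
    """Sum all 512 header bytes, treating bytes 148-155 as eight ASCII spaces."""
    return sum(block[:148]) + 8 * 0x20 + sum(block[156:512])


def stored_checksum(block: bytes) -> int:
    """Read the octal ASCII checksum recorded at offset 148."""
    text = block[148:156].split(b"\0", 1)[0].strip(b" ")
    return int(text.decode("ascii"), 8) if text else 0


header = tarfile.TarInfo(name="src/main.c").tobuf(format=tarfile.USTAR_FORMAT)
print(header_checksum(header) == stored_checksum(header))

# Flipping a single bit in the name field makes the two values disagree.
damaged = bytearray(header)
damaged[0] ^= 0x01
print(header_checksum(bytes(damaged)) == stored_checksum(bytes(damaged)))
```

The checksum is a simple byte sum, so it catches accidental corruption but offers no protection against deliberate tampering.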
Attack Vectors
- Path traversal: crafted tar entries with ../../ prefix can write files outside the extraction directory, overwriting system files or planting executables in PATH directories
- Symlink attacks: a tar archive can contain a symlink pointing to /etc/passwd followed by a regular file with the same name — naive extractors follow the symlink and overwrite the target
- Tar bomb: an archive without a top-level directory extracts hundreds of files directly into the current directory, polluting the working directory and potentially overwriting existing files
Mitigation: FileDex does not extract TAR archives server-side. All format analysis is reference-only. When extracting untrusted tar files locally, list the contents first with tar -tf, use --one-top-level (GNU tar 1.28+) to force extraction into a single subdirectory, and use --no-same-owner so extracted files are not given the UID/GID recorded in the archive.
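Listing an archive before extracting it can also be automated. The sketch below flags entries matching the path traversal and symlink patterns described above. It is deliberately conservative and purely string-based: the helper names escapes and risky_members are made up, the sample archive is built in memory, and a harmless relative link containing ".." would also be flagged.

```python
import io
import tarfile


def escapes(path: str) -> bool:
    """True for absolute paths or paths with a '..' component."""
    return path.startswith("/") or ".." in path.split("/")


def risky_members(tar: tarfile.TarFile) -> list[str]:
    """Names of entries whose path, or whose link target, could leave the extraction directory."""
    flagged = []
    for member in tar.getmembers():
        is_link = member.issym() or member.islnk()
        if escapes(member.name) or (is_link and escapes(member.linkname)):
            flagged.append(member.name)
    return flagged


# Build a hypothetical hostile archive: one normal file, one traversal, one symlink.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("ok/file.txt", "../../evil.txt"):
        info = tarfile.TarInfo(name=name)
        info.size = 4
        tar.addfile(info, io.BytesIO(b"data"))
    link = tarfile.TarInfo(name="passwd")
    link.type = tarfile.SYMTYPE
    link.linkname = "/etc/passwd"
    tar.addfile(link)

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as tar:
    flagged = risky_members(tar)
print(flagged)
```

Recent Python versions (3.12 and later) also accept filter="data" in extractall(), which applies similar checks during extraction itself; the manual scan here is only meant to show what such a check looks for.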