What is a BZ2 file?
BZ2 (Bzip2) is a file compression format using the Burrows-Wheeler block-sorting algorithm combined with Huffman coding, created by Julian Seward in 1996. It typically achieves 10-15% better compression ratios than gzip at the cost of significantly slower compression (and moderately slower decompression), making it well-suited for static distribution archives where compression is a one-time cost but downloads happen many times.
Like GZ, BZ2 compresses a single file — for multi-file archives it is paired with TAR as .tar.bz2 (also written .tbz2 or .tbz). Bzip2 was widely used in Linux software distribution throughout the 2000s and 2010s, though XZ compression has largely replaced it for new releases due to even better ratios.
How to open BZ2 files
- bunzip2 / bzip2 -d (macOS, Linux): built-in CLI, e.g. bunzip2 file.bz2
- tar (macOS, Linux): extracts .tar.bz2 directly, e.g. tar -xjf archive.tar.bz2
- 7-Zip (Windows): free, open-source
- WinRAR (Windows): built-in BZ2 support
- The Unarchiver (macOS): free
- PeaZip (Windows, Linux): free alternative
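For scripted extraction, the same two operations can be sketched with Python's standard library. This is a minimal sketch, assuming Python 3.9 or later; the helper names and file paths are placeholders chosen for illustration, not part of any tool listed above.

```python
import bz2
import tarfile


def decompress_bz2(src: str, dst: str) -> None:
    """Decompress a single .bz2 file to dst, streaming in 64 KB chunks."""
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        for chunk in iter(lambda: fin.read(64 * 1024), b""):
            fout.write(chunk)


def extract_tar_bz2(archive: str, dest_dir: str) -> list[str]:
    """Extract a .tar.bz2 archive into dest_dir and return the member names."""
    with tarfile.open(archive, "r:bz2") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter where this Python provides it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest_dir, **kwargs)
    return names
```

Streaming in chunks keeps memory use flat for large files. As with any extractor, only unpack archives from sources you trust.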
Technical specifications
| Property | Value |
|---|---|
| Algorithm | Burrows-Wheeler Transform (BWT) + Huffman coding |
| Block size | 100 KB – 900 KB (adjustable, default 900 KB) |
| Single-file | Compresses one file or stream |
| Checksum | CRC-32 per block |
| Multi-threading | None in standard bzip2 (pbzip2 and lbzip2 add parallel support) |
| Common pairs | .tar.bz2 (also written .tbz2 or .tbz) |
| Magic bytes | 42 5A 68 (BZh in ASCII) |
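The magic bytes and block size in the table can be checked programmatically. In a bzip2 stream the three bytes BZh are followed by an ASCII digit from 1 to 9 that gives the block size in units of 100 KB. The sketch below reads that header; the function name is hypothetical.

```python
import bz2

BZ2_MAGIC = b"BZh"  # 42 5A 68


def bz2_block_size_kb(header: bytes):
    """Return the block size in KB encoded in a bzip2 header, or None.

    Returns None when the data does not start with a valid bzip2 header.
    """
    if len(header) < 4 or header[:3] != BZ2_MAGIC:
        return None
    level = header[3] - ord("0")  # fourth byte is an ASCII digit '1'..'9'
    if not 1 <= level <= 9:
        return None
    return level * 100


sample = bz2.compress(b"hello" * 1000, compresslevel=5)
print(sample[:4], bz2_block_size_kb(sample))  # b'BZh5' 500
```

Python's compresslevel argument corresponds to the -1 through -9 options of the bzip2 command, so the default level of 9 produces the 900 KB block size listed above.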
Common use cases
- Software distribution: Source code tarballs for Linux packages (.tar.bz2)
- Linux package archives: Older Arch Linux packages used .pkg.tar.bz2
- Long-term archival: Better compression means smaller storage footprint over time
- Data science: Compressed datasets where download size matters more than extraction speed
BZ2 vs GZ vs XZ comparison
| Format | Compression | Speed | Typical use |
|---|---|---|---|
| GZ | Good | Fast | Web transfer, real-time pipelines |
| BZ2 | Better | Slower | Software distribution |
| XZ | Best | Slowest | Modern Linux packages, source releases |
| Zstandard | Very good | Very fast | High-performance systems |
BZ2 occupies the middle ground: it compresses better than GZ and usually compresses faster than XZ at comparable settings, although XZ typically decompresses faster. For new projects, XZ or Zstandard are generally preferred. BZ2 remains relevant for compatibility with existing .tar.bz2 archives.
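The ranking in the table is a generalization, and results vary with the data. One way to check it on your own files is to compress the same bytes with each codec and compare sizes. The sketch below uses only Python's standard library at default settings. Zstandard is left out because it is not in the standard library of most Python versions, and the function name is a placeholder.

```python
import bz2
import gzip
import lzma


def compressed_sizes(data: bytes) -> dict:
    """Compress data with each codec at default settings; return sizes in bytes."""
    return {
        "raw": len(data),
        "gz": len(gzip.compress(data)),
        "bz2": len(bz2.compress(data)),
        "xz": len(lzma.compress(data)),
    }


# Example input: repetitive text. Replace with the bytes of a real file.
text = b"The quick brown fox jumps over the lazy dog.\n" * 2000
for name, size in compressed_sizes(text).items():
    print(f"{name:>4}: {size} bytes")
```

Highly repetitive input like this sample exaggerates the ratios; a representative file from the actual workload gives a more useful comparison.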
Working with BZ2 on the command line
# Compress a file
bzip2 file.sql
# Compress keeping the original
bzip2 -k file.sql
# Decompress
bunzip2 file.sql.bz2
# or
bzip2 -d file.sql.bz2
# Create .tar.bz2 archive
tar -cjf archive.tar.bz2 /path/to/folder/
# Extract .tar.bz2 archive
tar -xjf archive.tar.bz2
# View without extracting
bzcat file.txt.bz2 | head -20
# Parallel compression (much faster on multi-core)
pbzip2 -p8 largefile.sql # use 8 threads
Parallel compression with pbzip2
Standard bzip2 is single-threaded, making it slow on large files. The pbzip2 tool parallelizes compression across CPU cores with no change in output format, so the resulting file is compatible with standard bunzip2. On modern multi-core systems, pbzip2 -p8 can compress 5-8x faster than single-threaded bzip2. lbzip2 is a similar tool that parallelizes both compression and decompression.
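The general idea behind chunked parallel compression can be sketched in a few lines: split the input into pieces, compress each piece as an independent bzip2 stream, and concatenate the results. This is an illustrative sketch, not pbzip2's actual implementation; the function name, chunk size, and worker count are assumptions. It relies on the fact that a concatenation of bzip2 streams is itself decodable by standard decompressors.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def parallel_bz2_compress(data: bytes, chunk_size: int = 900_000,
                          workers: int = 4) -> bytes:
    """Compress data as a series of independent, concatenated bzip2 streams."""
    # Split into fixed-size chunks; keep one empty chunk so empty input
    # still yields a valid (empty) bzip2 stream.
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the streams join correctly.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


payload = bytes(range(256)) * 4000
packed = parallel_bz2_compress(payload, chunk_size=100_000)
assert bz2.decompress(packed) == payload  # multi-stream input decodes as one
```

Whether threads give a real speedup depends on the interpreter releasing its global lock during compression. Compressing each chunk separately can also cost a little compression ratio compared with a single stream.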
Integrity and error recovery
BZ2 processes data in independent blocks (up to 900 KB each), each with its own CRC-32 checksum. This block structure means that if a BZ2 file is partially corrupted, bzip2recover can extract intact blocks from the undamaged portions — something not possible with GZ’s single-stream design. This makes BZ2 slightly more resilient to partial data corruption.
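The checksums make corruption detectable at decompression time. A minimal sketch of an integrity check in Python follows. The function name is hypothetical, and the check only reports whether the whole file decodes cleanly. Salvaging intact blocks from a damaged file is still a job for bzip2recover.

```python
import bz2


def is_intact(blob: bytes) -> bool:
    """Return True if blob decompresses without a data or checksum error."""
    try:
        bz2.decompress(blob)
    except (OSError, ValueError, EOFError):
        # Invalid data, failed CRC, or a truncated stream.
        return False
    return True


good = bz2.compress(b"".join(b"row %d\n" % i for i in range(5000)))
damaged = bytearray(good)
damaged[len(damaged) // 2] ^= 0xFF  # flip every bit of one byte mid-stream

print(is_intact(good))
print(is_intact(bytes(damaged)))
```

The command-line equivalent is bzip2 -t file.bz2, which tests a file without writing any output.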