Bzip2 Compressed
BZ2 (Bzip2) is a single-file lossless compression format created by Julian Seward in 1996. It uses the Burrows-Wheeler Transform combined with Huffman coding to achieve better ratios than gzip at slower speeds.
Bzip2 decompression requires the libbz2 library not available in browser WASM.
Common questions
How do I open or inspect a BZ2 file?
Use bunzip2 -k file.bz2 to decompress while keeping the original, or bzcat file.bz2 | head to preview the first lines without full extraction. On Windows, 7-Zip and WinRAR both support BZ2 natively.
Is BZ2 still worth using over GZ or XZ?
For new projects, XZ or Zstandard are generally preferred. BZ2 remains relevant for compatibility with existing .tar.bz2 archives and in cases where bzip2recover block-level recovery is needed.
Can I recover data from a partially corrupted BZ2 file?
Yes. BZ2 processes data in independent blocks with individual CRC-32 checksums. The bzip2recover tool can extract intact blocks from a damaged file, allowing partial data recovery that is not possible with gzip.
What is the difference between bzip2, pbzip2, and lbzip2?
bzip2 is single-threaded. pbzip2 and lbzip2 add multi-threaded parallel compression and decompression. All three produce format-compatible output that any bzip2 decompressor can read.
What makes .BZ2 special
What is a BZ2 file?
BZ2 (Bzip2) is a file compression format using the Burrows-Wheeler block-sorting algorithm combined with Huffman coding, created by Julian Seward in 1996. It typically achieves 10-15% better compression ratios than gzip at the cost of significantly slower compression (and moderately slower decompression), making it well-suited for static distribution archives where compression is a one-time cost but downloads happen many times.
Continue reading — full technical deep dive
Like GZ, BZ2 compresses a single file — for multi-file archives it is paired with TAR as .tar.bz2 (also written .tbz2 or .tbz). Bzip2 was widely used in Linux software distribution throughout the 2000s and 2010s, though XZ compression has largely replaced it for new releases due to even better ratios.
How to open BZ2 files
- bunzip2 / bzip2 -d (macOS, Linux) — Built-in CLI:
bunzip2 file.bz2 - tar (macOS, Linux) — Extract
.tar.bz2directly:tar -xjf archive.tar.bz2 - 7-Zip (Windows) — Free, open-source
- WinRAR (Windows) — Built-in BZ2 support
- The Unarchiver (macOS) — Free
- PeaZip (Windows, Linux) — Free alternative
Technical specifications
| Property | Value |
|---|---|
| Algorithm | Burrows-Wheeler Transform (BWT) + Huffman coding |
| Block size | 100 KB – 900 KB (adjustable, default 900 KB) |
| Single-file | Compresses one file or stream |
| Checksum | CRC-32 per block |
| Multi-threading | Limited (pbzip2 adds parallel support) |
| Common pairs | .tar.bz2, .tar.tbz2 |
| Magic bytes | 42 5A 68 (BZh in ASCII) |
Common use cases
- Software distribution: Source code tarballs for Linux packages (
.tar.bz2) - Linux package archives: Older Arch Linux packages used
.pkg.tar.bz2 - Long-term archival: Better compression means smaller storage footprint over time
- Data science: Compressed datasets where download size matters more than extraction speed
BZ2 vs GZ vs XZ comparison
| Format | Compression | Speed | Typical use |
|---|---|---|---|
| GZ | Good | Fast | Web transfer, real-time pipelines |
| BZ2 | Better | Slower | Software distribution |
| XZ | Best | Slowest | Modern Linux packages, source releases |
| Zstandard | Very good | Very fast | High-performance systems |
BZ2 occupies the middle ground — better compression than GZ but faster than XZ. For new projects, XZ or Zstandard are generally preferred. BZ2 remains relevant for compatibility with existing .tar.bz2 archives.
Working with BZ2 on the command line
# Compress a file
bzip2 file.sql
# Compress keeping the original
bzip2 -k file.sql
# Decompress
bunzip2 file.sql.bz2
# or
bzip2 -d file.sql.bz2
# Create .tar.bz2 archive
tar -cjf archive.tar.bz2 /path/to/folder/
# Extract .tar.bz2 archive
tar -xjf archive.tar.bz2
# View without extracting
bzcat file.txt.bz2 | head -20
# Parallel compression (much faster on multi-core)
pbzip2 -p8 largefile.sql # use 8 threads
Parallel compression with pbzip2
Standard bzip2 is single-threaded, making it slow on large files. The pbzip2 tool parallelizes compression across CPU cores with no change in output format — the resulting file is compatible with standard bunzip2. On modern multi-core systems, pbzip2 -p8 can compress 5-8x faster than single-threaded bzip2. Similarly, lbzip2 offers parallel decompression.
Integrity and error recovery
BZ2 processes data in independent blocks (up to 900 KB each), each with its own CRC-32 checksum. This block structure means that if a BZ2 file is partially corrupted, bzip2recover can extract intact blocks from the undamaged portions — something not possible with GZ's single-stream design. This makes BZ2 slightly more resilient to partial data corruption.
.BZ2 compared to alternatives
| Formats | Criteria | Winner |
|---|---|---|
| .BZ2 vs .GZ | Compression ratio BZ2 typically achieves 10-15% better compression than gzip on text and source code due to the Burrows-Wheeler Transform operating on larger blocks. | BZ2 wins |
| .BZ2 vs .GZ | Speed Gzip compresses and decompresses significantly faster than bzip2. Gzip decompression is roughly 4-6x faster, making it preferred for real-time pipelines. | GZ wins |
| .BZ2 vs .XZ | Compression ratio XZ (LZMA2) consistently produces smaller output than BZ2 across most data types, while also decompressing faster. | XZ wins |
Technical reference
- MIME Type
application/x-bzip2- Magic Bytes
42 5A 68BZh signature followed by block size digit.- Developer
- Julian Seward
- Year Introduced
- 1996
- Open Standard
- Yes
BZh signature followed by block size digit.
Binary Structure
A BZ2 file starts with a 4-byte header: the 3-byte magic 42 5A 68 (ASCII 'BZh') followed by a single ASCII digit (1-9) indicating block size in units of 100 KB. The stream then contains one or more compressed blocks. Each block begins with the 6-byte pi marker 0x314159265359, followed by a 32-bit CRC of the original data in that block, a 1-bit randomisation flag, the BWT pointer, symbol maps, Huffman trees, and the BWT-compressed data. The stream ends with a 6-byte end-of-stream marker 0x177245385090 followed by a 32-bit combined CRC of all blocks.
| Offset | Length | Field | Example | Description |
|---|---|---|---|---|
0x00 | 3 bytes | Magic Number | 42 5A 68 (BZh) | Identifies the file as a bzip2 compressed stream. |
0x03 | 1 byte | Block Size | 39 (ASCII '9' = 900 KB) | ASCII digit 1-9 indicating block size in 100 KB increments. Default is 9 (900 KB). |
0x04 | 6 bytes | Block Magic (per block) | 31 41 59 26 53 59 | Pi marker (0x314159265359) indicating the start of a compressed block. |
0x0A | 4 bytes | Block CRC-32 | varies | CRC-32 checksum of the original uncompressed data in this block. |
EOF-10 | 6 bytes | End-of-Stream Marker | 17 72 45 38 50 90 | Square root of pi marker (0x177245385090) signalling end of the bzip2 stream. |
EOF-4 | 4 bytes | Stream CRC-32 | varies | Combined CRC-32 of all block CRCs, used for full-stream integrity verification. |
Attack Vectors
- Decompression bomb: crafted BZ2 with extremely high compression ratio can exhaust memory during extraction
- Buffer overflow in older bzip2 versions (CVE-2019-12900) when handling malformed block headers
Mitigation: FileDex does not decompress, parse, or execute BZ2 content. Reference page only — no server-side processing.