.BZ2 Bzip2 Compressed
.bz2

Bzip2 Compressed

BZ2 (Bzip2) is a single-file lossless compression format created by Julian Seward in 1996. It uses the Burrows-Wheeler Transform combined with Huffman coding to achieve better ratios than gzip at slower speeds.

Archive structure
Header magic bytes
Entries compressed files
Index directory
ArchiveBWT + HuffmanBlock-basedCRC-321996
By FileDex
Not convertible

Bzip2 decompression requires the libbz2 library not available in browser WASM.

Common questions

How do I open or inspect a BZ2 file?

Use bunzip2 -k file.bz2 to decompress while keeping the original, or bzcat file.bz2 | head to preview the first lines without full extraction. On Windows, 7-Zip and WinRAR both support BZ2 natively.

Is BZ2 still worth using over GZ or XZ?

For new projects, XZ or Zstandard are generally preferred. BZ2 remains relevant for compatibility with existing .tar.bz2 archives and in cases where bzip2recover block-level recovery is needed.

Can I recover data from a partially corrupted BZ2 file?

Yes. BZ2 processes data in independent blocks with individual CRC-32 checksums. The bzip2recover tool can extract intact blocks from a damaged file, allowing partial data recovery that is not possible with gzip.

What is the difference between bzip2, pbzip2, and lbzip2?

bzip2 is single-threaded. pbzip2 and lbzip2 add multi-threaded parallel compression and decompression. All three produce format-compatible output that any bzip2 decompressor can read.

What makes .BZ2 special

What is a BZ2 file?

BZ2 (Bzip2) is a file compression format using the Burrows-Wheeler block-sorting algorithm combined with Huffman coding, created by Julian Seward in 1996. It typically achieves 10-15% better compression ratios than gzip at the cost of significantly slower compression (and moderately slower decompression), making it well-suited for static distribution archives where compression is a one-time cost but downloads happen many times.

Continue reading — full technical deep dive

Like GZ, BZ2 compresses a single file — for multi-file archives it is paired with TAR as .tar.bz2 (also written .tbz2 or .tbz). Bzip2 was widely used in Linux software distribution throughout the 2000s and 2010s, though XZ compression has largely replaced it for new releases due to even better ratios.

How to open BZ2 files

  • bunzip2 / bzip2 -d (macOS, Linux) — Built-in CLI: bunzip2 file.bz2
  • tar (macOS, Linux) — Extract .tar.bz2 directly: tar -xjf archive.tar.bz2
  • 7-Zip (Windows) — Free, open-source
  • WinRAR (Windows) — Built-in BZ2 support
  • The Unarchiver (macOS) — Free
  • PeaZip (Windows, Linux) — Free alternative

Technical specifications

Property Value
Algorithm Burrows-Wheeler Transform (BWT) + Huffman coding
Block size 100 KB – 900 KB (adjustable, default 900 KB)
Single-file Compresses one file or stream
Checksum CRC-32 per block
Multi-threading Limited (pbzip2 adds parallel support)
Common pairs .tar.bz2, .tar.tbz2
Magic bytes 42 5A 68 (BZh in ASCII)

Common use cases

  • Software distribution: Source code tarballs for Linux packages (.tar.bz2)
  • Linux package archives: Older Arch Linux packages used .pkg.tar.bz2
  • Long-term archival: Better compression means smaller storage footprint over time
  • Data science: Compressed datasets where download size matters more than extraction speed

BZ2 vs GZ vs XZ comparison

Format Compression Speed Typical use
GZ Good Fast Web transfer, real-time pipelines
BZ2 Better Slower Software distribution
XZ Best Slowest Modern Linux packages, source releases
Zstandard Very good Very fast High-performance systems

BZ2 occupies the middle ground — better compression than GZ but faster than XZ. For new projects, XZ or Zstandard are generally preferred. BZ2 remains relevant for compatibility with existing .tar.bz2 archives.

Working with BZ2 on the command line

# Compress a file
bzip2 file.sql

# Compress keeping the original
bzip2 -k file.sql

# Decompress
bunzip2 file.sql.bz2
# or
bzip2 -d file.sql.bz2

# Create .tar.bz2 archive
tar -cjf archive.tar.bz2 /path/to/folder/

# Extract .tar.bz2 archive
tar -xjf archive.tar.bz2

# View without extracting
bzcat file.txt.bz2 | head -20

# Parallel compression (much faster on multi-core)
pbzip2 -p8 largefile.sql   # use 8 threads

Parallel compression with pbzip2

Standard bzip2 is single-threaded, making it slow on large files. The pbzip2 tool parallelizes compression across CPU cores with no change in output format — the resulting file is compatible with standard bunzip2. On modern multi-core systems, pbzip2 -p8 can compress 5-8x faster than single-threaded bzip2. Similarly, lbzip2 offers parallel decompression.

Integrity and error recovery

BZ2 processes data in independent blocks (up to 900 KB each), each with its own CRC-32 checksum. This block structure means that if a BZ2 file is partially corrupted, bzip2recover can extract intact blocks from the undamaged portions — something not possible with GZ's single-stream design. This makes BZ2 slightly more resilient to partial data corruption.

.BZ2 compared to alternatives

.BZ2 compared to alternative formats
Formats Criteria Winner
.BZ2 vs .GZ
Compression ratio
BZ2 typically achieves 10-15% better compression than gzip on text and source code due to the Burrows-Wheeler Transform operating on larger blocks.
BZ2 wins
.BZ2 vs .GZ
Speed
Gzip compresses and decompresses significantly faster than bzip2. Gzip decompression is roughly 4-6x faster, making it preferred for real-time pipelines.
GZ wins
.BZ2 vs .XZ
Compression ratio
XZ (LZMA2) consistently produces smaller output than BZ2 across most data types, while also decompressing faster.
XZ wins

Technical reference

MIME Type
application/x-bzip2
Magic Bytes
42 5A 68 BZh signature followed by block size digit.
Developer
Julian Seward
Year Introduced
1996
Open Standard
Yes
00000000425A68 BZh

BZh signature followed by block size digit.

Binary Structure

A BZ2 file starts with a 4-byte header: the 3-byte magic 42 5A 68 (ASCII 'BZh') followed by a single ASCII digit (1-9) indicating block size in units of 100 KB. The stream then contains one or more compressed blocks. Each block begins with the 6-byte pi marker 0x314159265359, followed by a 32-bit CRC of the original data in that block, a 1-bit randomisation flag, the BWT pointer, symbol maps, Huffman trees, and the BWT-compressed data. The stream ends with a 6-byte end-of-stream marker 0x177245385090 followed by a 32-bit combined CRC of all blocks.

OffsetLengthFieldExampleDescription
0x00 3 bytes Magic Number 42 5A 68 (BZh) Identifies the file as a bzip2 compressed stream.
0x03 1 byte Block Size 39 (ASCII '9' = 900 KB) ASCII digit 1-9 indicating block size in 100 KB increments. Default is 9 (900 KB).
0x04 6 bytes Block Magic (per block) 31 41 59 26 53 59 Pi marker (0x314159265359) indicating the start of a compressed block.
0x0A 4 bytes Block CRC-32 varies CRC-32 checksum of the original uncompressed data in this block.
EOF-10 6 bytes End-of-Stream Marker 17 72 45 38 50 90 Square root of pi marker (0x177245385090) signalling end of the bzip2 stream.
EOF-4 4 bytes Stream CRC-32 varies Combined CRC-32 of all block CRCs, used for full-stream integrity verification.
1996Julian Seward releases bzip2 0.15 as a BWT-based compressor1996Version 0.21 introduces the current .bz2 stream format with block CRCs2000bzip2 1.0 released; format stabilized and widely adopted by Linux distributions2003pbzip2 released, adding multi-threaded parallel compression to bzip22010bzip2 1.0.6 released (latest stable as of 2024); maintained by Federico Mena-Quintero since 20192019XZ and Zstandard displace BZ2 for new Linux package distribution; BZ2 enters maintenance mode
Compress a file with bzip2 other
bzip2 -k file.txt

-k keeps the original file. Output is file.txt.bz2 using default block size 900 KB.

Decompress a .bz2 file other
bunzip2 -k file.txt.bz2

Decompresses file.txt.bz2 to file.txt. -k preserves the compressed original.

Parallel compress with pbzip2 other
pbzip2 -p8 -k largefile.sql

Compresses using 8 CPU threads. Output is format-compatible with standard bunzip2. 5-8x faster on multi-core systems.

Extract a .tar.bz2 archive other
tar -xjf archive.tar.bz2

-j flag tells tar to pipe through bzip2 for decompression. Extracts all files to the current directory.

Recover intact blocks from a corrupted .bz2 other
bzip2recover damaged.bz2

Splits a corrupted bz2 file into individual block files (rec00001.bz2, rec00002.bz2, ...). Intact blocks can be individually decompressed.

BZ2 GZ transcode lossless Decompress bzip2 and recompress with gzip for compatibility with tools expecting gzip input. Gzip is faster to decompress and more universally supported in streaming pipelines.
BZ2 XZ transcode lossless XZ (LZMA2) achieves better compression than BZ2 with faster decompression. Ideal for modernizing legacy .tar.bz2 archives.
BZ2 TAR export lossless Extract the inner tar archive from a .tar.bz2 bundle, producing an uncompressed .tar for direct file access or recompression with a different algorithm.
LOW

Attack Vectors

  • Decompression bomb: crafted BZ2 with extremely high compression ratio can exhaust memory during extraction
  • Buffer overflow in older bzip2 versions (CVE-2019-12900) when handling malformed block headers

Mitigation: FileDex does not decompress, parse, or execute BZ2 content. Reference page only — no server-side processing.

bzip2 tool
Reference compressor and decompressor by Julian Seward
pbzip2 tool
Multi-threaded parallel bzip2 compressor for multi-core systems
lbzip2 tool
Parallel bzip2 with faster decompression than pbzip2
7-Zip tool
Free cross-platform archiver with bzip2 compression support
bzip2recover tool
Block-level recovery tool for corrupted bz2 files, included with bzip2
Apache Commons Compress library
Java library for reading and writing bzip2, gzip, xz, and other formats