MP3 Audio
Developed at Fraunhofer IIS in Erlangen, where Suzanne Vega's "Tom's Diner" served as a reference test track, MP3 became patent-free in 2017. Convert MP3 to WAV, FLAC, OGG, or AAC in your browser with no upload and no server: FileDex decodes and re-encodes locally via FFmpeg WebAssembly.
Your files never leave your device
Common questions
Does converting MP3 to WAV improve audio quality?
No. Decoding MP3 to WAV does not restore audio data discarded during MP3 encoding. The WAV file is larger but has exactly the same audio fidelity as the MP3. WAV conversion is useful for compatibility with audio editors that require uncompressed PCM input.
What is the best MP3 bitrate for music?
VBR at around 192 kbps with the LAME encoder (-V 2) is widely considered transparent: indistinguishable from CD audio in double-blind tests for most listeners. 320 kbps CBR is the highest bitrate the standard defines and is used when maximum MP3 quality matters, such as archival copies. Below 96 kbps, quantization artifacts and the encoder's lower lowpass cutoff become audible, especially on high-frequency content.
Can I convert MP3 to FLAC without losing quality?
The conversion itself is lossless — FLAC perfectly preserves the decoded MP3 audio. However, MP3 is already lossy, so the FLAC file will not recover original quality discarded during MP3 encoding. The result is a lossless container around lossy audio data.
Why does my MP3 have a gap or click between tracks?
MP3 frames are fixed at 1,152 samples, so encoder delay and padding create silence at track boundaries. LAME writes an Info/Xing header with exact sample counts for gapless-capable players. Players that ignore this header insert brief silence or clicks.
Are MP3 files still patent-encumbered?
No. Fraunhofer IIS terminated the MP3 patent licensing program in April 2017. All key patents have expired globally. MP3 encoding and decoding are now fully patent-free for any use case.
What is the difference between CBR, VBR, and ABR?
CBR (Constant Bit Rate) uses the same bitrate for every frame, giving predictable file sizes; some hardware decoders require it. VBR (Variable Bit Rate) allocates more bits to complex passages and fewer to silence, giving better quality per kilobyte. ABR (Average Bit Rate) varies per frame but targets an average bitrate across the file, a middle ground between the two. LAME -V 0 (VBR quality 0) typically sounds as good as CBR 320 kbps while producing smaller files.
What programs create and edit ID3 tags on MP3 files?
Mp3tag (Windows, macOS) is the most popular dedicated tag editor. MusicBrainz Picard uses acoustic fingerprinting to automatically tag files from the MusicBrainz database. foobar2000 and VLC also support tag editing. Command-line: id3v2 -t 'Title' -a 'Artist' input.mp3
What makes .MP3 special
The MPEG-1 Audio Layer III codec compresses audio by exploiting the human ear's inability to hear certain frequencies when louder sounds are present nearby. This psychoacoustic model is the engine behind every MP3 file — it decides what to keep and what to discard, producing files roughly 10x smaller than uncompressed PCM at acceptable quality.
The psychoacoustic model
Two masking phenomena drive MP3's compression. Frequency masking (simultaneous masking) occurs when a loud tone renders nearby quieter tones inaudible: a 1 kHz tone at 80 dB masks components within a critical band around it that sit below roughly 60 dB. Temporal masking hides quiet sounds for a while after a loud transient (post-masking, lasting up to roughly 100–200 ms) and, over a much shorter window of at most about 20 ms, before it (pre-masking). The encoder calculates masking thresholds across frequency bands (scale factor bands in Layer III) for every granule, then allocates bits only to signal components that exceed those thresholds.
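To make the threshold idea concrete, here is a toy Python model. It is not the ISO psychoacoustic model. It assumes a triangular spreading function on the Bark scale (using Zwicker's Hz-to-Bark approximation) with hypothetical slopes of 27 dB/Bark toward lower frequencies and 12 dB/Bark toward higher ones, plus a 20 dB offset chosen to match the 1 kHz / 80 dB / 60 dB example above. A component is kept only if it rises above the threshold cast by the other components.

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Zwicker & Terhardt approximation of the Bark critical-band scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Hypothetical toy parameters, not values taken from any MP3 encoder.
OFFSET_DB = 20.0      # threshold sits this far below the masker at the masker's own frequency
SLOPE_LOW_DB = 27.0   # dB per Bark the threshold falls toward lower frequencies
SLOPE_HIGH_DB = 12.0  # dB per Bark toward higher frequencies (masking spreads upward more)

def masking_threshold(f_hz, maskers):
    """Highest threshold (dB) that any masker (freq_hz, level_db) casts at f_hz."""
    z = hz_to_bark(f_hz)
    best = -math.inf
    for masker_hz, level_db in maskers:
        dz = z - hz_to_bark(masker_hz)
        slope = SLOPE_HIGH_DB if dz >= 0 else SLOPE_LOW_DB
        best = max(best, level_db - OFFSET_DB - slope * abs(dz))
    return best

def audible_components(components):
    """Keep (freq_hz, level_db) components that exceed the threshold cast by all others."""
    kept = []
    for i, (f_hz, level_db) in enumerate(components):
        others = components[:i] + components[i + 1:]
        if not others or level_db > masking_threshold(f_hz, others):
            kept.append((f_hz, level_db))
    return kept

tones = [(1000.0, 80.0), (1100.0, 50.0), (4000.0, 40.0)]
print(audible_components(tones))
```

With these made-up parameters, the 1.1 kHz tone at 50 dB falls under the threshold of the 1 kHz masker (about 52.6 dB there) and is dropped, while the 4 kHz tone, several Bark away, survives.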
Encoding pipeline
MP3 encoding proceeds through four stages:
- Polyphase subband filter — splits the input PCM into 32 equal-width subbands (each about 689 Hz wide at a 44.1 kHz sample rate: 22,050 Hz / 32)
- MDCT (Modified Discrete Cosine Transform) — transforms each subband into frequency-domain coefficients with finer resolution (576 frequency lines per granule)
- Quantization — scales coefficients based on the psychoacoustic model's masking thresholds, discarding inaudible detail to meet the target bitrate
- Huffman coding — entropy-encodes the quantized values using lookup tables optimized for typical audio spectral shapes
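The bandwidth figures in the first two stages follow from dividing the Nyquist frequency. This small Python helper (the name `filterbank_resolution` is ours) computes the subband width, the spacing of the 576 long-block MDCT lines, and how many of those lines each subband contributes:

```python
def filterbank_resolution(sample_rate_hz: int, subbands: int = 32, mdct_lines: int = 576):
    """Return (subband width in Hz, MDCT line spacing in Hz, MDCT lines per subband) for long blocks."""
    nyquist = sample_rate_hz / 2
    return nyquist / subbands, nyquist / mdct_lines, mdct_lines // subbands

for rate in (32_000, 44_100, 48_000):
    width, spacing, per_band = filterbank_resolution(rate)
    print(f"{rate} Hz: subbands {width:.1f} Hz wide, MDCT lines every {spacing:.2f} Hz, "
          f"{per_band} lines per subband")
```

At 44.1 kHz this gives about 689 Hz per subband and about 38 Hz per MDCT line, with 18 lines per subband. Short blocks trade this frequency resolution for time resolution, splitting a granule into three windows of 192 lines each.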
The encoder iterates quantization parameters (scale factors and global gain) in an inner/outer loop until distortion stays below masking thresholds while hitting the bitrate target. This iterative process is why MP3 encoding is asymmetric — decoding is a single-pass operation roughly 10x faster.
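A heavily simplified Python sketch of the inner (rate) loop follows. It raises a global gain value, which enlarges the quantizer step by 2^(1/4) (about 1.5 dB) per unit as in Layer III, until an estimated bit count fits the frame's budget. The 3/4-power quantizer mirrors the standard's shape, but the standard's fixed gain offset and rounding bias are omitted. `estimate_bits` is a hypothetical stand-in for real Huffman table lookups, and the outer loop that adjusts scale factors to control distortion is left out entirely.

```python
def quantize(coeffs, global_gain):
    """Power-law quantizer: step size grows by 2**(1/4) per gain unit (simplified Layer III form)."""
    step = 2.0 ** (global_gain / 4.0)
    return [int(round((abs(c) / step) ** 0.75)) for c in coeffs]

def estimate_bits(q):
    """Hypothetical bit-cost model standing in for Huffman coding:
    one bit for a zero, otherwise magnitude bits plus a sign bit."""
    return sum(1 if v == 0 else v.bit_length() + 1 for v in q)

def inner_loop(coeffs, bit_budget, start_gain=0, max_gain=255):
    """Increase the global gain until the quantized frame fits the bit budget."""
    gain = start_gain
    while gain <= max_gain:
        q = quantize(coeffs, gain)
        if estimate_bits(q) <= bit_budget:
            return gain, q
        gain += 1
    raise ValueError("budget too small even at the largest step size")

coeffs = [120.0, -64.0, 30.5, -12.0, 6.0, 2.5, -1.0, 0.4] * 8
gain, q = inner_loop(coeffs, bit_budget=150)
print(gain, estimate_bits(q))
```

Because the loop walks upward from the smallest step, it returns the finest quantization that fits; a real encoder then checks the resulting noise against the masking thresholds and, in the outer loop, amplifies the offending bands before retrying.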
Bitrate tiers and frequency cutoffs
With typical encoder settings, 128 kbps MP3 encoding discards frequencies above approximately 16 kHz; at 320 kbps the cutoff reaches roughly 20 kHz. The exact lowpass is an encoder choice rather than part of the format. Between these extremes, each step trades bandwidth for fidelity:
| Bitrate | Frequency ceiling (typical) | Typical use | File size per minute (MB, decimal) |
|---|---|---|---|
| 128 kbps | ~16 kHz | Podcasts, voice, casual listening | 0.96 MB |
| 192 kbps | ~18 kHz | General music streaming | 1.44 MB |
| 256 kbps | ~19.5 kHz | High-quality streaming | 1.92 MB |
| 320 kbps | ~20 kHz | Archival, critical listening | 2.40 MB |
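The size column is plain arithmetic: bitrate × 60 seconds ÷ 8 bits per byte. Because the bitrate already covers both channels, mono and stereo files at the same bitrate are the same size. This Python sketch reports decimal megabytes (10^6 bytes), ignores tag and header overhead, and also shows the ratio against 16-bit, 44.1 kHz stereo PCM (1,411.2 kbps), which is where the roughly 10x figure for MP3 comes from.

```python
CD_PCM_BPS = 44_100 * 16 * 2  # 1,411,200 bit/s: 16-bit stereo PCM at 44.1 kHz

def mb_per_minute(kbps: int) -> float:
    """Decimal megabytes per minute of audio; channel count is already in the bitrate."""
    return kbps * 1000 * 60 / 8 / 1_000_000

def ratio_vs_cd(kbps: int) -> float:
    """How many times smaller than uncompressed CD-quality PCM."""
    return CD_PCM_BPS / (kbps * 1000)

for kbps in (128, 192, 256, 320):
    print(f"{kbps} kbps: {mb_per_minute(kbps):.2f} MB/min, "
          f"{ratio_vs_cd(kbps):.1f}x smaller than CD PCM")
```

Reporting in mebibytes (2^20 bytes) instead would give smaller numbers, such as about 0.92 MiB per minute at 128 kbps, so mixing the two conventions is a common source of slightly wrong size tables.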
The 16 kHz cutoff at 128 kbps explains why cymbals and sibilants can sound dull at that rate: their highest harmonics simply aren't encoded. High-frequency hearing declines with age and many adults cannot hear much above 16 kHz, while speech carries little energy that high in the first place, which is why 128 kbps podcasts sound acceptable.
VBR, CBR, and ABR modes
Constant Bitrate (CBR) assigns the same number of bits to every frame regardless of complexity. Silence wastes bits; dense orchestral passages starve. CBR's advantage is predictable file size and guaranteed compatibility with older hardware players.
Variable Bitrate (VBR) lets the encoder allocate bits per frame based on signal complexity. LAME's VBR quality scale (V0 through V9) targets perceptual quality rather than a fixed rate. V0 averages around 245 kbps and is considered transparent for most material; V5 averages about 130 kbps. For a given file size, VBR generally delivers better quality than CBR.
Average Bitrate (ABR) is a hybrid — it varies per-frame like VBR but constrains the running average to a target. Useful when you need approximately predictable file sizes without CBR's quality compromises.
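The three allocation policies can be compared with a toy Python model. The per-frame "need" values are hypothetical kbps figures, not LAME's actual bit allocation, and the ABR function simply rescales the VBR allocation to hit its target. The point is only how each mode distributes a budget. `worst_ratio` reports the fraction of demand met by the most under-served frame (1.0 means no frame was starved).

```python
def cbr(needs, rate):
    """Same allocation for every frame, regardless of content."""
    return [float(rate)] * len(needs)

def vbr(needs, quality_scale=1.0):
    """Allocation tracks each frame's demand at a chosen quality; total size floats."""
    return [n * quality_scale for n in needs]

def abr(needs, target):
    """VBR-style relative allocation, rescaled so the average equals the target."""
    scale = target * len(needs) / sum(needs)
    return [n * scale for n in needs]

def mean(xs):
    return sum(xs) / len(xs)

def worst_ratio(needs, alloc):
    """Fraction of demand met for the most under-served frame."""
    return min(a / n for n, a in zip(needs, alloc))

# Hypothetical demand: near-silence, speech, speech, dense orchestra (x2), soft passage.
needs = [32, 96, 96, 256, 256, 64]
for name, alloc in [("CBR 128", cbr(needs, 128)), ("VBR", vbr(needs)), ("ABR 128", abr(needs, 128))]:
    print(f"{name}: mean {mean(alloc):.1f} kbps, "
          f"worst frame gets {worst_ratio(needs, alloc):.0%} of its need")
```

With these numbers, CBR gives the orchestral frames only half of what they need while handing the near-silent frame four times its demand; VBR meets every frame's demand but averages about 133 kbps; ABR lands exactly on a 128 kbps average and spreads a 4% shortfall across all frames.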
Frame structure
An MP3 file is a sequence of frames, each beginning with its own header. Each frame contains:
- Sync word — 12 bits of all 1s (0xFFF) marking the frame start
- Header — the remaining 20 bits of the 4-byte header, encoding MPEG version, layer, bitrate index, sample rate, padding, and channel mode
- Side information — 32 bytes (stereo) or 17 bytes (mono) in MPEG-1, or 17 and 9 bytes in the MPEG-2 low-sampling-rate variant, specifying scale factor partitioning and Huffman table selections
- Main data — the Huffman-coded spectral coefficients, potentially borrowing bytes from previous frames (bit reservoir)
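A minimal Python sketch that decodes the 4-byte header of an MPEG-1 Layer III frame. It handles only MPEG-1 Layer III (MPEG-2 and 2.5 use different bitrate and sample-rate tables), rejects free-format and reserved indices, and computes the frame length with the Layer III formula 144 × bitrate ÷ sample rate + padding. The function name `parse_mpeg1_l3_header` is illustrative.

```python
# MPEG-1 Layer III lookup tables (bitrate index 0 = free format, 15 = invalid).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]

def parse_mpeg1_l3_header(header: bytes) -> dict:
    if len(header) < 4:
        raise ValueError("need 4 header bytes")
    h = int.from_bytes(header[:4], "big")
    if (h >> 20) & 0xFFF != 0xFFF:
        raise ValueError("missing 12-bit frame sync")
    # Bit 19 = version ID (1 = MPEG-1); bits 18-17 = layer (01 = Layer III).
    if (h >> 19) & 1 != 1 or (h >> 17) & 0b11 != 0b01:
        raise ValueError("not an MPEG-1 Layer III frame")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(h >> 10) & 0b11]
    if bitrate is None or sample_rate is None:
        raise ValueError("free-format or reserved bitrate/sample-rate index")
    padding = (h >> 9) & 1
    return {
        "crc_protected": ((h >> 16) & 1) == 0,   # protection bit 0 means a CRC follows
        "bitrate_kbps": bitrate,
        "sample_rate_hz": sample_rate,
        "padding": padding,
        "channel_mode": CHANNEL_MODES[(h >> 6) & 0b11],
        # Frame length in bytes, header included.
        "frame_bytes": 144 * bitrate * 1000 // sample_rate + padding,
    }

print(parse_mpeg1_l3_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
```

For the common `FF FB 90 00` header this reports a 128 kbps, 44.1 kHz stereo frame of 417 bytes with no CRC. Because 144 × 128,000 ÷ 44,100 is not a whole number, encoders set the padding bit on some frames to keep the average bitrate exact.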
The bit reservoir is a critical mechanism. Frames encoding simple passages may not need their full byte allocation, so unused bytes carry forward for complex passages to borrow. This means a single frame's encoded data can physically reside in a previous frame's byte range — which is why cutting MP3 files on arbitrary frame boundaries can corrupt audio.
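The bookkeeping can be sketched in Python. In MPEG-1 Layer III, each frame's side information carries a 9-bit `main_data_begin` field giving how many bytes before the frame its main data starts, which caps the reservoir at 511 bytes. The per-frame demands and the constant slot size below are hypothetical, and the sketch simply clamps a frame that cannot fit, where a real encoder would quantize more coarsely instead. Any frame with a nonzero pointer depends on bytes stored in earlier frames, so cutting the file just before it breaks that frame.

```python
MAX_RESERVOIR = 511  # main_data_begin is a 9-bit field in MPEG-1 side information

def simulate_reservoir(needs, slot_bytes):
    """Return the main_data_begin back-pointer for each frame.

    needs: hypothetical main-data bytes each frame would like to spend.
    slot_bytes: main-data space each frame physically provides (frame size minus
    header and side info), assumed constant as with CBR.
    """
    reservoir = 0
    pointers = []
    for need in needs:
        pointers.append(reservoir)      # this frame's data starts this many bytes back
        available = reservoir + slot_bytes
        used = min(need, available)     # toy clamp; a real encoder re-quantizes instead
        reservoir = min(available - used, MAX_RESERVOIR)
    return pointers

def unsafe_cut_points(pointers):
    """Frame indices that would lose data if the file were cut right before them."""
    return [i for i, p in enumerate(pointers) if p > 0]

pointers = simulate_reservoir([100, 100, 300, 80, 200], slot_bytes=200)
print(pointers, unsafe_cut_points(pointers))
```

Frame-accurate MP3 editors work around this by re-encoding around the cut or by carrying the borrowed bytes along with the first kept frame.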
ID3 tags and metadata
ID3v1 appends a fixed 128-byte block at the file's end. It's limited: 30-byte title, artist, album, and comment fields, a single genre byte indexing a fixed list, and no album art. ID3v2 prepends a variable-length tag block before the first audio frame, supporting:
- APIC — embedded album artwork (typically JPEG or PNG; the spec sets no practical size limit, but some players reportedly struggle with images above about 500 KB)
- USLT — unsynchronized lyrics with language code
- CHAP — chapter markers with start/end timestamps and optional embedded images
- TXXX — arbitrary key-value text pairs for custom metadata
ID3v2.4 supports UTF-8 natively. ID3v2.3 (more widely supported) has no UTF-8 option, so non-Latin-1 text must be stored as UTF-16 with a byte-order mark.
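Reading an ID3v1 tag takes only fixed offsets. This Python sketch assumes Latin-1 text (ID3v1 never specified an encoding, but Latin-1 is the common convention), strips NUL and space padding, and recognizes the ID3v1.1 variant that stores a track number in the last two bytes of the comment field. The function name `parse_id3v1` is hypothetical; it returns None when no tag is present.

```python
def parse_id3v1(data: bytes):
    """Parse a trailing 128-byte ID3v1/ID3v1.1 tag, or return None if absent."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are NUL- or space-padded; cut at the first NUL, decode, trim spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip(" ")

    result = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre_index": tag[127],       # index into the fixed ID3v1 genre list
    }
    # ID3v1.1: zero byte at comment offset 28 followed by a non-zero track number.
    if tag[125] == 0 and tag[126] != 0:
        result["comment"] = text(tag[97:125])
        result["track"] = tag[126]
    return result
```

Because the block sits at a fixed distance from the end of the file, a reader only needs to seek to EOF minus 128 bytes, which is also why ID3v1 can be appended without rewriting the audio.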
The gapless playback problem
An MPEG-1 Layer III frame always contains 1,152 samples per channel (576 in the MPEG-2 low-sampling-rate variant). When the source audio length isn't a multiple of the frame size, the encoder pads the final frame with silence. The encoder also introduces a priming delay at the start (typically 576 samples for LAME). Together, this delay and padding create audible gaps between consecutive tracks, a problem for live albums, classical suites, and DJ mixes.
LAME solves this by writing delay and padding values into a Xing/LAME header stored in the first frame. Decoders that read this header (iTunes, foobar2000, mpv) trim the padding for seamless playback. Decoders that ignore it (many car stereos, cheap Bluetooth speakers) insert a brief silence between tracks.
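The arithmetic behind that header can be sketched in Python. `frame_layout` is a simplification that assumes the 576-sample LAME encoder delay and pads only to the next frame boundary (real LAME output may carry extra padding), and `trim` ignores any additional decoder-side delay. The pack/unpack helpers model the 3-byte field in the LAME tag that stores encoder delay and padding as two 12-bit numbers; all function names are illustrative.

```python
import math

FRAME_SAMPLES = 1152  # MPEG-1 Layer III samples per channel per frame

def frame_layout(total_samples, encoder_delay=576):
    """Frames needed and trailing padding samples (simplified model)."""
    frames = math.ceil((encoder_delay + total_samples) / FRAME_SAMPLES)
    padding = frames * FRAME_SAMPLES - encoder_delay - total_samples
    return frames, padding

def pack_delay_padding(delay, padding):
    """Pack two 12-bit values into 3 bytes: delay in the high 12 bits, padding in the low 12."""
    if not (0 <= delay < 4096 and 0 <= padding < 4096):
        raise ValueError("values must fit in 12 bits")
    return ((delay << 12) | padding).to_bytes(3, "big")

def unpack_delay_padding(b: bytes):
    return (b[0] << 4) | (b[1] >> 4), ((b[1] & 0x0F) << 8) | b[2]

def trim(decoded, delay, padding):
    """Drop priming samples at the start and padding at the end."""
    return decoded[delay:len(decoded) - padding]

frames, padding = frame_layout(44_100)   # one second of 44.1 kHz audio
field = pack_delay_padding(576, padding)
print(frames, padding, field.hex(), unpack_delay_padding(field))
```

For one second of audio this model needs 39 frames (44,928 samples), of which 576 are priming and 252 are padding; a gapless player removes both and hands exactly 44,100 samples to the output.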
Joint stereo
MP3 supports four channel modes: stereo, joint stereo, dual channel, and mono. Joint stereo is the default in most encoders because it exploits inter-channel redundancy. It operates in two sub-modes:
Mid/Side stereo encodes the sum (L+R) and difference (L-R) channels instead of left and right independently. When both channels carry similar content (centered vocals, for instance), the difference signal is near-zero and compresses efficiently, freeing bits for the mid channel.
Intensity stereo (used only at low bitrates) discards the stereo fine structure of high-frequency content: it encodes one combined spectrum plus a per-band left/right intensity position, relying on the ear's poor spatial resolution above ~2 kHz. LAME does not use intensity stereo; its joint stereo mode switches between plain left/right and mid/side coding from frame to frame.
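A Python sketch of the mid/side rotation. It uses the 1/√2 scaling that keeps the transform orthonormal, which is the form Layer III decoders invert, so the round trip is exact up to floating-point error and total energy is preserved. The function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)

def to_mid_side(left, right):
    """Orthonormal M/S rotation: M = (L + R) / sqrt(2), S = (L - R) / sqrt(2)."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side

def to_left_right(mid, side):
    """Inverse rotation back to L/R."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right

def energy(x):
    return sum(v * v for v in x)

# Nearly identical channels, like a centered vocal.
left = [0.5, -0.2, 0.9, 0.1]
right = [0.49, -0.21, 0.9, 0.12]
mid, side = to_mid_side(left, right)
print(f"mid energy {energy(mid):.4f}, side energy {energy(side):.6f}")
```

Here the side channel holds well under 1% of the energy, so after quantization most of its coefficients become zero and cost almost nothing to code, leaving the bits for the mid channel.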
Limitations
Generation loss compounds. Each decode-edit-reencode cycle applies quantization noise. After 3–4 generations at 128 kbps, artifacts become obvious. Always edit from the source WAV or FLAC, not from an MP3.
No multichannel support. The MPEG-1 spec defines only mono and stereo. 5.1 surround requires MPEG-2 Layer III extensions (the .mp3 extension is technically overloaded), and virtually no consumer player supports it.
No lossless mode. MP3 is inherently lossy. For archival, FLAC or ALAC preserve every sample while typically shrinking files by 40–60% relative to WAV.
No native ReplayGain. Volume normalization relies on non-standard tags (ID3v2 TXXX fields or APEv2 tags). Not all players honor them.
When to choose MP3 over alternatives
MP3 wins on one axis: universal decode support. Virtually every audio-capable device manufactured in the last 20 years plays MP3 natively. If your audience includes firmware-limited hardware (car head units, elevator speakers, embedded PA systems), MP3 at V0 or 320 kbps is the safe choice.
For everything else, alternatives outperform it. In listening tests, AAC-LC at 128 kbps is commonly reported to match MP3 at 192 kbps, and Opus at 96 kbps to rival MP3 at 256 kbps, especially for speech. FLAC provides lossless compression at roughly 60% of WAV size. MP3's technical ceiling was set in 1993, and newer codecs build on three decades of psychoacoustic research since then.
.MP3 compared to alternatives
| Formats | Criterion | Comparison | Winner |
|---|---|---|---|
| .MP3 vs .AAC | Audio quality at 128 kbps | AAC's improved MDCT windowing and stereo coding produce noticeably better perceived quality than MP3 at equivalent bitrates, particularly below 128 kbps, where MP3's hybrid filter bank introduces audible pre-echo artifacts. | AAC wins |
| .MP3 vs .FLAC | Audio fidelity | FLAC is lossless: bit-perfect reproduction of the original audio. MP3 discards masked frequencies and applies lossy quantization. FLAC files are roughly 4–5x larger per minute of audio than a typical MP3. | FLAC wins |
| .MP3 vs .OGG (Vorbis) | Hardware compatibility | MP3 plays on virtually every audio device manufactured since 2000, including car stereos, DAPs, and budget Bluetooth speakers. Ogg Vorbis support is largely limited to software players and some modern hardware. | MP3 wins |
| .MP3 vs .OPUS | Compression efficiency | Opus outperforms MP3 at practically all bitrates, achieving transparent quality at 96–128 kbps where MP3 requires 192–256 kbps. Opus also handles voice and music in a single codec with adaptive switching. | Opus wins |
Technical reference
- MIME type: audio/mpeg
- Magic bytes: FF FB (frame sync; also FF F3, FF F2). Files with an ID3v2 tag start with 49 44 33 ("ID3").
- Developer: Fraunhofer Society / ISO
- Year introduced: 1993
- Open standard: Yes
Binary Structure
MP3 is a frame-based format with no global header or container index. Files optionally begin with an ID3v2 tag block (magic: 49 44 33 / 'ID3'), followed by a sequence of independent audio frames. Each frame starts with a 4-byte header containing a 12-bit sync word (0xFFF), MPEG version, layer, protection bit, bitrate index, sample rate index, padding, channel mode, and mode extension. Frame payloads contain Huffman-coded MDCT coefficients. An optional Xing/Info header in the first audio frame stores VBR metadata (total frames, total bytes, seek table) for duration calculation and seeking. ID3v1 tags (128 bytes, magic: 54 41 47 / 'TAG') may appear at the file tail.
| Offset | Length | Field | Example | Description |
|---|---|---|---|---|
| 0x00 | 3 bytes | ID3v2 Magic | 49 44 33 (ID3) | Present only if file has ID3v2 tags. If absent, audio frames begin at byte 0. |
| 0x03 | 1 byte | ID3v2 Version | 04 (ID3v2.4) | Major version: 03 = ID3v2.3, 04 = ID3v2.4. ID3v2.4 adds native UTF-8 support. |
| 0x06 | 4 bytes | ID3v2 Tag Size | Syncsafe integer | Tag body size in syncsafe encoding (7 bits per byte). Excludes the 10-byte header itself. |
| 0x0A + tag size | 2 bytes | Frame Sync Word | FF FB | Start of the first audio frame (10 bytes later if an ID3v2.4 footer is present). Holds the 12-bit sync plus version, layer, and protection bits. FF FB = MPEG-1 Layer III, no CRC. FF FA = with CRC. FF F3 = MPEG-2 Layer III. |
| sync+2 | 1 byte | Bitrate / Sample Rate | 90 | Upper 4 bits = bitrate index, next 2 bits = sample rate index, then padding and private bits. 0x90 = 128 kbps at 44.1 kHz for MPEG-1 Layer III. |
| EOF-128 | 128 bytes | ID3v1 Tag | 54 41 47 (TAG) | Optional legacy metadata block. Fixed Latin-1 encoding, 30-char fields. Deprecated. |
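The syncsafe size field and the resulting offset of the first audio frame can be computed as in this Python sketch. It assumes a well-formed tag and treats the ID3v2.4 footer flag (bit 4 of the flags byte at offset 0x05) as adding 10 bytes. It does not scan for the sync word; since real files sometimes carry junk or extra padding between the tag and the first frame, robust readers search forward from this offset for a valid frame header. Function names are illustrative.

```python
def syncsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte syncsafe integer: 7 payload bits per byte, high bit always 0."""
    if len(b) != 4 or any(x & 0x80 for x in b):
        raise ValueError("not a valid 4-byte syncsafe integer")
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]

def first_audio_offset(data: bytes) -> int:
    """Byte offset where audio frames should begin, skipping a leading ID3v2 tag."""
    if len(data) >= 10 and data[:3] == b"ID3":
        body_size = syncsafe_to_int(data[6:10])
        has_footer = data[3] == 4 and bool(data[5] & 0x10)  # ID3v2.4 footer flag
        return 10 + body_size + (10 if has_footer else 0)
    return 0  # no ID3v2 tag: audio starts at byte 0

header = b"ID3\x04\x00\x00\x00\x00\x02\x01"  # ID3v2.4, no flags, syncsafe size 257
print(first_audio_offset(header))
```

The syncsafe encoding exists so that the size bytes can never contain a false 0xFF frame sync, which keeps naive MP3 decoders from mistaking the tag for audio.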
Attack Vectors
- ID3 tag buffer overflow
- MP3 frame header bitfield exploit
- Malicious ID3 artwork payload
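The first and third attack vectors both exploit length fields that a parser trusts. One common hardening pattern, sketched here in Python, is to check every declared ID3v2 frame size against the bytes actually remaining before slicing anything, and to cap frame sizes by policy so an oversized artwork payload is rejected up front. The 16 MiB cap and the function name are arbitrary examples; the sketch handles only ID3v2.3/2.4 frame headers and does not handle extended headers or unsynchronisation.

```python
MAX_FRAME_BYTES = 16 * 1024 * 1024  # arbitrary example policy cap (e.g. for APIC artwork)

def iter_id3v2_frames(tag_body: bytes, major_version: int):
    """Yield (frame_id, payload) from an ID3v2.3/2.4 tag body, bounds-checking every size.

    tag_body is the data after the 10-byte tag header.
    """
    if major_version not in (3, 4):
        raise ValueError("only ID3v2.3 and ID3v2.4 frame headers are supported")
    pos = 0
    while pos + 10 <= len(tag_body):
        if tag_body[pos] == 0:
            break  # zero bytes mark the start of padding
        frame_id = tag_body[pos:pos + 4]
        raw_size = tag_body[pos + 4:pos + 8]
        if major_version == 4:
            # ID3v2.4 frame sizes are syncsafe; a set high bit means corrupt or hostile data.
            if any(b & 0x80 for b in raw_size):
                raise ValueError(f"invalid syncsafe size in frame {frame_id!r}")
            size = (raw_size[0] << 21) | (raw_size[1] << 14) | (raw_size[2] << 7) | raw_size[3]
        else:
            size = int.from_bytes(raw_size, "big")  # ID3v2.3 uses a plain 32-bit size
        end = pos + 10 + size
        if size > MAX_FRAME_BYTES or end > len(tag_body):
            raise ValueError(f"frame {frame_id!r} declares {size} bytes, beyond the tag or the cap")
        yield frame_id.decode("latin-1"), tag_body[pos + 10:end]
        pos = end
```

Rejecting the whole tag on the first bad size is a deliberately strict choice; a more lenient reader could stop at the bad frame and keep the frames already validated.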
Mitigation:
References
- Specification: ISO/IEC 11172-3:1993, MPEG-1 Audio (Layer III defines the MP3 codec)
- Specification: ISO/IEC 13818-3:1998, MPEG-2 Audio (extends Layer III to lower sampling rates)
- History: Fraunhofer IIS, MP3 (inventor Karlheinz Brandenburg, Erlangen)
- Registry: Library of Congress Format Description, MP3 (MPEG Layer III Audio Encoding)
- Registry: IANA Media Types, audio/mpeg
- Registry: The National Archives PRONOM Registry, MPEG 1/2 Audio Layer 3 (fmt/134)
- History: Wikipedia, MP3