.MSG Outlook Email Message

MSG wraps a single Outlook email inside an OLE2 Compound Binary container — the same structure as .doc and .xls files — storing the body in text, HTML, and compressed RTF alongside MAPI property streams carrying follow-up flags, voting buttons, and 400+ metadata fields EML cannot represent.

File structure
Header schema
Records structured data
Metadata, Text, Forms, 1997
By FileDex
Not convertible

MSG files use the OLE2 Compound Binary Format with MAPI property streams. Full-fidelity parsing has traditionally relied on native Windows COM/MAPI libraries, although open-source parsers such as extract-msg and olefile read the container directly. FileDex does not offer in-browser conversion: decoding 400+ MAPI property definitions with complex type encoding is impractical client-side. Convert MSG to EML or extract attachments using the desktop tools documented in the CLI tab.

Common questions

How do I open a MSG file without Microsoft Outlook installed?

Free dedicated viewers exist for every platform. On Windows, MSG Viewer Pro and Encryptomatic Free MSG Viewer both open MSG files natively. On macOS, Letter Opener and Klammer read MSG without Outlook. On Linux, convert the file to EML using a tool from the CLI tab, then open the result in Thunderbird or any other email client.

What is the difference between MSG and EML email formats?

MSG is a proprietary OLE2 binary container that preserves all MAPI properties including voting buttons, follow-up flags, and categories. EML is an open plain-text format based on RFC 5322 that any email client can read. Converting MSG to EML loses Outlook-specific metadata that has no MIME equivalent.

Why does my MSG file show garbled text for Arabic or other non-Latin characters?

The MSG file was likely created by Outlook 2002 or earlier, which saved strings in 8-bit ANSI encoding tied to the Windows code page. Arabic characters were destroyed unless the system used Arabic locale. Outlook 2007 and later default to Unicode MSG, which stores all scripts correctly in UTF-16LE encoding.

Can I convert a MSG file to PDF for legal archiving?

Yes. The standard workflow is to extract the HTML body from the MSG container using a desktop tool, then render that HTML to PDF through a print-to-PDF function. Some commercial eDiscovery platforms perform this conversion in bulk. See the CLI tab for extraction commands.

Are MSG files safe to open from unknown senders?

MSG files carry high security risk. They can embed OLE objects with macros, nested MSG files that obfuscate malicious payloads, and HTML bodies with credential harvesting forms. Always scan MSG files with an OLE2-aware antivirus before opening, and never enable macros or active content in files from untrusted sources.

What MAPI properties does MSG preserve that EML cannot?

MSG stores over 400 MAPI properties including PR_FLAG_STATUS (follow-up flags), PR_VOTING_RESPONSE (voting buttons), PR_SENSITIVITY (confidentiality level), message categories, custom form definitions, and Bcc recipients. EML only carries standard MIME headers and has no mechanism for these Outlook-specific metadata fields.

How is MSG different from PST and MBOX?

MSG stores exactly one message per file in an OLE2 binary container. PST is a Microsoft mailbox database holding thousands of messages, calendars, and contacts in a B-tree structure. MBOX concatenates multiple plain-text messages in a single file separated by marker lines. MSG is for individual message portability; PST and MBOX are for bulk storage.

Why do some email gateways fail to scan MSG file attachments?

MSG files use the OLE2 Compound Binary Format, which requires specialized parsing to access embedded streams. Email gateways designed for MIME-formatted messages may treat MSG attachments as opaque binary blobs without inspecting the internal OLE2 structure, allowing embedded threats to pass through undetected.

What makes .MSG special

Born 1997
Same container as Word .doc files
Outlook 97 introduced MSG built on the OLE2 Compound Binary format that Microsoft had shipped with Word and Excel since 1993. An MSG file and a .doc file start with the same 8 magic bytes — D0 CF 11 E0 A1 B1 1A E1 — and use the same sector-based storage architecture. Only the internal stream names differ.
Three bodies
Plain text, HTML, and compressed RTF stored simultaneously
Every MSG file can carry the email body in three parallel formats inside one container. The RTF copy uses LZFu compression at 3:1 to 5:1 ratios and often wraps the HTML original inside RTF markup via the \fromhtml1 control word — a design decision from an era when Exchange transport agents only handled RTF reliably.
eDiscovery standard
SEC Rule 17a-4 and MiFID II accept MSG as original-format evidence
Financial regulators require broker-dealers and investment firms to retain emails in original format. MSG files satisfy this because they are the canonical Outlook representation — not a converted copy. Legal processing platforms like Nuix and Relativity ingest MSG directly, hashing the OLE2 container for forensic integrity.
400+ properties
Voting buttons, follow-up flags, and Bcc survive in MSG but not EML
The MAPI property system encodes each metadata field as a 16-bit property ID and 16-bit type. PR_VOTING_RESPONSE, PR_FLAG_STATUS, PR_SENSITIVITY, and the Bcc recipient list are stored as first-class properties inside MSG. Converting to EML irreversibly drops every property that has no MIME header equivalent.

Drag an email from Outlook to your desktop and you get a .msg file. That file is not a text dump — it is a full OLE2 Compound Binary container, the same sector-based storage system that Microsoft used for Word .doc, Excel .xls, and Windows Installer .msi files throughout the 1990s and 2000s. The 8-byte magic signature D0 CF 11 E0 A1 B1 1A E1 at offset zero identifies the container, but it does not identify MSG specifically. A .doc file starts with the same bytes. What makes an MSG file an MSG file is the internal stream layout: a root storage containing __properties_version1.0 for message-level MAPI properties, __substg1.0_* streams for the subject, body, and headers, __recip_version1.0_# sub-storages for each recipient, and __attach_version1.0_# sub-storages for each attachment.


The OLE2 container layer

The Compound File Binary Format (CFB) divides the file into fixed-size sectors — 512 bytes in version 3, 4096 bytes in version 4. A File Allocation Table (FAT) chains sectors together, forming logical streams that can span non-contiguous sectors across the file. The directory stream maps storage and stream names to their starting sectors. This architecture means a 50 KB email and a 50 MB email with large attachments use the same structural framework — only the number of allocated sectors differs.
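
A minimal Python sketch of the sector chaining described above, assuming the FAT has already been read into a list of integers. The helper names and the sample FAT are illustrative only; the ENDOFCHAIN marker (0xFFFFFFFE) and the one-sector header offset follow MS-CFB.

```python
ENDOFCHAIN = 0xFFFFFFFE  # FAT marker for the last sector of a stream (MS-CFB)


def sector_offset(sector_number: int, sector_size: int = 512) -> int:
    """Byte offset of a sector; sector 0 starts right after the header sector."""
    return (sector_number + 1) * sector_size


def follow_chain(fat: list[int], start: int) -> list[int]:
    """Return the sector numbers of one stream by walking the FAT from `start`."""
    chain, seen = [], set()
    sector = start
    while sector != ENDOFCHAIN:
        if sector in seen or sector >= len(fat):
            raise ValueError("corrupt FAT: loop or out-of-range sector")
        seen.add(sector)
        chain.append(sector)
        sector = fat[sector]
    return chain


# Hypothetical 6-entry FAT: one stream starts at sector 1, then continues
# in sectors 4 and 2, so its data is not contiguous on disk.
example_fat = [ENDOFCHAIN, 4, ENDOFCHAIN, ENDOFCHAIN, 2, ENDOFCHAIN]
print(follow_chain(example_fat, 1))            # [1, 4, 2]
print([sector_offset(s) for s in (1, 4, 2)])   # [1024, 2560, 1536]
```

The loop guard matters in practice: a crafted FAT can point a chain back at itself, so a parser that trusts the table without tracking visited sectors may never terminate.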

The 512-byte header at offset zero contains the magic signature, format version, sector size as a power of 2 (0x09 for 512 bytes, 0x0C for 4096 bytes), a byte-order mark (always FE FF, little-endian), and pointers to the first directory sector and first mini-stream FAT sector. Streams smaller than a configurable cutoff (typically 4096 bytes) are stored in the mini-stream using 64-byte mini-sectors, reducing wasted space for the many small property streams inside an MSG file.
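
The header fields listed above can be decoded with the standard struct module. This is a sketch rather than a complete parser: the field order follows MS-CFB, while the function name, the dictionary keys, and the synthetic sample header are assumptions made for illustration.

```python
import struct

CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# First 76 bytes of the header: signature, CLSID, five 16-bit fields,
# 6 reserved bytes, then nine 32-bit fields (little-endian throughout).
HEADER_FORMAT = "<8s16sHHHHH6xIIIIIIIII"


def parse_cfb_header(data: bytes) -> dict:
    """Decode the fixed header fields from the first 512 bytes of a file."""
    if len(data) < 512:
        raise ValueError("need at least 512 bytes")
    (magic, _clsid, minor, major, byte_order, sector_shift, mini_shift,
     _n_dir, _n_fat, first_dir, _txn, cutoff, first_minifat, _n_minifat,
     _first_difat, _n_difat) = struct.unpack_from(HEADER_FORMAT, data)
    if magic != CFB_MAGIC:
        raise ValueError("not an OLE2 compound file")
    return {
        "major_version": major,
        "minor_version": minor,
        "byte_order": byte_order,              # bytes FE FF read as 0xFFFE
        "sector_size": 1 << sector_shift,      # 0x09 -> 512, 0x0C -> 4096
        "mini_sector_size": 1 << mini_shift,   # 0x06 -> 64
        "mini_stream_cutoff": cutoff,
        "first_directory_sector": first_dir,
        "first_mini_fat_sector": first_minifat,
    }


# Synthetic version 3 header built only for this example.
sample = struct.pack(HEADER_FORMAT, CFB_MAGIC, bytes(16), 0x3E, 3, 0xFFFE,
                     9, 6, 0, 1, 1, 0, 4096, 2, 1, 0xFFFFFFFE, 0).ljust(512, b"\x00")
info = parse_cfb_header(sample)
print(info["sector_size"], info["mini_sector_size"], info["mini_stream_cutoff"])
```

To run it against a real file, pass the first 512 bytes read in binary mode instead of the synthetic sample.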

MAPI property storage

Every email property in an MSG file is identified by two 16-bit values: a property ID and a property type. The property ID identifies what the property represents — 0x0037 for the subject, 0x1000 for the plain text body, 0x1013 for the HTML body, 0x0C1F for the sender email address. The property type specifies the data format — 0x001F for Unicode string, 0x0102 for binary, 0x0040 for FILETIME, 0x0003 for 32-bit integer.
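
As a small illustration, the two 16-bit values are commonly carried together as one 32-bit property tag, with the ID in the high half and the type in the low half. The sketch below splits such a tag and converts a FILETIME value (100-nanosecond ticks since 1601-01-01 UTC) to a Python datetime. The function names are hypothetical, and the type table lists only the four types mentioned above.

```python
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

# Only the property types named in the text; real files use many more.
PROPERTY_TYPES = {
    0x001F: "Unicode string",
    0x0102: "binary",
    0x0040: "FILETIME",
    0x0003: "32-bit integer",
}


def split_tag(tag: int) -> tuple[int, int]:
    """Split a 32-bit property tag into (property ID, property type)."""
    return tag >> 16, tag & 0xFFFF


def filetime_to_datetime(ticks: int) -> datetime:
    """Convert 100-nanosecond ticks since 1601-01-01 UTC to a datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


prop_id, prop_type = split_tag(0x0037001F)
print(hex(prop_id), PROPERTY_TYPES[prop_type])   # 0x37 Unicode string
print(filetime_to_datetime(116444736000000000))  # 1970-01-01 00:00:00+00:00
```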

Fixed-length properties (integers, booleans, timestamps) are packed directly into the __properties_version1.0 stream as 16-byte entries. Variable-length properties (strings, binary blobs) get their own sub-streams named __substg1.0_{propID}{propType} — so the Unicode subject line lives in __substg1.0_0037001F and the HTML body lives in __substg1.0_10130102.
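
The naming rule and the 16-byte entry layout can be sketched as follows. This assumes the entry layout given in MS-OXMSG (a 4-byte tag with the type stored first, 4 bytes of flags, and an 8-byte value field that holds either the value itself or the size of the separate stream). The properties stream also begins with a header whose length depends on the kind of storage, which this sketch does not handle. The helper names and sample entries are illustrative.

```python
import struct

# struct formats for a few fixed-length types: int32, boolean, FILETIME.
FIXED_FORMATS = {0x0003: "<i", 0x000B: "<H", 0x0040: "<Q"}


def substg_name(prop_id: int, prop_type: int) -> str:
    """Build the stream name that holds a variable-length property."""
    return f"__substg1.0_{prop_id:04X}{prop_type:04X}"


def decode_entry(entry: bytes) -> dict:
    """Decode one 16-byte entry from a __properties_version1.0 stream."""
    if len(entry) != 16:
        raise ValueError("property entries are exactly 16 bytes")
    prop_type, prop_id, flags = struct.unpack_from("<HHI", entry, 0)
    result = {"prop_id": prop_id, "prop_type": prop_type, "flags": flags}
    if prop_type in FIXED_FORMATS:
        # Fixed-length: the value sits directly in the entry.
        result["value"] = struct.unpack_from(FIXED_FORMATS[prop_type], entry, 8)[0]
    else:
        # Variable-length: the entry records the size; the data is in its own stream.
        result["size"] = struct.unpack_from("<I", entry, 8)[0]
        result["stream"] = substg_name(prop_id, prop_type)
    return result


# Hypothetical sample entries: an importance value of 2 and a 24-byte Unicode subject.
importance = struct.pack("<HHIi4x", 0x0003, 0x0017, 0x06, 2)
subject = struct.pack("<HHII4x", 0x001F, 0x0037, 0x06, 24)
print(decode_entry(importance))
print(decode_entry(subject))
```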

Over 400 standard MAPI properties are defined in the MS-OXPROPS specification. MSG files commonly carry 50 to 150 of them per message. Named properties — custom metadata defined by applications — use a mapping table in the __nameid_version1.0 storage that translates GUIDs and string identifiers to numeric property IDs.

Three bodies in one file

Outlook stores the message body in up to three parallel formats inside a single MSG file: plain text in PR_BODY (0x1000), HTML in PR_BODY_HTML (0x1013), and Rich Text in PR_RTF_COMPRESSED (0x1009). The RTF stream uses LZFu compression, a Microsoft-proprietary algorithm documented in MS-OXRTFCP that achieves 3:1 to 5:1 compression ratios. The RTF body often contains a \fromhtml1 control word, meaning Outlook stored the original HTML email wrapped inside RTF markup. Some extraction tools parse the HTML from this RTF wrapper rather than reading PR_BODY_HTML directly.
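
Full LZFu decompression needs the pre-initialized dictionary from MS-OXRTFCP and is out of scope here, but the 16-byte header of the compressed RTF stream is easy to inspect. The sketch below assumes the four little-endian fields from that specification (compressed size, raw size, compression type, CRC). The function names and the sample bytes are made up for illustration.

```python
import struct

COMPRESSED = 0x75465A4C    # ASCII "LZFu" read as a little-endian integer
UNCOMPRESSED = 0x414C454D  # ASCII "MELA": the RTF follows without compression


def parse_rtf_header(stream: bytes) -> dict:
    """Read the 16-byte header at the start of a PR_RTF_COMPRESSED stream."""
    if len(stream) < 16:
        raise ValueError("compressed RTF header is 16 bytes")
    comp_size, raw_size, comp_type, crc = struct.unpack_from("<IIII", stream)
    kind = {COMPRESSED: "LZFu", UNCOMPRESSED: "MELA"}.get(comp_type, "unknown")
    return {"comp_size": comp_size, "raw_size": raw_size, "type": kind, "crc": crc}


def wraps_html(rtf_text: str) -> bool:
    """True if already-decompressed RTF declares that it encapsulates HTML."""
    return "\\fromhtml1" in rtf_text


# Hypothetical header bytes; a real stream continues with the compressed payload.
sample_header = struct.pack("<II4sI", 1200, 4800, b"LZFu", 0)
print(parse_rtf_header(sample_header))
print(wraps_html(r"{\rtf1\ansi\fromhtml1 {\*\htmltag <html>}}"))  # True
```

Comparing raw_size with comp_size gives a quick view of the compression ratio for a given message before any decompression is attempted.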

This triple-body design exists because different Outlook rendering paths historically consumed different formats. The plain text body serves search indexing, the HTML body serves web rendering, and the RTF body preserves formatting that early Exchange transport agents could handle reliably. Modern MSG files created by Outlook 2016 and later almost always populate all three.

Recipients and attachments as sub-storages

Recipients are stored in numbered sub-storages: __recip_version1.0_#00000000 for the first recipient, __recip_version1.0_#00000001 for the second, and so on. Each recipient sub-storage contains its own __properties_version1.0 stream with properties like PR_DISPLAY_NAME (0x3001), PR_EMAIL_ADDRESS (0x3003), and PR_RECIPIENT_TYPE (0x0C15 — 1 for To, 2 for Cc, 3 for Bcc).
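
A short sketch of the naming and type mapping above. It assumes the recipient properties have already been decoded into (display name, address, type) tuples; the function names and sample recipients are hypothetical.

```python
RECIPIENT_TYPES = {1: "To", 2: "Cc", 3: "Bcc"}  # PR_RECIPIENT_TYPE values


def recipient_storage_name(index: int) -> str:
    """Name of the sub-storage for the recipient at a zero-based index."""
    return f"__recip_version1.0_#{index:08X}"


def group_recipients(recipients: list[tuple[str, str, int]]) -> dict[str, list[str]]:
    """Group decoded recipients by their To/Cc/Bcc role."""
    grouped: dict[str, list[str]] = {"To": [], "Cc": [], "Bcc": []}
    for name, address, rtype in recipients:
        role = RECIPIENT_TYPES.get(rtype)
        if role is None:
            continue  # skip values outside the three documented roles
        grouped[role].append(f"{name} <{address}>")
    return grouped


sample = [
    ("Alice Example", "alice@example.com", 1),
    ("Bob Example", "bob@example.com", 2),
    ("Carol Example", "carol@example.com", 3),
]
print(recipient_storage_name(1))  # __recip_version1.0_#00000001
print(group_recipients(sample))
```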

Attachments follow the same pattern: __attach_version1.0_#00000000 and upward. Each attachment sub-storage holds PR_ATTACH_FILENAME (0x3704), PR_ATTACH_DATA_BIN (0x3701) containing the raw file bytes, PR_ATTACH_EXTENSION (0x3703), and PR_ATTACH_METHOD (0x3705) indicating whether the attachment is by value (binary blob), by reference (file link), or an embedded OLE object. When the attachment is another email — a forwarded message or an attached .msg — the attachment sub-storage contains a complete nested MSG structure with its own recipients, properties, and potentially further nested attachments. This recursive nesting has no specification-imposed depth limit.
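
To make the nesting concrete, the sketch below summarizes a list of stream paths shaped like the output of olefile's listdir() (a list of path-component lists). It counts top-level recipients and attachments and reports how deep attachments are nested. The attach-method labels use the standard MAPI constant values and the __substg1.0_3701000D storage name for an embedded message is taken from common MSG layouts; treat both as assumptions to check against the specification. The sample paths are invented.

```python
ATTACH_PREFIX = "__attach_version1.0_#"
RECIP_PREFIX = "__recip_version1.0_#"

# PR_ATTACH_METHOD values (MAPI constants; verify before relying on them).
ATTACH_METHODS = {1: "by value", 2: "by reference",
                  5: "embedded message", 6: "embedded OLE object"}


def summarize_layout(paths: list[list[str]]) -> dict:
    """Count top-level recipients and attachments and the deepest nesting level."""
    recipients, attachments, max_depth = set(), set(), 0
    for path in paths:
        if not path:
            continue
        if path[0].startswith(RECIP_PREFIX):
            recipients.add(path[0])
        elif path[0].startswith(ATTACH_PREFIX):
            attachments.add(path[0])
        # Each attachment storage in the path is one more level of nesting.
        depth = sum(part.startswith(ATTACH_PREFIX) for part in path)
        max_depth = max(max_depth, depth)
    return {"recipients": len(recipients), "attachments": len(attachments),
            "max_attachment_depth": max_depth}


# Hypothetical listing: two recipients and one attached message that
# itself carries a by-value attachment.
sample_paths = [
    ["__properties_version1.0"],
    ["__substg1.0_0037001F"],
    ["__recip_version1.0_#00000000", "__properties_version1.0"],
    ["__recip_version1.0_#00000001", "__properties_version1.0"],
    ["__attach_version1.0_#00000000", "__properties_version1.0"],
    ["__attach_version1.0_#00000000", "__substg1.0_3701000D",
     "__attach_version1.0_#00000000", "__substg1.0_37010102"],
]
print(summarize_layout(sample_paths))
```

Because the format sets no depth limit, a scanner built on this idea would typically pick its own maximum depth and flag anything beyond it.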

Unicode versus ANSI MSG

Outlook 97 through Outlook 2002 created ANSI MSG files. String properties used 8-bit encoding (property type 0x001E) tied to the system's ANSI code page. This caused character corruption for any language outside the system's default locale — Arabic text saved on a Western European Windows installation would produce garbled characters when reopened.

Outlook 2003 introduced Unicode MSG support (property type 0x001F, UTF-16LE encoding). Outlook 2007 made Unicode the default for all new saves. The distinction is visible in the stream names: __substg1.0_0037001E is the ANSI subject, __substg1.0_0037001F is the Unicode subject. Modern parsing tools must handle both variants because pre-2003 MSG files still circulate in corporate archives.
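
A minimal sketch of handling both variants, keyed on the four-character type suffix of the stream name. The function name is hypothetical, and the ANSI code page must come from elsewhere (message properties or a guess about the originating system); here it is simply a parameter.

```python
def decode_string_stream(stream_name: str, raw: bytes,
                         ansi_codepage: str = "cp1252") -> str:
    """Decode a string stream according to its property-type suffix."""
    suffix = stream_name[-4:].upper()
    if suffix == "001F":   # Unicode MSG: UTF-16LE
        return raw.decode("utf-16-le")
    if suffix == "001E":   # ANSI MSG: depends on the writer's code page
        return raw.decode(ansi_codepage, errors="replace")
    raise ValueError(f"not a string property stream: {stream_name}")


arabic = "مرحبا"

# Unicode stream: decodes correctly with no locale knowledge.
print(decode_string_stream("__substg1.0_0037001F", arabic.encode("utf-16-le")))

# ANSI stream written on an Arabic system (code page 1256) ...
ansi_bytes = arabic.encode("cp1256")
# ... read back with the Western European default: garbled.
print(decode_string_stream("__substg1.0_0037001E", ansi_bytes))
# ... read back with the matching code page: intact.
print(decode_string_stream("__substg1.0_0037001E", ansi_bytes, "cp1256"))
```

This recovers text only when the bytes were stored in a code page that could represent it; characters already replaced at save time cannot be restored.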

MSG in legal discovery

eDiscovery workflows treat MSG files as primary evidence because they preserve the original message with all metadata intact: timestamps, read/unread status, importance flags, follow-up dates, categories, and the complete recipient list including Bcc recipients (which are stripped from delivered EML). Legal processing tools like Nuix, Relativity, and dtSearch ingest MSG files directly, indexing MAPI properties for faceted search across millions of messages. Because each message is one self-contained file, forensic hashing is straightforward: a SHA-256 digest of the complete file identifies the message, its metadata, and its attachments as a single unit.
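
Hashing the complete file needs nothing beyond the Python standard library. The sketch below reads in chunks so large messages do not have to fit in memory; the function name and chunk size are arbitrary choices.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (path is a placeholder):
#   print(sha256_of_file("message.msg"))
```

Note that the digest covers the container bytes, so re-saving the same message from Outlook may produce a different hash even when the visible content is unchanged.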

Regulatory frameworks in finance (SEC Rule 17a-4, MiFID II) and healthcare (HIPAA) require email retention in original format. MSG files satisfy this requirement for Outlook-originated messages because they are the canonical representation, not a converted copy.

The TNEF connection

Transport Neutral Encapsulation Format (TNEF), commonly seen as winmail.dat attachments, is MSG's close relative. When Outlook sends a Rich Text formatted email through a non-Exchange gateway, the MAPI properties and RTF body are packed into a TNEF blob attached to a plain-text MIME message. Recipients without Outlook see the winmail.dat attachment and cannot read the rich formatting. TNEF uses a different binary structure than OLE2 but carries the same MAPI properties. Converting winmail.dat to MSG or directly extracting its properties requires TNEF-aware tools.
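
Since the two containers differ at the byte level, a tool can tell them apart from the first few bytes. This sketch assumes the TNEF signature 0x223E9F78 stored little-endian; the function name and return labels are illustrative.

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
TNEF_SIGNATURE = bytes.fromhex("789F3E22")  # 0x223E9F78 in little-endian order


def sniff_container(head: bytes) -> str:
    """Classify a file by its leading bytes as OLE2, TNEF, or unknown."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"   # could be MSG, but also .doc, .xls, or .msi
    if head.startswith(TNEF_SIGNATURE):
        return "tnef"   # typically a winmail.dat attachment
    return "unknown"


print(sniff_container(OLE2_MAGIC + bytes(8)))      # ole2
print(sniff_container(TNEF_SIGNATURE + bytes(8)))  # tnef
print(sniff_container(b"From: someone"))           # unknown
```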

Security surface

MSG files carry significant security risk because they can embed arbitrary OLE objects, HTML with active content, and nested attachments that create multi-layer obfuscation. CVE-2025-21298 demonstrated a double-free vulnerability in Windows OLE that allowed remote code execution when Outlook rendered a crafted RTF body — no user interaction beyond previewing the email was required. CVE-2023-36563 showed how OLE object conversion could leak NTLM credentials to a remote server. CVE-2017-0199 exploited OLE2link objects to download and execute remote HTA files. These vulnerabilities target the OLE2 parsing layer that MSG files share with .doc and .xls, meaning MSG-specific attacks often reuse techniques from Office document exploits. Email security gateways that scan only MIME-formatted messages may miss threats embedded inside MSG file attachments, which require OLE2 parsing to inspect.

.MSG compared to alternatives

Formats | Criteria | Winner

.MSG vs .EML | Cross-platform compatibility | EML wins
EML follows RFC 5322, an open IETF standard readable by Thunderbird, Apple Mail, Gmail, and any text editor. MSG requires Microsoft Outlook or specialized OLE2 parsing libraries; it cannot be opened in any non-Microsoft email client without conversion.

.MSG vs .EML | Outlook metadata preservation | MSG wins
MSG preserves all 400+ MAPI properties including voting buttons, follow-up flags, categories, custom form data, and Bcc recipients. EML maps only standard MIME headers; X-headers can carry some metadata, but Outlook-specific features are irreversibly lost during MSG-to-EML conversion.

.MSG vs .PST | Single message portability | MSG wins
MSG stores exactly one message per file, making it shareable as an email attachment or file transfer. PST is a mailbox database (B-tree structure, up to 50 GB in Outlook 2016+) designed for bulk storage of thousands of messages, calendars, contacts, and tasks.

.MSG vs .MBOX | Forensic integrity for eDiscovery | MSG wins
MSG files are self-contained OLE2 containers where SHA-256 hashing produces a unique digest per message. MBOX concatenates multiple messages in a plain text file separated by 'From ' lines, so individual message integrity cannot be verified without splitting the file first.

Technical reference

MIME Type: application/vnd.ms-outlook
Magic Bytes: D0 CF 11 E0 A1 B1 1A E1 (OLE2 Compound Binary File header)
Developer: Microsoft
Year Introduced: 1997
Open Standard: No

00000000  D0 CF 11 E0 A1 B1 1A E1  ........

OLE2 Compound Binary File header.

Binary Structure

MSG files use the OLE2 Compound File Binary Format. The file begins with a 512-byte header containing the magic signature D0 CF 11 E0 A1 B1 1A E1, followed by a FAT (File Allocation Table) for sector chain management. The directory stream contains entries for the root storage and sub-storages. Email properties are stored as MAPI property streams (__properties_version1.0), with body content in __substg1.0_* streams (named by property ID and type), recipients in __recip_version1.0_# sub-storages, and attachments in __attach_version1.0_# sub-storages. Streams smaller than 4096 bytes are stored in the mini-stream using 64-byte mini-sectors.

Offset | Length | Field | Example | Description
0x00 | 8 bytes | Magic number | D0 CF 11 E0 A1 B1 1A E1 | OLE2 Compound Binary File signature, shared with .doc, .xls, .ppt, and .msi files. Does not uniquely identify MSG; internal stream layout determines the file type.
0x08 | 16 bytes | CLSID | 00 00 00 00 ... | Reserved class identifier in the header; MS-CFB specifies all zeros. The root storage's CLSID, which some Outlook versions set to a non-zero GUID, lives in the root directory entry rather than here.
0x18 | 2 bytes | Minor version | 3E 00 | Minor version of the CFB specification (little-endian). Value 0x003E is standard.
0x1A | 2 bytes | Major version | 03 00 | Major version: 3 = 512-byte sectors, 4 = 4096-byte sectors. Most MSG files use version 3.
0x1C | 2 bytes | Byte order | FE FF | Byte order mark; FE FF indicates little-endian. All CFB files are little-endian; no big-endian implementation exists.
0x1E | 2 bytes | Sector size power | 09 00 | Sector size as power of 2: 0x09 = 512 bytes (v3), 0x0C = 4096 bytes (v4). Controls the allocation granularity for all streams.

1993: Microsoft ships the OLE2 Compound Binary File Format (Structured Storage) with Office 4.0, the container technology that MSG will later adopt
1997: Outlook 97 introduces MSG as its native email save format, storing messages in OLE2 containers with MAPI property streams
2003: Outlook 2003 adds Unicode MSG support alongside the original ANSI format, fixing character corruption for Arabic, CJK, and Cyrillic text
2007: Outlook 2007 makes Unicode MSG the default for all new saves; ANSI MSG creation becomes opt-in only
2008: Microsoft publishes the first version (0.1) of the MS-OXMSG Open Specification, documenting the MSG file format publicly for the first time
2016: Outlook for Mac gains native MSG file reading support, ending the macOS limitation that required third-party viewers
2025: MS-OXMSG specification reaches revision 18.0 (May 2025), continuing to document MSG structure for interoperability with non-Microsoft implementations

Convert MSG to EML format
msgconvert message.msg

Converts a proprietary MSG file to the open RFC 5322 EML format. MAPI-only properties like voting buttons and follow-up flags are lost because EML has no equivalent headers. Body text, HTML, recipients, and standard attachments transfer intact.

Extract body and attachments from MSG
python3 -c "import extract_msg; m=extract_msg.Message('mail.msg'); m.save()"

Parses the OLE2 container, reads MAPI property streams, and saves the plain text body, HTML body, and all file attachments to disk. Creates a directory named after the subject line containing the extracted contents.

Read MAPI properties from MSG
python3 -c "import extract_msg; m=extract_msg.Message('mail.msg'); print(m.subject, m.date, m.sender)"

Reads the __properties_version1.0 stream and __substg1.0_* streams to extract specific MAPI properties. Property IDs 0x0037 (subject), 0x0E06 (delivery time), and 0x0C1F (sender address) are decoded from the binary OLE2 structure.

Dump OLE2 directory tree of MSG
python3 -c "import olefile; ole=olefile.OleFileIO('mail.msg'); ole.dumpdirectory()"

Lists all storages and streams inside the OLE2 container — reveals __recip_version1.0_# entries (recipients), __attach_version1.0_# entries (attachments), and __substg1.0_* streams (body, subject, headers). Useful for diagnosing corrupt or unusual MSG files.

Verify OLE2 magic bytes in MSG file
xxd -l 8 message.msg

Reads the first 8 bytes of the file. A valid OLE2 container (including MSG) starts with D0 CF 11 E0 A1 B1 1A E1. This signature is shared with .doc, .xls, and .msi — it confirms OLE2 format but does not prove the file is specifically MSG.

Security risk: HIGH

Attack Vectors

  • Embedded OLE objects can contain macros, executables, and ActiveX controls that execute when the MSG file is opened in Outlook or OLE-aware viewers
  • HTML body may include phishing links, tracking pixels, and form-based credential harvesting pages rendered in Outlook's embedded browser
  • Nested MSG attachments create multi-layer obfuscation — an MSG inside an MSG inside an MSG can bypass email gateway scanning that only inspects one level of nesting
  • Crafted OLE2 structures exploit parsing vulnerabilities: CVE-2025-21298 (double-free RCE in Windows OLE via RTF preview), CVE-2023-36563 (NTLM credential leak via OLE object conversion), CVE-2017-0199 (remote HTA execution via OLE2link objects)
  • MSG files bypass MIME-based email security gateways because the OLE2 binary container requires specialized parsing that many gateways do not implement

Mitigation: FileDex does not open, execute, or parse MSG files. This is a reference page only. Never open MSG files from untrusted sources without scanning with an OLE2-aware antivirus engine.

extract-msg library
Python library for parsing MSG files without Outlook — extracts body text, headers, recipients, and attachments by reading OLE2 streams and MAPI properties directly
olefile library
Python library for reading and writing OLE2 Compound Binary files — provides low-level access to the sector table, directory entries, and raw streams inside MSG, DOC, and XLS containers
msgconvert tool
Perl command-line converter that transforms MSG files to RFC 5322 EML format by mapping MAPI properties to MIME headers on Linux and macOS
Microsoft Outlook
The only email client that creates and fully renders MSG files with complete MAPI property support, OLE object rendering, and RTF body display
oxmsg library
Open-source JavaScript library (a port of the C# MsgKit project) for writing MSG files programmatically, used by Tuta (formerly Tutanota) for MSG export from their encrypted email service
Nuix tool
Enterprise eDiscovery platform that ingests MSG files at scale, indexing MAPI properties for faceted search across millions of messages in legal and compliance workflows