Microsoft Word Document
DOC is Microsoft Word's legacy binary format (Word 97-2003), built on the OLE2 Compound File Binary Format. It stores formatted text, images, tables, and VBA macros in a proprietary binary structure that was superseded by the XML-based DOCX in Office 2007.
Rendering the binary Word format faithfully requires Microsoft's proprietary rendering engine, which is not available as WebAssembly (WASM) in the browser.
Frequently asked questions
What is the difference between DOC and DOCX?
DOC is a proprietary binary format (OLE2 Compound File) used by Word 97-2003. DOCX is a ZIP archive of XML files introduced in Office 2007 as an open standard (ECMA-376). DOCX produces smaller files, is easier to parse programmatically, and is the current default for all modern Word versions.
Is it safe to open DOC files from unknown sources?
No. DOC files can contain VBA macros that auto-execute malware, OLE embedded objects, and DDE fields that run system commands. Always open untrusted DOC files in Protected View or convert to PDF first via LibreOffice headless.
How do I convert DOC to DOCX without Microsoft Office?
Use LibreOffice from the command line: libreoffice --headless --convert-to docx input.doc. This works on Windows, macOS, and Linux without a GUI. Google Docs also converts DOC to DOCX on upload.
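The same conversion can be scripted. Below is a minimal Python sketch that wraps the LibreOffice command with subprocess. It assumes the binary is on PATH as `libreoffice` (common on Linux) or `soffice` (the usual name in Windows and macOS installs). The helper names `build_convert_command` and `convert_doc` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_convert_command(soffice: str, src: Path, target_format: str = "docx",
                          outdir: Optional[Path] = None) -> List[str]:
    """Build a headless LibreOffice conversion command (hypothetical helper)."""
    cmd = [soffice, "--headless", "--convert-to", target_format]
    if outdir is not None:
        cmd += ["--outdir", str(outdir)]
    cmd.append(str(src))
    return cmd


def convert_doc(src: Path, target_format: str = "docx",
                outdir: Optional[Path] = None) -> Path:
    """Convert src with LibreOffice and return the path of the converted file."""
    # Assumption: LibreOffice is on PATH as "libreoffice" or "soffice".
    soffice = shutil.which("libreoffice") or shutil.which("soffice")
    if soffice is None:
        raise FileNotFoundError("LibreOffice (libreoffice/soffice) not found on PATH")
    cmd = build_convert_command(soffice, src, target_format, outdir)
    # check=True raises CalledProcessError if LibreOffice exits with a non-zero status.
    subprocess.run(cmd, check=True, capture_output=True, timeout=300)
    # Without --outdir, LibreOffice writes into the current working directory.
    # The output is named after the input's stem. This assumes a plain extension
    # such as "docx" or "pdf", not a "format:FilterName" spec.
    out = (outdir if outdir is not None else Path.cwd()) / f"{src.stem}.{target_format}"
    if not out.exists():
        raise RuntimeError(f"no output found at {out}")
    return out
```

For example, `convert_doc(Path("input.doc"), outdir=Path("converted"))` should return `converted/input.docx` when the conversion succeeds. The final existence check guards against runs that exit cleanly but write nothing.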
Can I recover text from a corrupted DOC file?
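Try antiword. It reads the WordDocument stream directly and can extract text even when the Table stream (formatting data) is damaged. If antiword fails, open the file in a hex editor and search for runs of readable text to recover the raw content. Word may store text either as 8-bit characters or as UTF-16LE (for Latin text, every second byte is 00), so search for both forms.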
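The hex-editor search can be automated. The following Python sketch works roughly like the Unix `strings` tool. It collects runs of printable ASCII stored either as single bytes or as UTF-16LE and returns them in file order. The function name and the 8-character minimum are arbitrary choices. This simplified version does not recover characters outside the ASCII range, such as accented or non-Latin text.

```python
import re
from typing import List


def extract_text_runs(data: bytes, min_len: int = 8) -> List[str]:
    """Return printable ASCII runs (8-bit or UTF-16LE) found in data, in file order."""
    # 8-bit text: consecutive printable ASCII bytes (plus tab/CR/LF).
    ascii_re = re.compile(rb"[\x20-\x7e\t\r\n]{%d,}" % min_len)
    # UTF-16LE Latin text: each printable ASCII byte followed by a 0x00 byte.
    utf16_re = re.compile(rb"(?:[\x20-\x7e\t\r\n]\x00){%d,}" % min_len)
    hits = [(m.start(), m.group().decode("ascii")) for m in ascii_re.finditer(data)]
    hits += [(m.start(), m.group().decode("utf-16-le")) for m in utf16_re.finditer(data)]
    return [text for _, text in sorted(hits)]
```

Usage: `extract_text_runs(Path("damaged.doc").read_bytes())` (with `pathlib.Path` imported). Expect noise such as font and style names alongside the document text.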
What makes .DOC distinctive
What is a DOC file?
DOC is the binary file format that Microsoft Word used by default from Word 97 to Word 2003. It stores formatted text, images, tables, and other document elements in a proprietary binary structure (an OLE Compound Document). It was superseded by the XML-based DOCX format in Office 2007.
Technical details
How to open DOC files
- Microsoft Word (Windows, macOS) — Full editing support
- Google Docs (Web) — Free, online editing
- LibreOffice Writer (Windows, macOS, Linux) — Free, open-source
- Apple Pages (macOS, iOS) — Free
- WPS Office (Windows, macOS, Linux, Mobile) — Free
Technical specifications
| Property | Value |
|---|---|
| Format | OLE2 Compound Document |
| Encoding | Binary |
| Macros | VBA macro support |
| Max Size | ~512 MB |
| Compatibility | Office 97–2003 |
Programs that open DOC files
- Microsoft Word — Native editor
- LibreOffice Writer — Free alternative
- Google Docs — Online editing
- WPS Office — Free office suite
- Apple Pages — macOS/iOS word processor
Common use cases
- Legacy documents: Old Word files from pre-2007
- Compatibility: Sharing with users on older software
- Macro documents: VBA-enabled templates
Technical reference
| Property | Value |
|---|---|
| MIME type | application/msword |
| Magic bytes | D0 CF 11 E0 A1 B1 1A E1 (OLE2 Compound Binary File header) |
| Developer | Microsoft |
| Year introduced | 1983 |
| Open standard | No |
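DOC and DOCX begin with different signatures, so a first-pass check needs only the first few bytes. Below is a minimal Python sketch; the function names and the returned labels are hypothetical. The OLE2 signature alone does not prove the file is a Word document, because XLS, PPT, and other OLE2 formats share it. A ZIP signature (PK\x03\x04) only suggests DOCX, since ZIP-based files normally start that way.

```python
from pathlib import Path

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # CFBF signature (DOC, XLS, PPT, ...)
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP local file header (DOCX, XLSX, ...)


def sniff_container(head: bytes) -> str:
    """Classify a file by its leading bytes: 'ole2', 'zip', or 'unknown'."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"  # possibly DOC; confirm by finding a WordDocument stream
    if head.startswith(ZIP_MAGIC):
        return "zip"   # possibly DOCX; confirm by finding a part such as word/document.xml
    return "unknown"


def sniff_file(path: Path) -> str:
    """Read just enough bytes from path to classify it."""
    with open(path, "rb") as f:
        return sniff_container(f.read(len(OLE2_MAGIC)))
```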
Binary structure
DOC files use the OLE2 Compound File Binary Format (CFBF), which is a file system within a file. The file begins with the 512-byte CFBF header. The header contains the magic signature D0 CF 11 E0 A1 B1 1A E1, the sector size (512 or 4096 bytes), the FAT/DIFAT information used to follow sector chains, and the location of the first directory sector.
The internal directory tree contains named streams and storages:
- 'WordDocument': the main text stream, with the FIB at offset 0.
- '1Table' or '0Table': formatting and property tables. A FIB flag selects which one is used.
- 'Data': picture data and other binary data referenced from the document.
- Optional entries: 'Macros' (the VBA project storage), 'ObjectPool' (embedded OLE objects), and '\x05SummaryInformation' (document metadata).
The FIB (File Information Block) in the WordDocument stream is the master index. It contains offsets and lengths for every other data structure, such as character positions, paragraph formatting, section breaks, fonts, and styles. Text is stored in the WordDocument stream. Formatting is applied to it through PLCFs in the Table stream; these plexes are sorted arrays of character positions paired with formatting data.
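To illustrate how the FIB selects the Table stream, the sketch below reads the first 12 bytes of an already-extracted WordDocument stream. Extracting that stream from the CFBF container is assumed to happen elsewhere, for example with a library such as olefile. The sketch relies on the FibBase layout described in Microsoft's [MS-DOC] specification: wIdent = 0xA5EC at offset 0, and the fWhichTblStm flag in bit 9 of the 16-bit flags word at offset 0x0A. Verify these against the specification before relying on them. The helper `plc_count` applies the PLC size rule: n + 1 four-byte character positions followed by n data elements. Both function names are hypothetical.

```python
import struct

FIB_WIDENT = 0xA5EC       # expected wIdent at offset 0 of the WordDocument stream
F_WHICH_TBL_STM = 0x0200  # fWhichTblStm: bit 9 of the flags word at offset 0x0A


def table_stream_name(word_document: bytes) -> str:
    """Return '1Table' or '0Table' according to the FibBase fWhichTblStm flag."""
    if len(word_document) < 12:
        raise ValueError("WordDocument stream too short for FibBase")
    (w_ident,) = struct.unpack_from("<H", word_document, 0)
    if w_ident != FIB_WIDENT:
        raise ValueError(f"unexpected wIdent 0x{w_ident:04X}")
    (flags,) = struct.unpack_from("<H", word_document, 0x0A)
    return "1Table" if flags & F_WHICH_TBL_STM else "0Table"


def plc_count(plc_size: int, cb_data: int) -> int:
    """Number of data elements in a PLC of plc_size bytes with cb_data-byte elements.

    A PLC holds n + 1 character positions (4 bytes each) followed by n elements,
    so plc_size = 4 * (n + 1) + cb_data * n.
    """
    n, rem = divmod(plc_size - 4, 4 + cb_data)
    if rem or n < 0:
        raise ValueError("size is not a valid PLC length")
    return n
```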
| Offset | Length | Field | Example | Description |
|---|---|---|---|---|
| 0x00 | 8 bytes | CFBF Signature | D0 CF 11 E0 A1 B1 1A E1 | OLE2 magic bytes. Identifies the file as a Compound File Binary Format container. Shared by DOC, XLS, PPT, and other OLE2 formats. |
| 0x08 | 16 bytes | CLSID | 00 00 00 00 ... (16x 00) | Header class identifier. Reserved; the CFBF specification requires all zeros. |
| 0x18 | 2 bytes | Minor version | 3E 00 | Minor version of the CFBF specification. |
| 0x1A | 2 bytes | Major version | 03 00 | 3 = CFBF v3 (512-byte sectors). 4 = CFBF v4 (4096-byte sectors). |
| 0x1C | 2 bytes | Byte order | FE FF | Always FE FF (little-endian). CFBF does not support big-endian. |
| 0x1E | 2 bytes | Sector size power | 09 00 | Sector size as a power of 2. 9 = 512 bytes (v3). 12 = 4096 bytes (v4). |
| 0x2C | 4 bytes | FAT sectors count | 01 00 00 00 | Total number of FAT sectors. The FAT maps the sector chains of all streams. |
| 0x30 | 4 bytes | First directory sector | 00 00 00 00 | Location of the first directory sector, which holds the stream and storage entries. |
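The header fields above can be decoded with Python's struct module. This sketch assumes the input holds at least the first 512 bytes of the file. It decodes only the fields listed in the table. The `CfbHeader` class and `parse_cfb_header` function are hypothetical names.

```python
import struct
from dataclasses import dataclass

CFBF_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


@dataclass
class CfbHeader:
    minor_version: int
    major_version: int
    sector_size: int
    fat_sector_count: int
    first_directory_sector: int


def parse_cfb_header(data: bytes) -> CfbHeader:
    """Decode selected fields of the 512-byte CFBF header (all little-endian)."""
    if len(data) < 512:
        raise ValueError("CFBF header is 512 bytes")
    if data[:8] != CFBF_MAGIC:
        raise ValueError("not an OLE2/CFBF file")
    # Offsets 0x18, 0x1A, 0x1C, 0x1E: minor version, major version, byte order, sector shift.
    minor, major, byte_order, sector_shift = struct.unpack_from("<HHHH", data, 0x18)
    if byte_order != 0xFFFE:  # bytes FE FF read as a little-endian uint16
        raise ValueError("unexpected byte-order mark")
    # Offsets 0x2C and 0x30: number of FAT sectors, first directory sector.
    fat_count, first_dir = struct.unpack_from("<II", data, 0x2C)
    return CfbHeader(minor, major, 1 << sector_shift, fat_count, first_dir)
```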
Vulnerabilities
- VBA macros — DOC files can contain auto-executing macros that download and run malware (macro viruses remain the top Office attack vector)
- OLE2 embedded objects — ActiveX controls and OLE objects inside DOC can execute code when opened
- Equation Editor exploits — CVE-2017-11882 targets the legacy Equation Editor component embedded in DOC files, enabling remote code execution
- DDE (Dynamic Data Exchange) fields can execute arbitrary commands when the document is opened, even without macros enabled
Protection: FileDex does not open or parse DOC files in the browser; this is a reference-only page. Open untrusted DOC files in Protected View (Microsoft Word), or convert them to PDF with LibreOffice in headless mode, before viewing them.
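For offline triage of the macro risk, a script can check whether a DOC file contains the 'Macros' storage that holds its VBA project. The Python sketch below walks the CFBF directory using only the standard library, under these simplifying assumptions:
- It reads FAT sectors only from the 109 DIFAT slots in the header; larger files raise NotImplementedError.
- It flags any directory entry named 'Macros', wherever it sits in the tree.
- It does not detect DDE fields, OLE objects, or Equation Editor exploits.
Dedicated tools such as oletools are better suited to real analysis. The function names here are hypothetical.

```python
import struct
from typing import List

CFBF_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ENDOFCHAIN = 0xFFFFFFFE
HEADER_DIFAT_ENTRIES = 109  # DIFAT slots stored directly in the 512-byte header (offset 0x4C)


def directory_names(data: bytes) -> List[str]:
    """List storage/stream names from a CFBF directory (simplified sketch)."""
    if data[:8] != CFBF_MAGIC:
        raise ValueError("not an OLE2/CFBF file")
    (sector_shift,) = struct.unpack_from("<H", data, 0x1E)
    sector_size = 1 << sector_shift
    fat_count, first_dir = struct.unpack_from("<II", data, 0x2C)
    if fat_count > HEADER_DIFAT_ENTRIES:
        raise NotImplementedError("extra DIFAT sectors are not handled in this sketch")
    difat = struct.unpack_from(f"<{HEADER_DIFAT_ENTRIES}I", data, 0x4C)

    def sector(index: int) -> bytes:
        # Sector 0 begins right after the header, which occupies one sector-sized slot.
        start = (index + 1) * sector_size
        return data[start:start + sector_size]

    # Concatenate the FAT sectors into one table of "next sector" pointers.
    fat: List[int] = []
    for fat_sector in difat[:fat_count]:
        fat.extend(struct.unpack(f"<{sector_size // 4}I", sector(fat_sector)))

    names: List[str] = []
    current, seen = first_dir, set()
    while current != ENDOFCHAIN:
        if current in seen or current >= len(fat):
            raise ValueError("corrupt directory chain")
        seen.add(current)
        block = sector(current)
        for off in range(0, len(block) - 127, 128):  # directory entries are 128 bytes
            entry = block[off:off + 128]
            (name_len,) = struct.unpack_from("<H", entry, 64)  # bytes, including the NUL
            obj_type = entry[66]  # 0 = unused, 1 = storage, 2 = stream, 5 = root
            if obj_type in (1, 2, 5) and 2 <= name_len <= 64:
                names.append(entry[:name_len - 2].decode("utf-16-le", errors="replace"))
        current = fat[current]
    return names


def has_macros_storage(data: bytes) -> bool:
    """Heuristic: True if any directory entry is named 'Macros'."""
    return "Macros" in directory_names(data)
```

A True result means the document carries a VBA project and deserves the precautions described above. It does not by itself mean the macros are malicious.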