.DOCX Microsoft Word Document (Open XML)

DOCX is Microsoft Word's default format since Office 2007, storing documents as compressed XML inside a ZIP container (Office Open XML, ISO/IEC 29500). Convert DOCX to PDF using LibreOffice headless mode or open in Google Docs, Apple Pages, or OnlyOffice.

By FileDex
Not convertible

Office Open XML requires a complex rendering engine for styles, templates, and embedded objects, and no such engine is available in browser WASM.

Common questions

How do I convert DOCX to PDF without Microsoft Word?

Install LibreOffice and run: libreoffice --headless --convert-to pdf input.docx. This produces a PDF with preserved formatting, fonts, and images. Google Docs also exports DOCX to PDF via File > Download > PDF.

Can I open a DOCX file on Linux?

LibreOffice Writer opens DOCX files natively on all Linux distributions. Install via your package manager (apt install libreoffice-writer on Debian/Ubuntu). OnlyOffice is another option with high DOCX fidelity.

Why does my DOCX look different in LibreOffice vs Word?

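Font substitution is the primary cause. DOCX files reference Windows fonts (Calibri, Cambria) that may not be installed on Linux. The Microsoft core fonts package (ttf-mscorefonts-installer) supplies older fonts such as Arial and Times New Roman but not Calibri or Cambria; for those, install the metric-compatible Carlito and Caladea fonts (fonts-crosextra-carlito and fonts-crosextra-caladea on Debian/Ubuntu), or embed fonts in the document before sharing.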

Is DOCX the same as DOC?

No. DOC is a proprietary binary format used by Word 97-2003. DOCX is an open XML format (ISO/IEC 29500) introduced with Office 2007. DOCX files are smaller, more interoperable, and easier to recover from corruption because the XML is human-readable.

What makes .DOCX special

What is a DOCX file?

DOCX is the default document format for Microsoft Word since Office 2007. It uses the Office Open XML (OOXML) standard, storing content as compressed XML files inside a ZIP container. This makes it more compact and resilient than the legacy binary DOC format.

How to open DOCX files

  • Microsoft Word (Windows, macOS, Web) — Full editing
  • Google Docs (Web) — Free, online editing
  • LibreOffice Writer (Windows, macOS, Linux) — Free, open-source
  • Apple Pages (macOS, iOS) — Free
  • OnlyOffice (Windows, macOS, Linux) — Free, open-source

Technical specifications

Format: Office Open XML (OOXML)
Container: ZIP archive
Content: XML documents + media resources
Standard: ISO/IEC 29500, ECMA-376
Macros: Macro-enabled documents use the .docm extension

Programs that open DOCX files

  • Microsoft Word — Native editor
  • Google Docs — Free online editing
  • LibreOffice Writer — Free office suite
  • WPS Office — Free alternative
  • OnlyOffice — Open-source office

Common use cases

  • Business documents: Reports, letters, proposals
  • Academic papers: Essays, theses, dissertations
  • Resumes: Job applications and CVs
  • Legal documents: Contracts and agreements

.DOCX compared to alternatives

.DOCX vs .DOC
Criterion: File size
DOCX uses ZIP compression, typically producing files 30-50% smaller than equivalent binary DOC files. The XML structure also compresses text content more efficiently than the binary compound document format.
Winner: DOCX

.DOCX vs .ODT
Criterion: Interoperability
Both are XML-in-ZIP formats with ISO standards (29500 vs 26300). DOCX has wider application support due to Office market share. ODT has better fidelity in LibreOffice. Complex features like SmartArt and ActiveX controls exist only in DOCX.
Winner: Draw

.DOCX vs .PDF
Criterion: Editability
DOCX stores content as semantic paragraphs with styles, sections, and tracked changes, and is designed for editing. PDF stores positioned glyphs in content streams optimized for rendering, not modification.
Winner: DOCX

.DOCX vs .RTF
Criterion: Feature support
RTF supports basic formatting, tables, and images but lacks track changes, SmartArt, themes, content controls, and structured document properties. DOCX supports the full Office feature set including OLE embedding.
Winner: DOCX

Technical reference

MIME Type
application/vnd.openxmlformats-officedocument.wordprocessingml.document
Magic Bytes
50 4B 03 04 ZIP archive header. Contains [Content_Types].xml and word/ directory.
Developer
Microsoft / Ecma International
Year Introduced
2007
Open Standard
Yes
Hex dump
00000000  50 4B 03 04  PK..

Binary Structure

A DOCX file is a standard ZIP archive with the magic bytes 50 4B 03 04 (PK). Inside, [Content_Types].xml at the root declares the content type of each part. The _rels/.rels file holds the package-level relationships, including the one that identifies word/document.xml as the main document part; relationships from the main document to its images, headers, and other parts live in word/_rels/document.xml.rels. The main document body lives in word/document.xml, containing paragraphs (<w:p>), runs (<w:r>), and text nodes (<w:t>) in the WordprocessingML namespace. Styles are defined in word/styles.xml, numbering definitions in word/numbering.xml, and font tables in word/fontTable.xml. Images and media are stored in word/media/ as binary files referenced by relationship IDs. Headers and footers are separate XML files (for example word/header1.xml and word/footer1.xml) linked via section properties. The word/settings.xml file controls document-level settings like track changes, compatibility mode, and zoom level.
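The layout described above can be explored with nothing more than a ZIP reader and an XML parser. The following is a minimal sketch using only the Python standard library; the function names `paragraph_texts` and `make_sample_docx` are illustrative, and the sample package is a hypothetical stub that contains only word/document.xml, so Word itself would not open it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix used in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_bytes: bytes) -> list[str]:
    """Return the plain text of each <w:p> element in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # Join every <w:t> text node found inside the paragraph's runs.
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_sample_docx() -> bytes:
    """Build a tiny stub package for demonstration (not a complete DOCX)."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r>"
        '<w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


if __name__ == "__main__":
    for line in paragraph_texts(make_sample_docx()):
        print(line)
```

This sketch ignores tables, text boxes, footnotes, and tracked changes; paragraphs nested inside text boxes may appear twice. For real documents a dedicated library such as python-docx is likely the better choice.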

Offset | Length | Field | Example | Description
0x00 | 4 bytes | ZIP Signature | 50 4B 03 04 (PK) | Standard ZIP local file header. Shared with all OOXML formats (.xlsx, .pptx) and other ZIP-based files.
0x04 | 2 bytes | Version needed | 14 00 (v2.0) | Minimum ZIP version needed to extract. OOXML typically uses version 2.0 (value 20).
0x1A | 2 bytes | Filename length | 13 00 | Length of the first archived filename, usually [Content_Types].xml.
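The three fields in the table can be read directly from the first 30 bytes of the file. Below is a hedged sketch with Python's `struct` module; `read_first_local_header` and `make_sample_zip` are hypothetical helper names, and the sample archive is generated in memory rather than taken from a real Word document.

```python
import io
import struct
import zipfile

# Fixed 30-byte portion of a ZIP local file header (all fields little-endian):
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, filename length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def read_first_local_header(data: bytes) -> dict:
    """Decode the signature, version, and first filename of a ZIP-based file."""
    if len(data) < LOCAL_HEADER.size:
        raise ValueError("file is too short to contain a ZIP local header")
    fields = LOCAL_HEADER.unpack_from(data, 0)
    signature, version_needed, name_length = fields[0], fields[1], fields[9]
    if signature != b"PK\x03\x04":
        raise ValueError("missing ZIP signature 50 4B 03 04")
    # The filename immediately follows the fixed 30-byte header.
    name = data[LOCAL_HEADER.size:LOCAL_HEADER.size + name_length]
    return {
        "version_needed": version_needed,   # offset 0x04
        "name_length": name_length,         # offset 0x1A
        "first_name": name.decode("utf-8", errors="replace"),
    }


def make_sample_zip() -> bytes:
    """Create an in-memory archive whose first entry is [Content_Types].xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", "<Types/>")
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_first_local_header(make_sample_zip()))
```

A matching signature only shows that the file is a ZIP archive. Telling DOCX apart from XLSX, PPTX, or a plain ZIP still requires looking at the entries inside, such as the word/ directory.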
2000: Microsoft begins developing Office Open XML as a successor to binary Office formats
2006: ECMA-376 (Office Open XML) approved by Ecma International
2007: Office 2007 launches with DOCX as the default Word format, replacing binary .doc
2008: ISO/IEC 29500 approved after contentious standardization vote, alongside ODF (ISO 26300)
2012: ISO/IEC 29500:2012 revision aligns the strict conformance profile with actual Office implementations
2016: Office 365 and Google Docs achieve broad DOCX interoperability for standard documents

Convert DOCX to PDF via LibreOffice headless
libreoffice --headless --convert-to pdf input.docx

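--headless runs LibreOffice without a GUI, making it suitable for server-side batch processing. The PDF output generally preserves page layout, fonts, and images from the original document, provided the fonts the document references are installed on the machine running the conversion.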

Batch convert all DOCX files to PDF in a directory
libreoffice --headless --convert-to pdf *.docx

Glob expansion passes all DOCX files in the current directory to LibreOffice. Each file produces a corresponding .pdf file in the same directory.
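When the batch step runs from a script rather than an interactive shell, the same command can be assembled in Python. This is a sketch under assumptions: the binary is assumed to be on PATH as `libreoffice` (some installs name it `soffice`), the helper names are hypothetical, and `--outdir` is used to send the PDFs to a chosen directory.

```python
import subprocess
from pathlib import Path


def build_convert_command(files, outdir, binary="libreoffice"):
    """Return the argv list for a headless DOCX-to-PDF conversion."""
    return [
        binary,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        *[str(path) for path in files],
    ]


def convert_directory(directory, outdir):
    """Convert every .docx file in `directory`; return the expected PDF paths."""
    files = sorted(Path(directory).glob("*.docx"))
    if not files:
        # Unlike an unmatched shell glob, nothing is passed to LibreOffice.
        return []
    subprocess.run(build_convert_command(files, outdir), check=True)
    return [Path(outdir) / (path.stem + ".pdf") for path in files]
```

Calling `convert_directory("reports", "pdf_out")` would run one LibreOffice process for the whole batch. `check=True` raises an exception if LibreOffice exits with an error status.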

Extract raw document.xml from DOCX
unzip -p input.docx word/document.xml | xmllint --format -

Pipes the main content XML from the DOCX ZIP archive to xmllint for pretty-printing. Useful for debugging formatting issues or inspecting tracked changes at the XML level.

Convert DOCX to Markdown with Pandoc
pandoc -f docx -t markdown -o output.md input.docx

Pandoc reads the DOCX XML structure and converts it to Markdown, preserving headings, lists, links, and basic formatting. Embedded images are not written to disk by default; add --extract-media=. to extract them to a media/ directory.

DOCX to PDF (render, near-lossless): PDF preserves exact page layout, embedded fonts, and formatting across all viewers. Converting DOCX to PDF is the standard workflow for sharing final documents: recipients see identical output regardless of their installed fonts or Office version.
DOCX to ODT (export, near-lossless): ODF (Open Document Format) is the ISO 26300 standard used by LibreOffice, Google Docs export, and government document archives in the EU. Converting DOCX to ODT enables editing in open-source tools without Microsoft Office licensing.
DOCX to TXT (export, lossy): Plain text extraction strips all formatting, images, and metadata, producing a lightweight file suitable for full-text indexing, command-line processing, and LLM ingestion pipelines where layout is irrelevant.
DOCX to HTML (export, near-lossless): HTML export converts Word paragraph styles and formatting to CSS, enabling web publishing of document content. Table structures, hyperlinks, and inline images are preserved. Complex layout features like columns and text boxes may require manual cleanup.
Security risk: MEDIUM

Attack Vectors

  • Macro-enabled DOCM files (renamed to DOCX) can execute VBA code on opening if macros are enabled in the user's Office security settings
  • External data connections and linked OLE objects can fetch remote payloads when the document is opened, bypassing initial file scanning
  • Embedded ActiveX controls in DOCX files can execute arbitrary code in Office versions prior to the Protected View sandbox
  • Template injection via word/_rels/settings.xml.rels can point the document's attached template at a remote URL hosting a macro-enabled template
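
Several of the vectors above leave visible traces inside the ZIP package, so a rough first-pass check is possible without opening the file in Word. The sketch below is illustrative, not a complete scanner: `audit_package` is a hypothetical helper that reports any part named vbaProject.bin (a hint of a renamed macro-enabled file) and every relationship marked TargetMode="External" in any .rels part.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by every .rels part in an Open Packaging Conventions package.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def audit_package(docx_bytes: bytes) -> dict:
    """List VBA parts and externally targeted relationships in a package."""
    findings = {"vba_parts": [], "external_targets": []}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        for name in package.namelist():
            if name.lower().endswith("vbaproject.bin"):
                findings["vba_parts"].append(name)
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(package.read(name))
            for rel in root.iter(f"{{{REL_NS}}}Relationship"):
                if rel.get("TargetMode") == "External":
                    # Keep only the last URI segment, e.g. "attachedTemplate".
                    rel_type = rel.get("Type", "").rsplit("/", 1)[-1]
                    findings["external_targets"].append(
                        (name, rel_type, rel.get("Target"))
                    )
    return findings
```

Ordinary hyperlinks also appear as external relationships, so entries of type `hyperlink` are usually benign, while `attachedTemplate` or `oleObject` entries pointing at remote URLs deserve a closer look. Because the input is untrusted, a hardened XML parser may be preferable to the standard-library one used here.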

Mitigation: FileDex does not execute DOCX files. The format page is reference-only. For safe handling, always open untrusted DOCX files in Protected View (Office) or upload to Google Docs (which strips macros and active content).

Microsoft Word: Primary DOCX editor with full feature support
LibreOffice Writer: Free office word processor with strong DOCX compatibility
Google Docs (service): Free web-based editor with DOCX import and export
python-docx (library): Python library for creating and modifying DOCX files programmatically
Pandoc (tool): Converts DOCX to/from Markdown, HTML, LaTeX, and 40+ formats
docx4j (library): Java library for OOXML manipulation with JAXB binding