Microsoft PowerPoint (Open XML)
PPTX stores PowerPoint slides as compressed XML inside a ZIP container, standardized under ISO/IEC 29500 (ECMA-376). Each slide’s dimensions are specified in English Metric Units — 914400 EMU per inch. FileDex provides PPTX reference information and CLI conversion commands.
PPTX requires a full Office rendering engine for accurate conversion. FileDex displays format information and CLI conversion commands.
Common questions
What is a PPTX file?
PPTX is a Microsoft PowerPoint presentation stored as compressed XML files inside a ZIP archive. Standardized under ECMA-376 and ISO/IEC 29500, it has been the default PowerPoint format since Office 2007, replacing the older binary PPT format. The internal XML vocabulary is called PresentationML.
What is the difference between PPTX and PPT?
PPT is the legacy binary format from PowerPoint 97–2003. PPTX uses Office Open XML, an ISO standard that stores content as compressed XML in a ZIP container. PPTX files are smaller, easier to recover from corruption, and separate macros into the distinct .pptm extension.
Can I open PPTX without Microsoft PowerPoint?
Several free alternatives open PPTX with high fidelity. Google Slides works entirely in the browser with no install. Apple Keynote handles most layouts on macOS and iOS. WPS Office and OnlyOffice are cross-platform desktop options. Complex animations and Morph transitions may render differently outside PowerPoint, but static slide content transfers reliably.
Why is PPTX a ZIP file?
PPTX follows Open Packaging Conventions (OPC), which uses ZIP as the container format. This design makes files smaller through DEFLATE compression, allows individual slides or media to be extracted or repaired independently, and lets any standard ZIP utility inspect the internal XML structure directly.
What are English Metric Units in PPTX?
EMU is the measurement unit for all spatial values in PPTX. One inch equals 914400 EMU. This integer was chosen because it divides evenly by 72, 96, 150, 300, and 2.54, enabling exact conversion between inches, centimeters, and points without floating-point rounding.
How do I extract text from a PPTX file programmatically?
PPTX stores text inside XML shape elements across individual slide files. Programming libraries for Python and Java can parse these shapes and extract all text content per slide, including text frames, tables, and grouped objects. The underlying PresentationML XML is also readable by any ZIP-aware tool. See the CLI tab below for extraction commands.
Is PPTX an open standard?
Yes. PPTX is defined by ECMA-376 (approved December 2006) and ISO/IEC 29500 (approved April 2008). The full specification is publicly available from Ecma International. However, some critics note that Microsoft's implementation includes features and extensions beyond the published standard document.
Can PPTX files contain viruses?
Standard .pptx files cannot contain VBA macros, which are restricted to the separate .pptm extension. However, embedded OLE objects, ActiveX controls, and external links in relationship files can be exploited for attacks. Always open untrusted PPTX files in Protected View or upload to Google Slides.
What makes .PPTX special
What is a PPTX file?
PPTX is the default presentation format for Microsoft PowerPoint since Office 2007. Each PPTX file is a ZIP archive containing XML documents, media files, and relationship descriptors organized according to Open Packaging Conventions (OPC). The XML vocabulary inside is PresentationML, defined by ECMA-376 (first approved December 2006) and its ISO equivalent ISO/IEC 29500 (published November 2008). PPTX replaced the binary .ppt format that PowerPoint had used since its creation in 1987.
Continue reading — full technical deep dive
Robert Gaskins, Dennis Austin, and Thomas Rudkin built the original PowerPoint at Forethought, Inc. under the working name "Presenter." PowerPoint 1.0 shipped on April 20, 1987 for Macintosh. Microsoft acquired Forethought three months later for $14 million — its first significant acquisition. By the late 1990s, PowerPoint held approximately 95% worldwide market share in presentation software. The shift from binary PPT to XML-based PPTX arrived with Office 2007, driven by interoperability demands and the standardization push that produced ECMA-376.
Internal ZIP structure
Renaming any .pptx file to .zip and extracting it reveals a standardized directory tree. A python-pptx-generated test file with two slides contains 40 ZIP entries totaling 96987 bytes uncompressed (23636 bytes compressed — a 75.6% compression ratio):
presentation.zip/
├── [Content_Types].xml ← MIME manifest for every part
├── _rels/.rels ← Root relationships (entry point)
├── docProps/
│ ├── core.xml ← Dublin Core metadata (author, dates)
│ ├── app.xml ← Application metadata (slide count)
│ └── thumbnail.jpeg ← Preview image
└── ppt/
├── presentation.xml ← Slide order, dimensions, master refs
├── slides/slide1.xml ← Individual slide content
├── slideLayouts/ ← Layout templates (11 default)
├── slideMasters/ ← Master slide definitions
├── theme/theme1.xml ← Color schemes, fonts, effects
└── media/ ← Embedded images, video, audio
The [Content_Types].xml file at the root declares the MIME content type of every part. It uses <Default> elements to map file extensions to types and <Override> elements for specific part paths. This is how a ZIP-based viewer distinguishes a PPTX from a DOCX or XLSX — all three share the same 50 4B 03 04 ("PK") magic bytes, but [Content_Types].xml declares application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml for the main presentation part.
The OPC relationship chain
_rels/.rels is the first file any OPC consumer reads. It contains <Relationship> elements mapping IDs to target parts. For PPTX, the root relationship type http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument points to ppt/presentation.xml. From there, each slide, slide master, and slide layout is referenced by its own relationship ID — the XML never uses raw file paths. This indirection means parts can be renamed, moved, or supplemented without breaking references, and tools can process the presentation graph without knowledge of the physical directory layout.
English Metric Units (EMU)
All spatial measurements in PPTX — slide dimensions, shape positions, font sizes in certain contexts — use English Metric Units. One inch equals 914400 EMU. One point equals 12700 EMU. One centimeter equals 360000 EMU. The value 914400 was chosen because it divides evenly by an unusually large number of common factors: 72 (points per inch), 96 (Windows DPI), 150 and 300 (common print DPI values), and 254 (centimeters × 100 per inch). Being an integer, EMU values serialize identically across platforms and languages with no floating-point ambiguity.
The default 4:3 slide size in presentation.xml reads <p:sldSz cx="9144000" cy="6858000"/> (10 × 7.5 inches). The default 16:9 widescreen size, standard since PowerPoint 2013, is cx="12192000" cy="6858000" (13.33 × 7.5 inches).
Slide master inheritance
PPTX implements a three-tier design cascade: slide master → slide layout → slide. The slide master (ppt/slideMasters/slideMaster1.xml) defines global placeholder positions, background fill, and default text styles. Slide layouts (ppt/slideLayouts/) customize these defaults for specific purposes — title slide, section header, two-column, blank, and so on. A default python-pptx presentation ships 11 layout variants. Each individual slide inherits from its assigned layout, and can override any inherited property at the shape level. This cascade minimizes redundancy: changing a font in the slide master propagates to every slide and layout that has not explicitly overridden that property.
Strict vs. Transitional conformance
ISO/IEC 29500 defines two conformance classes for OOXML files. Transitional (the default produced by Office 2007 through Office 2019) permits legacy features including VML (Vector Markup Language) graphics carried over from older Office versions. Strict (available since Office 2013) prohibits VML entirely, requiring DrawingML for all graphics. The Library of Congress maintains separate Format Description Documents for each: fdd000399 (PPTX Transitional) and fdd000402 (PPTX Strict). In practice, nearly all PPTX files in circulation use Transitional conformance.
Programmatic access
python-pptx (current version 1.0.2) provides full read-write access to PPTX files from Python. A script can create slides, insert text and images, build charts and tables, and read existing presentation metadata — all without any Office installation. Apache POI’s XSLF module offers equivalent capabilities in Java. Microsoft’s Open XML SDK targets .NET. These libraries work directly with the underlying XML, making automated report generation — populating slide templates with database data, generating weekly dashboards, batch-processing conference slides — a standard workflow in enterprise environments.
Security boundary: PPTX vs. PPTM
Standard .pptx files cannot contain VBA macros. Presentations with macros must use the .pptm extension, which carries a different MIME type (application/vnd.ms-powerpoint.presentation.macroEnabled.12). This separation — absent in the legacy PPT format — allows email gateways, antivirus scanners, and browser download policies to block macro-enabled presentations while permitting standard PPTX files. Embedded OLE objects and ActiveX controls remain a risk vector even in non-macro PPTX files, and external media links in relationship files can trigger network requests to attacker-controlled servers when the file is opened.
.PPTX compared to alternatives
| Formats | Criteria | Winner |
|---|---|---|
| .PPTX vs .PPT | File structure PPTX is a ZIP archive of XML files (ECMA-376, ISO/IEC 29500). PPT is a monolithic binary using OLE Compound File format. PPTX files are 20–75% smaller due to DEFLATE compression, and individual corrupted slides can be recovered from the ZIP without losing the entire file. | PPTX wins |
| .PPTX vs .ODP | Animation support PresentationML supports Morph transitions (PowerPoint 2016+), 3D model animations, and complex trigger-based timing sequences. ODP (ISO/IEC 26300) supports basic entrance/exit animations but cannot represent Morph, 3D rotations, or PowerPoint-specific motion paths. | PPTX wins |
| .PPTX vs .ODP | Openness ODP is a pure open standard (OASIS, ISO/IEC 26300) developed by a multi-vendor committee. PPTX is ISO-standardized (ISO/IEC 29500) but was designed primarily by Microsoft. The ISO vote in April 2008 was controversial, with opponents arguing that ODP already served as an adequate international standard. | ODP wins |
| .PPTX vs .KEY | Cross-platform support PPTX opens in PowerPoint, LibreOffice, Google Slides, Keynote, WPS, and OnlyOffice across Windows, macOS, Linux, and the web. Apple Keynote .key files can only be edited in Keynote (macOS/iOS) and viewed in limited capacity through iCloud.com. | PPTX wins |
Technical reference
- MIME Type
application/vnd.openxmlformats-officedocument.presentationml.presentation- Magic Bytes
50 4B 03 04ZIP archive containing [Content_Types].xml and ppt/ directory.- Developer
- Microsoft / Ecma International
- Year Introduced
- 2007
- Open Standard
- Yes
ZIP archive containing [Content_Types].xml and ppt/ directory.
Binary Structure
A PPTX file is a ZIP archive following Open Packaging Conventions (OPC). Bytes 0–3 contain the ZIP local file header signature 50 4B 03 04 ("PK"). The first archived file is always [Content_Types].xml, declaring MIME types for every part in the package. The _rels/.rels file at the root establishes the relationship chain, pointing to ppt/presentation.xml as the main document part. Presentation.xml lists slide references by relationship ID and stores slide dimensions in EMU via <p:sldSz cx="9144000" cy="6858000"/>. Individual slides in ppt/slides/ contain <p:cSld> elements with shape trees (<p:spTree>), text bodies (<a:txBody>), and media references. Slide masters, layouts, and themes form a three-tier inheritance chain stored in their respective ppt/ subdirectories. Embedded media (images, video, audio) reside in ppt/media/ as binary blobs. Speaker notes are stored in ppt/notesSlides/. A test file with 2 slides and 11 default layouts produced 40 ZIP entries: 96987 bytes uncompressed, 23636 bytes compressed.
| Offset | Length | Field | Example | Description |
|---|---|---|---|---|
0x00 | 4 bytes | ZIP Signature | 50 4B 03 04 (PK) | ZIP local file header magic bytes. Shared by DOCX, XLSX, and all OPC-based formats. Differentiated by [Content_Types].xml declaring PresentationML content types. |
0x04 | 2 bytes | Version needed | 14 00 (v2.0) | Minimum ZIP specification version required for extraction. OPC references PKWARE ZIP spec v6.2.0 (2004). |
0x08 | 2 bytes | Compression method | 08 00 (DEFLATE) | DEFLATE compression (method 8) is standard for OPC parts. Method 0 (stored) used for already-compressed media like JPEG. |
0x1A | 2 bytes | Filename length | 13 00 (19 bytes) | Length of the first archived filename. Value 19 (0x13) corresponds to [Content_Types].xml, always the first entry. |
Attack Vectors
- VBA macro execution via PPTM
- Embedded OLE objects and ActiveX controls
- External media and template links
- Malicious embedded fonts
Mitigation: FileDex does not execute PPTX files. Open untrusted PPTX in Protected View (Office) or upload to Google Slides, which strips macros, ActiveX controls, and external content links. Block .pptm at the email gateway.
- Specification ECMA-376: Office Open XML File Formats — Ecma International
- Registry PPTX Transitional (Office Open XML), ISO 29500:2008-2016 — Library of Congress (fdd000399)
- History Microsoft PowerPoint — Wikipedia
- Specification Office Open XML — Wikipedia
- Software python-pptx 1.0.2 — Python library for PowerPoint PPTX files