HyperText Markup Language
HTML defines web page structure using markup tags that browsers render visually. FileDex provides local HTML analysis and format reference directly in your browser — no file uploads, no server processing.
Markup format. Conversion is not applicable.
Common questions
Can I open an HTML file without a web browser?
Yes. HTML files are plain text, so any text editor (VS Code, Notepad++, Sublime Text) opens them directly. You will see the raw markup tags instead of the rendered page. Terminal tools like `cat` or `less` also display HTML source.
What is the difference between .html and .htm file extensions?
They are functionally identical. The .htm extension dates back to MS-DOS and Windows 3.1, which enforced a three-character extension limit. Modern systems treat both extensions the same — servers return `text/html` for either one.
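Servers usually choose the `Content-Type` from the file extension. Python's standard-library `mimetypes` module does the same kind of lookup, and it maps both extensions to one type. It is used here only as an illustration and does not stand in for any particular server.

```python
import mimetypes

# guess_type() returns a (type, encoding) pair based on the file extension.
for filename in ("index.html", "index.htm"):
    mime_type, _encoding = mimetypes.guess_type(filename)
    print(f"{filename}: {mime_type}")  # both lines end in "text/html"
```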
Is HTML a programming language?
No. HTML is a markup language — it describes document structure and content but cannot perform logic, loops, or calculations. Programming languages like JavaScript add interactivity and computation to HTML pages.
How do I check if my HTML file is valid?
Use the W3C Markup Validation Service at validator.w3.org, or run html5validator from the command line. These tools check for unclosed tags, missing required attributes, and deprecated elements against the HTML Living Standard.
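Validators such as the W3C service apply the full parsing rules. For a rough local check of unbalanced tags, the sketch below uses Python's standard-library `html.parser`. The class and function names are hypothetical. The check skips void elements like `<br>`. Unlike a real validator, it also flags end tags that HTML allows you to omit, such as `</p>` or `</li>`.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class UnclosedTagChecker(HTMLParser):
    """Rough check for unclosed or mismatched tags; not a full validator."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, int]] = []  # (tag name, line number)
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # Find the nearest matching open tag; anything opened after it was left unclosed.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                for open_tag, line in self.stack[i + 1:]:
                    self.problems.append(f"<{open_tag}> on line {line} is not closed")
                del self.stack[i:]
                return
        self.problems.append(f"</{tag}> on line {self.getpos()[0]} has no matching open tag")

    def close(self):
        super().close()
        # Whatever is still open at end of input was never closed.
        for open_tag, line in self.stack:
            self.problems.append(f"<{open_tag}> on line {line} is not closed")
        self.stack.clear()


def check_tags(markup: str) -> list[str]:
    checker = UnclosedTagChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(check_tags("<div><span>text</div>"))  # ['<span> on line 1 is not closed']
```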
What makes .HTML special
What is an HTML file?
HTML (HyperText Markup Language) is the standard markup language for documents displayed in web browsers. It defines the structure and content of web pages using elements (tags) like headings, paragraphs, links, images, and forms. HTML was invented by Tim Berners-Lee in 1991 and is now maintained as a Living Standard by WHATWG, meaning it evolves continuously rather than in numbered releases.
Every page on the web is ultimately an HTML document. When you visit a URL, your browser receives an HTML file and renders it visually. CSS controls appearance, and JavaScript adds behavior — but HTML is the foundation that makes a document a webpage.
How to open HTML files
- Any web browser (Chrome, Firefox, Edge, Safari) — Double-click to render as a web page
- VS Code (Windows, macOS, Linux) — Code editing with live preview via extensions
- Notepad++ (Windows) — Syntax-highlighted editing
- Sublime Text (Windows, macOS, Linux) — Fast code editor
Technical specifications
| Property | Value |
|---|---|
| Current Version | HTML5 (Living Standard) |
| Encoding | UTF-8 (recommended) |
| Type | Markup language |
| Standard | WHATWG Living Standard |
| MIME type | text/html |
| Related | CSS (styling), JavaScript (behavior) |
Common use cases
- Web pages: Every website is built on HTML
- Email templates: HTML-formatted emails with rich formatting
- Documentation: Technical docs, help files, and manuals
- Web applications: Single-page applications (SPAs) use a single HTML shell
- Progressive Web Apps (PWAs): Installable apps built on web technologies
HTML document structure
A minimal valid HTML5 document:
```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Page Title</title>
  </head>
  <body>
    <h1>Hello, World</h1>
    <p>This is a paragraph.</p>
  </body>
</html>
```
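The `<!DOCTYPE html>` declaration tells the browser to use standards mode rather than quirks mode. The `<head>` contains metadata that is not rendered in the page body (title, character set, stylesheets; the title appears only in the browser tab or window title), and `<body>` contains the visible content.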
Semantic HTML
HTML5 introduced semantic elements that describe their content's meaning to both browsers and search engines:
- `<header>`, `<footer>`, `<main>`, `<nav>`, `<aside>`: Page structure
- `<article>`, `<section>`: Content grouping
- `<figure>`, `<figcaption>`: Images with captions
- `<time datetime="2024-01-15">`: Machine-readable dates
Semantic HTML improves accessibility (screen readers can announce landmarks such as `<nav>` and `<main>` and let users jump between them) and SEO (search engines can more reliably infer the content hierarchy).
HTML and SEO
Search engines read HTML directly. These elements affect how a page is indexed and how it appears in results:
- `<title>`: Shown in search result titles
- `<meta name="description">`: Search snippet text
- `<h1>`–`<h6>` heading hierarchy: Signals content structure
- `alt` attributes on images: Enable image indexing
- `<link rel="canonical">`: Tells search engines which URL is the preferred version when several URLs serve duplicate content
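Because these elements are ordinary markup, they are easy to audit. The sketch below uses Python's standard-library `html.parser` and a hypothetical `SeoTagExtractor` class. It extracts the title, meta description, canonical URL, and heading order from a page.

```python
from html.parser import HTMLParser


class SeoTagExtractor(HTMLParser):
    """Collects <title> text, meta description, canonical URL, and heading order."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = None
        self.canonical = None
        self.headings: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.description = attributes.get("content")
        elif tag == "link" and "canonical" in (attributes.get("rel") or "").lower().split():
            self.canonical = attributes.get("href")
        elif tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.headings.append(tag)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


page = """<html><head><title>Widgets | Example</title>
<meta name="description" content="Hand-made widgets.">
<link rel="canonical" href="https://example.com/widgets">
</head><body><h1>Widgets</h1><h2>Sizes</h2></body></html>"""

seo = SeoTagExtractor()
seo.feed(page)
seo.close()
print(seo.title, seo.description, seo.canonical, seo.headings, sep="\n")
```

The recorded heading order can then be scanned for skipped levels, such as an `h1` followed directly by an `h3`.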
Accessibility
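Well-written, semantic HTML provides much of a page's accessibility by default. Give every meaningful image descriptive `alt` text and use an empty `alt=""` for purely decorative images. Associate a `<label>` element with every form input. Keep headings in logical order (`<h1>` before `<h2>`, without skipping levels). Reserve ARIA attributes (`role`, `aria-label`) for custom interactive components that have no native HTML equivalent. HTML that meets the WCAG 2.1 AA success criteria works better for all users, including those using screen readers or keyboard-only navigation.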
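Missing `alt` attributes are among the easiest accessibility problems to detect automatically. The sketch below uses a hypothetical `MissingAltFinder` class and only the Python standard library. It flags `<img>` elements that have no `alt` attribute and accepts `alt=""`, which marks an image as decorative. Automated checks like this cover only a small part of WCAG conformance.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records (line, src) for every <img> that has no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.missing: list[tuple[int, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # alt="" is valid for decorative images, so only an absent attribute is flagged.
        if "alt" not in attributes:
            self.missing.append((self.getpos()[0], attributes.get("src") or ""))


finder = MissingAltFinder()
finder.feed(
    '<img src="logo.png" alt="Company logo">\n'
    '<img src="chart.png">\n'
    '<img src="divider.png" alt="">'
)
finder.close()
print(finder.missing)  # [(2, 'chart.png')]
```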
.HTML compared to alternatives
| Formats | Criteria | Winner |
|---|---|---|
| .HTML vs .XHTML | Parser strictness: HTML uses a lenient error-recovery parser that still renders malformed markup. XHTML served as `application/xhtml+xml` is parsed as XML and must be well-formed, so a single unclosed tag causes a fatal parse error. HTML's tolerance made it the practical winner for web authoring. | HTML wins |
| .HTML vs .MARKDOWN | Authoring speed: Markdown syntax is faster to write for text-heavy content (headings, lists, links) but cannot express interactive elements, forms, or complex layouts that HTML handles natively. | MARKDOWN wins |
| .HTML vs .PDF | Editability: HTML source is plain text that any text editor can modify. PDF is a binary format that requires specialized tools to edit content, which makes HTML the better choice for living documents. | HTML wins |
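The strictness difference in the first row can be reproduced with Python's standard library. This sketch feeds the same malformed markup to two parsers. `xml.etree.ElementTree` is an XML parser and stands in for XHTML's XML parsing. `html.parser` is a lenient HTML tokenizer. `html.parser` does not build a tree or apply the full WHATWG error-recovery rules that browsers use. It does show the key behavior: it continues past the error instead of failing.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

malformed = "<html><body><p>Unclosed paragraph</body></html>"

# An XML parser rejects the document at the first well-formedness error.
try:
    ET.fromstring(malformed)
    xml_result = "parsed"
except ET.ParseError as err:
    xml_result = f"fatal error: {err}"


class TagLogger(HTMLParser):
    """Logs every start and end tag the lenient tokenizer reports."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(f"start {tag}")

    def handle_endtag(self, tag):
        self.events.append(f"end {tag}")


logger = TagLogger()
logger.feed(malformed)
logger.close()

print(xml_result)     # fatal error: mismatched tag ...
print(logger.events)  # tokenizing continues to the end of the input
```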
Technical reference
- MIME Type: text/html
- Developer: World Wide Web Consortium (W3C) / WHATWG
- Year Introduced: 1993 (first IETF draft specification)
- Open Standard: Yes
Binary Structure
HTML is a plain-text format encoded in UTF-8 (recommended by the spec, though legacy pages may use ISO-8859-1 or Windows-1252). Files have no binary magic bytes. The document typically begins with `<!DOCTYPE html>` followed by the `<html>` root element. The WHATWG spec permits a UTF-8 BOM (EF BB BF), and browsers handle it. It can still break tools that do not expect it. PHP sends the BOM as output before any `header()` call, and shell scripts that concatenate HTML files leave stray U+FEFF characters in the middle of the result. Parsers normalize line endings: CR, LF, and CRLF are all treated as a single line break.
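The byte-level rules above can be checked in a few lines. This sketch assumes UTF-8 input, so it does not detect legacy encodings through `<meta charset>` prescanning. It looks for the doctype only after an optional BOM and whitespace. The spec also permits leading comments, which this ignores. `inspect_html_bytes` is a hypothetical helper name.

```python
import codecs
import re

# Case-insensitive match for an HTML5 doctype at the start, after optional whitespace.
DOCTYPE_RE = re.compile(r"\s*<!doctype\s+html\b", re.IGNORECASE)


def inspect_html_bytes(raw: bytes) -> dict:
    """Report BOM presence, normalize line endings, and look for an HTML5 doctype."""
    has_bom = raw.startswith(codecs.BOM_UTF8)  # EF BB BF
    # The "utf-8-sig" codec decodes UTF-8 and drops a leading BOM if present.
    text = raw.decode("utf-8-sig")
    # Mirror the parser's newline normalization: CRLF and lone CR both become LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return {
        "utf8_bom": has_bom,
        "has_doctype": DOCTYPE_RE.match(text) is not None,
        "text": text,
    }


sample = b"\xef\xbb\xbf<!DOCTYPE html>\r\n<html>\r<body></body></html>"
info = inspect_html_bytes(sample)
print(info["utf8_bom"], info["has_doctype"])  # True True
print(info["text"].splitlines())
```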
Attack Vectors
- XSS (Cross-Site Scripting): malicious JavaScript injected via unsanitized user input into innerHTML, href, or event handler attributes
- Script injection: inline `<script>` tags run as soon as the parser reaches them, and `javascript:` URIs run arbitrary code when navigated to (for example, when a link is clicked)
- Iframe clickjacking: transparent iframes overlaid on legitimate UI elements trick users into clicking hidden actions
- Form phishing: fake login forms embedded in HTML mimic trusted sites to harvest credentials
- CSS data exfiltration: attribute selectors and @font-face requests can leak sensitive data character-by-character
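Most XSS in the list above comes from inserting untrusted text into markup without escaping it for the right context. The sketch below uses the standard-library `html.escape` and `urllib.parse.urlsplit` and assumes Python 3.9+. `safe_href`, `render_link`, and the `SAFE_SCHEMES` allow-list are hypothetical names chosen for this example. Production code should rely on a templating engine with automatic escaping and a maintained HTML sanitizer rather than hand-rolled helpers like these.

```python
import html
from urllib.parse import urlsplit

# Hypothetical allow-list for this sketch; "" means a relative URL with no scheme.
SAFE_SCHEMES = {"http", "https", "mailto", ""}
# Characters U+0000 to U+0020, which browsers strip from both ends of a URL.
C0_CONTROL_OR_SPACE = "".join(chr(c) for c in range(0x21))


def safe_href(url: str) -> str:
    """Return a cleaned URL if its scheme is allow-listed, otherwise '#'."""
    # Mirror browser URL parsing: trim C0 controls/spaces and drop tabs and newlines,
    # so inputs like " java\tscript:..." cannot slip past the scheme check.
    cleaned = url.strip(C0_CONTROL_OR_SPACE)
    for ch in "\t\n\r":
        cleaned = cleaned.replace(ch, "")
    scheme = urlsplit(cleaned).scheme.lower()
    return cleaned if scheme in SAFE_SCHEMES else "#"


def render_link(user_url: str, user_label: str) -> str:
    """Build an <a> element from an untrusted URL and label."""
    # quote=True also escapes " and ', so the value cannot close the attribute;
    # escaping < > & keeps the label from being parsed as markup.
    href = html.escape(safe_href(user_url), quote=True)
    label = html.escape(user_label)
    return f'<a href="{href}">{label}</a>'


print(render_link("javascript:alert(1)", "<b>click</b>"))
# <a href="#">&lt;b&gt;click&lt;/b&gt;</a>
print(render_link("https://example.com/?a=1&b=2", "Docs"))
# <a href="https://example.com/?a=1&amp;b=2">Docs</a>
```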
Mitigation: FileDex processes HTML files locally in the browser with no external resource loading, no script execution, and no network requests. Content Security Policy headers add a second layer: a `script-src` directive without `'unsafe-inline'` blocks inline scripts, and `frame-ancestors` blocks frame embedding.
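A Content Security Policy is a header string, so its structure is easy to show. The sketch below serializes a hypothetical policy. The directive values are illustrative assumptions, not FileDex's actual configuration. The `script-src` directive omits `'unsafe-inline'`, and `frame-ancestors 'none'` forbids framing. `frame-ancestors` only takes effect when sent as an HTTP response header, not in a `<meta http-equiv>` tag.

```python
# Hypothetical policy values for illustration; a real deployment tunes these.
CSP_DIRECTIVES = {
    "default-src": ["'self'"],
    # No 'unsafe-inline', so inline <script> blocks and on* handler attributes are refused.
    "script-src": ["'self'"],
    # No origin may embed this page in a frame (anti-clickjacking).
    "frame-ancestors": ["'none'"],
    # Block <object>/<embed> plugin content.
    "object-src": ["'none'"],
}


def build_csp(directives: dict[str, list[str]]) -> str:
    """Serialize directives into a Content-Security-Policy header value."""
    return "; ".join(f"{name} {' '.join(values)}" for name, values in directives.items())


print("Content-Security-Policy: " + build_csp(CSP_DIRECTIVES))
```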