Comma-Separated Values
CSV stores tabular data as plain text with comma-delimited fields and CRLF-terminated records, as documented in RFC 4180. Nearly every spreadsheet, database, and programming language can read CSV, making it the lowest-common-denominator format for structured data exchange.
Because CSV carries no schema of its own, semantic conversion to other data formats requires an explicit schema mapping.
Common questions
Why does Excel show all CSV data in one column?
Excel uses the Windows system locale's list separator, which is semicolon in most European locales (German, French, Dutch). If your CSV uses commas but the locale expects semicolons, all fields merge into column A. Fix by using Data > From Text/CSV import with explicit comma delimiter, or change the list separator in Windows Regional Settings.
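When the same file has to be read programmatically, the delimiter can be guessed instead of assumed. The following is a minimal sketch using Python's standard `csv.Sniffer`; the helper name `read_rows`, the candidate delimiter set, and the sample data are illustrative assumptions, and sniffing is a heuristic that can fail on small or irregular samples.

```python
import csv
import io

def read_rows(text, candidates=",;\t"):
    """Guess the delimiter from a sample, falling back to comma (hypothetical helper)."""
    sample = text[:4096]
    try:
        delimiter = csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide; assume a plain comma-delimited file.
        delimiter = ","
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return delimiter, list(reader)

# A semicolon-delimited export, as produced under many European locales.
semicolon_export = "name;city;amount\nAnna;Berlin;12,50\nBen;Hamburg;7,25\n"
delimiter, rows = read_rows(semicolon_export)
print(delimiter, rows)
```

Note that the decimal commas in the amount column stay inside their fields because the semicolon, not the comma, is treated as the separator.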
How do I open a UTF-8 CSV file in Excel without garbled characters?
Add a UTF-8 BOM (the three bytes EF BB BF) at the very start of the file, or use Data > From Text/CSV and select 65001: Unicode (UTF-8) as the file origin. When a CSV is opened directly, Excel defaults to the system code page (often Windows-1252), which corrupts non-ASCII characters.
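If the CSV is generated by a script, the BOM can be written automatically. This sketch assumes Python, whose `utf-8-sig` codec prepends the BOM on write and strips it on read; the function name and the temporary file path are illustrative.

```python
import csv
import os
import tempfile

def write_excel_friendly_csv(path, rows):
    """Write rows as UTF-8 with a leading BOM (hypothetical helper)."""
    # newline="" lets the csv module emit its own CRLF line endings.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)

rows = [["name", "city"], ["Zoë", "Kraków"]]
path = os.path.join(tempfile.mkdtemp(), "people.csv")
write_excel_friendly_csv(path, rows)

# Inspect the raw bytes: the file should start with EF BB BF.
with open(path, "rb") as f:
    has_bom = f.read().startswith(b"\xef\xbb\xbf")

# Reading back with utf-8-sig removes the BOM again.
with open(path, encoding="utf-8-sig", newline="") as f:
    round_trip = list(csv.reader(f))

print(has_bom, round_trip)
```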
What is the difference between CSV and TSV?
CSV uses comma (0x2C) as the field delimiter. TSV uses tab (0x09). TSV avoids most of the quoting complexity of CSV because tabs rarely appear in data values, but TSV cannot represent fields containing tab characters without escaping. Both are plain-text tabular formats that can be handled with similar parsing logic.
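The difference is easy to see by serializing the same rows both ways. This is a small sketch using Python's `csv` module; `dump` is a hypothetical helper, and the quoting of tab-containing fields shown here is the module's own behavior, which not every TSV consumer accepts.

```python
import csv
import io

def dump(rows, delimiter):
    """Serialize rows with the given delimiter (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return buf.getvalue()

rows = [["city", "note"], ["New York, NY", "a\tb"]]

as_csv = dump(rows, ",")   # the comma-containing field gets quoted
as_tsv = dump(rows, "\t")  # the tab-containing field gets quoted instead

print(repr(as_csv))
print(repr(as_tsv))
```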
How do I handle commas inside a CSV field?
Per RFC 4180, enclose the field in double quotes. A field value like 'New York, NY' becomes "New York, NY" in the CSV. Double quotes inside a quoted field are escaped by doubling: "She said ""hello""" represents the string She said "hello".
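In practice a CSV library applies these quoting rules automatically. The sketch below uses Python's standard `csv` module with its default dialect to write and re-read the two example values; the third field is added only to show that plain values stay unquoted.

```python
import csv
import io

buf = io.StringIO()
# Default dialect: comma delimiter, double-quote quoting, CRLF line ending.
csv.writer(buf).writerow(["New York, NY", 'She said "hello"', "plain"])
line = buf.getvalue()

# Parsing the line restores the original values, quotes and comma included.
parsed = next(csv.reader([line]))

print(repr(line))
print(parsed)
```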
What makes .CSV special
What is a CSV file?
CSV (Comma-Separated Values) is a plain text format for storing tabular data. Each line represents a row, and values within each row are separated by commas (or other delimiters). CSV is the simplest and most universal format for data exchange between spreadsheets, databases, and applications.
How to open CSV files
- Microsoft Excel (Windows, macOS) — Spreadsheet view
- Google Sheets (Web) — Free, online
- LibreOffice Calc (Windows, macOS, Linux) — Free
- Any text editor — Raw text view
- Python / pandas — Programmatic analysis
Technical specifications
| Property | Value |
|---|---|
| Delimiter | Comma (or semicolon, tab) |
| Encoding | UTF-8, ASCII, etc. |
| Quoting | Double quotes for special characters |
| Header | Optional first row as column names |
| Standard | RFC 4180 |
Common use cases
- Data export: Database and spreadsheet exports
- Data import: Bulk data loading
- ETL pipelines: Extract-Transform-Load workflows
- Reporting: Simple data reports
.CSV compared to alternatives
| Formats | Criteria | Winner |
|---|---|---|
| .CSV vs .JSON | Tabular data: CSV is purpose-built for flat tabular data with fixed columns. JSON requires repeating keys for every row, increasing file size by 2-5x for large tabular datasets. | CSV wins |
| .CSV vs .JSON | Nested and typed data: JSON natively supports nested objects, arrays, and typed values (number, boolean, null). CSV has no nesting, no type system, and no standard way to represent null vs empty string. | JSON wins |
| .CSV vs .XLSX | Simplicity and portability: CSV is plain text that any tool can read. XLSX is a ZIP of XML files requiring a dedicated library or office suite to parse. | CSV wins |
| .CSV vs .XLSX | Features: XLSX supports multiple sheets, cell formatting, formulas, charts, data validation, and type-aware columns. CSV stores raw text values only. | XLSX wins |
| .CSV vs .PARQUET | Query performance: Parquet is columnar with built-in compression and min/max statistics per row group. Query engines skip irrelevant columns and row groups. CSV requires a full file scan for every query. | PARQUET wins |
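The two CSV vs JSON rows can be illustrated by serializing the same records both ways. This is a rough sketch with made-up sample records; the actual size ratio depends heavily on key lengths and values, so no specific figure is implied.

```python
import csv
import io
import json

# Hypothetical sample data: flat records with a number, string, boolean, and null.
records = [
    {"id": i, "name": f"user{i}", "active": i % 2 == 0, "note": None}
    for i in range(1000)
]

def to_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

csv_text = to_csv(records)
json_text = json.dumps(records)

# JSON repeats every key in every record, so it is larger for flat data.
csv_bytes = len(csv_text.encode("utf-8"))
json_bytes = len(json_text.encode("utf-8"))

# Round trip: CSV returns strings only, and None has become an empty string.
first_from_csv = next(csv.DictReader(io.StringIO(csv_text)))
first_from_json = json.loads(json_text)[0]

print(csv_bytes, json_bytes)
print(first_from_csv)
print(first_from_json)
```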
Technical reference
- MIME Type: text/csv
- Developer: IBM (earliest usage)
- Year Introduced: 1972
- Open Standard: Yes (RFC 4180)
Binary Structure
CSV is a text format with no binary structure. Each line represents one record (row), terminated by CRLF (0x0D 0x0A) per RFC 4180, though many implementations accept bare LF (0x0A). Fields within a record are separated by a comma (0x2C). Fields containing commas, double quotes, or newlines must be enclosed in double quotes (0x22). A double quote within a quoted field is escaped by doubling it ("" represents a literal ").
The first record may be a header row with column names, but RFC 4180 (section 2, rule 3) makes this optional, and there is no in-band mechanism to distinguish a header row from data. CSV has no type system: all values are strings, and type interpretation (integer, float, date, boolean) is left to the consuming application.
There is no encoding declaration inside the file; files may be UTF-8, Windows-1252, ISO-8859-1, or other encodings, with no way to distinguish them without heuristics. Excel on Windows defaults to the system locale's encoding (often Windows-1252) and uses the system locale's list separator (semicolon instead of comma in many European locales). A UTF-8 BOM (EF BB BF) at byte 0 signals UTF-8 to Excel but may cause issues with other parsers that do not expect a BOM.
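The byte-level rules above can be sketched as a small reader. This example assumes Python's standard `csv` module; the function name `parse_csv_bytes` and the choice of Windows-1252 as the fallback are illustrative assumptions, and the "try UTF-8, then fall back" step is exactly the kind of heuristic the text describes, not a reliable detector.

```python
import csv
import io

UTF8_BOM = b"\xef\xbb\xbf"

def parse_csv_bytes(data, fallback_encoding="cp1252"):
    """Decode raw CSV bytes and return rows of strings (hypothetical helper)."""
    if data.startswith(UTF8_BOM):
        # Strip the BOM so it does not end up inside the first field.
        text = data[len(UTF8_BOM):].decode("utf-8")
    else:
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            # Heuristic fallback; the file itself declares nothing.
            text = data.decode(fallback_encoding)
    # newline="" keeps CRLF inside quoted fields intact for the csv reader.
    return list(csv.reader(io.StringIO(text, newline="")))

with_bom = UTF8_BOM + 'id,note\r\n1,"line one\r\nline two"\r\n'.encode("utf-8")
legacy = b"id,name\r\n2,caf\xe9\r\n"  # 0xE9 is "é" in Windows-1252

rows_bom = parse_csv_bytes(with_bom)
rows_legacy = parse_csv_bytes(legacy)

print(rows_bom)
print(rows_legacy)
```

Every value comes back as a string, including the numeric-looking `id` column; converting types is left to the caller.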
Attack Vectors
- Formula injection (CSV injection): cells starting with =, +, -, or @ can be interpreted as formulas when the file is opened in Excel or Google Sheets, enabling DDE command execution or data exfiltration via HYPERLINK()
- Encoding-based data manipulation: saving a CSV as Windows-1252 and then re-importing it as UTF-8 silently corrupts non-ASCII characters such as currency symbols, which can change the meaning of monetary or locale-formatted values
- Oversized row denial of service: a single CSV row with millions of fields or a multi-gigabyte quoted field exhausts parser memory
Mitigation: FileDex processes CSV files entirely in the browser with no server upload. CSV parsing uses standard text processing with no formula execution. Cells are treated as raw strings, not spreadsheet formulas.
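For code that produces or consumes CSV outside the browser, two of the vectors above can be reduced with a few lines. The sketch below is an assumption-laden illustration, not FileDex's implementation: it applies the commonly cited mitigation of prefixing formula-like cells with a single quote on export, and caps field size and field count on import using Python's `csv.field_size_limit`. The helper names and the limit values are hypothetical.

```python
import csv
import io

FORMULA_PREFIXES = ("=", "+", "-", "@")

def neutralize(value):
    """Prefix formula-like cells with a quote so spreadsheets show them as text."""
    text = "" if value is None else str(value)
    # Trade-off: legitimate values such as "-5" are also prefixed.
    return "'" + text if text.startswith(FORMULA_PREFIXES) else text

def safe_export(rows):
    buf = io.StringIO()
    csv.writer(buf).writerows([[neutralize(cell) for cell in row] for row in rows])
    return buf.getvalue()

def parse_with_limits(text, max_field_chars=1_000_000, max_fields=10_000):
    """Parse CSV text while rejecting oversized fields and rows."""
    previous = csv.field_size_limit(max_field_chars)
    try:
        rows = []
        for row in csv.reader(io.StringIO(text, newline="")):
            if len(row) > max_fields:
                raise ValueError("row has too many fields")
            rows.append(row)
        return rows
    finally:
        # Restore the process-wide limit that was in effect before.
        csv.field_size_limit(previous)

exported = safe_export([["name", "bio"], ["Eve", "=HYPERLINK(\"http://example.com\")"]])
print(exported)
```

A field longer than `max_field_chars` makes the reader raise `csv.Error`, so callers can reject the file instead of buffering it.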