.NDJSON Newline-Delimited JSON

NDJSON (Newline Delimited JSON) stores one valid JSON value per line in a UTF-8 text file. Also known as JSON Lines (.jsonl), it enables streaming and line-by-line processing without loading entire files into memory. FileDex provides format reference and developer documentation for NDJSON.

Data layout
Header schema
Records structured data
Text · Streaming · 2013
By FileDex
Not convertible

NDJSON is a text-based data format. FileDex displays format information and developer reference for working with newline-delimited JSON files.

Common questions

What is an NDJSON file?

An NDJSON file is a UTF-8 text file where each line contains one valid JSON value, separated by newline characters. The format stores structured data for streaming and batch processing. Elasticsearch, MongoDB, and BigQuery all use NDJSON as a primary data interchange format.

What is the difference between NDJSON and JSON Lines?

They are the same format with different names and file extensions. NDJSON (ndjson-spec, 2014) uses the .ndjson extension and the application/x-ndjson MIME type. JSON Lines (jsonlines.org, 2013) uses .jsonl and suggests application/jsonl. The encoding rules, line structure, and UTF-8 requirements are identical across both specifications.

Can NDJSON lines have different fields?

Yes. Each line is an independent JSON value with its own structure. Line 1 might have fields id, name, and email while line 2 has id, name, and phone. This schema flexibility makes NDJSON well-suited for evolving log formats where new fields appear over time.
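As a minimal sketch of what this looks like in practice, the two lines below parse independently with Python's standard json module; the field names and values are purely illustrative:

```python
import json

# Two records with overlapping but different fields; both are valid NDJSON.
lines = [
    '{"id": 1, "name": "Alice", "email": "alice@example.com"}',
    '{"id": 2, "name": "Bob", "phone": "+1-555-0100"}',
]

records = [json.loads(line) for line in lines]

# Consumers check for optional fields per record rather than assuming a schema.
for rec in records:
    contact = rec.get("email") or rec.get("phone")
    print(rec["name"], contact)
```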

Why does Elasticsearch use NDJSON for bulk operations?

Elasticsearch's bulk API alternates action metadata and document data on consecutive NDJSON lines. This lets the coordinating node parse only the small action line to route each document to the correct shard, without parsing the full document body at the routing stage.

Is NDJSON valid JSON?

Each individual line is valid JSON, but the file as a whole is not. A JSON document requires an outer structure like an array or object. NDJSON has no outer brackets — it is a sequence of independent JSON values separated by newlines. Tools expecting a single JSON document will reject an NDJSON file.
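The distinction is easy to demonstrate with Python's standard json module — a minimal sketch:

```python
import json

ndjson_text = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'

# Parsing the whole file as one JSON document fails: after the first value,
# a strict JSON parser expects end of input, not another object.
try:
    json.loads(ndjson_text)
    whole_file_is_json = True
except json.JSONDecodeError:
    whole_file_is_json = False

# Parsing line by line succeeds: each line is an independent JSON value.
records = [json.loads(line) for line in ndjson_text.splitlines()]

print(whole_file_is_json, len(records))   # prints: False 3
```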

How do I validate an NDJSON file?

Validation checks that every line is valid JSON and that no line contains trailing commas, unquoted keys, or broken UTF-8. Command-line JSON processors and database tools like DuckDB report the exact line number and error on parse failure. See the CLI tab below for the specific validation command and expected output.
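Alongside the command-line tools, a minimal validator is easy to sketch in Python with the standard json module; the function name and error reporting below are illustrative:

```python
import json

def validate_ndjson(path):
    """Return (line_number, message) pairs for every invalid line."""
    errors = []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            stripped = line.strip()
            if not stripped:            # most parsers tolerate blank lines
                continue
            try:
                json.loads(stripped)
            except json.JSONDecodeError as exc:
                errors.append((lineno, exc.msg))
    return errors
```

An empty result means every non-blank line parsed cleanly; note this does not check the trailing-newline convention.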

What MIME type should I use for NDJSON?

Use application/x-ndjson for HTTP Content-Type headers. This is the de facto standard used by Elasticsearch, CockroachDB changefeeds, and most streaming HTTP APIs. Neither application/x-ndjson nor application/jsonl is formally registered with IANA. Do not use application/json-seq, which belongs to a different format entirely (RFC 7464, JSON Text Sequences).

How large can an NDJSON file be?

There is no format-level size limit. NDJSON files can grow to hundreds of gigabytes because no outer structure needs to be closed. A producer appends records indefinitely by writing lines. Consumers process one line at a time with constant memory regardless of total file size.

What makes .NDJSON special

Two names, one format
NDJSON and JSON Lines are identical
Two independent specs — NDJSON (2014, .ndjson) and JSON Lines (2013, .jsonl) — describe the same format with different names, extensions, and MIME types. The communities have reconciled, but the dual naming still confuses developers.
Elasticsearch catalyst
Elasticsearch's bulk API made NDJSON mainstream
Elasticsearch chose NDJSON for its _bulk endpoint so the coordinating node could parse small action lines for routing without deserializing full documents. This single API decision exposed millions of developers to the format.
No size ceiling
NDJSON files can grow to hundreds of gigabytes
Because NDJSON has no outer brackets or closing structure, a file can grow indefinitely by appending lines. A 500 GB log file is as structurally valid as a 50-byte file with one record. Consumers read it with constant memory.
Ghost specification
ndjson.org no longer hosts the spec
The original ndjson.org domain is squatted as of 2026. The authoritative specification lives on GitHub at ndjson/ndjson-spec. An earlier draft of the spec allowed // comments in NDJSON, but this was removed for RFC 8259 compatibility.

What is an NDJSON file?

An NDJSON file contains one valid JSON value per line, with each line terminated by a newline character (0x0A). The file has no opening bracket, no closing bracket, and no commas between records. A three-record NDJSON file looks exactly like this:

{"id":1,"name":"Alice","role":"engineer"}
{"id":2,"name":"Bob","role":"designer"}
{"id":3,"name":"Carol","role":"manager"}

Each line is a standalone JSON value that can be parsed independently. The newline character is the only delimiter. This simplicity is the format's entire purpose: any program that can read lines from a file can read NDJSON records.

Two names, one format

The NDJSON ecosystem carries a naming split that confuses developers encountering the format for the first time. Two independent specifications describe the same format:

JSON Lines (jsonlines.org, published 2013) uses the .jsonl extension and suggests the MIME type application/jsonl. NDJSON (GitHub ndjson-spec, published 2014) uses the .ndjson extension and specifies application/x-ndjson. A third historical name, LDJSON (Line Delimited JSON), appears in older documentation.

The rules are identical: UTF-8 encoding, one JSON value per line, newline separator. Both specs accept \n and \r\n as line endings. Both prohibit the byte-order mark (BOM). The communities behind the two specs have acknowledged the equivalence, yet the dual naming persists in tools, APIs, and file extensions. Elasticsearch uses application/x-ndjson. OpenAI's fine-tuning API accepts .jsonl files. MongoDB exports produce files that match both specs but use neither extension by default.

For practical purposes: .ndjson and .jsonl files are interchangeable. Rename one to the other and every compliant tool will process it identically.

Why newline-delimited: the streaming advantage

A standard JSON array wraps all records in brackets and separates them with commas:

[
  {"id":1,"name":"Alice"},
  {"id":2,"name":"Bob"}
]

To parse this, a program must either load the entire file into memory or implement a streaming JSON parser that tracks bracket nesting depth. For a 50 GB log file, both approaches carry substantial cost.

NDJSON eliminates the outer structure entirely. A consumer reads one line, parses it as JSON, processes it, discards it, and moves to the next line. Memory usage stays constant regardless of file size. A producer appends a new record by writing one line — no need to seek backward to insert before a closing bracket. This append-only property makes NDJSON the natural choice for log files, event streams, and data pipelines where records arrive continuously.
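Both halves of that pattern — the constant-memory consumer and the append-only producer — can be sketched with Python's standard library; the function names are illustrative:

```python
import json

def iter_ndjson(path):
    """Yield one parsed record at a time; memory use stays constant
    regardless of file size, because only one line is held at once."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)

def append_record(path, record):
    """Add a record with a single append write; no closing bracket
    needs to be found or rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```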

Unix command-line tools gain direct access to NDJSON records because the line is the record boundary: wc -l counts records, head -n 100 extracts the first hundred, tail -f streams new records as they arrive, and grep filters by content. None of these tools understand JSON — they understand lines, and NDJSON aligns its record boundary with the line boundary.

Format rules

The combined rules from both specifications:

  1. Encoding: UTF-8, mandatory. No BOM.
  2. Line content: Each line MUST contain exactly one valid JSON value per RFC 8259. Objects and arrays are most common, but strings, numbers, booleans, and null are permitted.
  3. Line separator: \n (0x0A). \r\n (0x0D0A) is also accepted.
  4. No embedded newlines: JSON string values must escape newlines as \n — a literal newline inside a JSON string would break line-based parsing.
  5. Trailing newline: Recommended. Elasticsearch requires it. Many tools produce it. Omitting the final \n is tolerated by most parsers.
  6. Empty lines: The NDJSON spec says parsers MAY silently ignore empty lines but MUST document this behavior. The JSON Lines spec does not address empty lines. In practice, most parsers skip them.
  7. No outer structure: No opening bracket, no closing bracket, no trailing comma. The file is not valid JSON as a whole — only each individual line is.
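Rules 2, 4, and 5 fall out naturally from a standard JSON serializer, as this Python sketch shows — json.dumps escapes an embedded newline, so a record always stays on one physical line:

```python
import json

# Rule 4: a literal newline inside a string value is escaped as \n,
# keeping the whole record on a single physical line.
record = {"msg": "line one\nline two"}
line = json.dumps(record, ensure_ascii=False)
assert "\n" not in line

# Rule 2: scalar values are valid NDJSON lines too, not just objects.
scalars = [json.dumps(v) for v in ["hello", 42, True, None]]

# Rule 5: terminate every record, including the last, with a newline.
body = "".join(l + "\n" for l in [line] + scalars)
```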

Elasticsearch: the adoption catalyst

Elasticsearch's bulk API (_bulk and _msearch endpoints) requires NDJSON as its request body format. Every bulk indexing request sends alternating lines of action metadata and document source:

{"index":{"_index":"logs"}}
{"timestamp":"2026-04-03T10:00:00Z","level":"error","message":"disk full"}
{"index":{"_index":"logs"}}
{"timestamp":"2026-04-03T10:00:01Z","level":"info","message":"recovered"}

The Content-Type header must be application/x-ndjson (or application/json), and the final line must end with \n. Elasticsearch chose NDJSON because it needs to parse only the action line on the coordinating node to route the document to the correct shard — no full-body JSON parsing is required at the routing stage. This design decision, made before the format had widespread recognition, exposed millions of developers to NDJSON through the Elastic Stack (Elasticsearch, Logstash, Kibana). The ELK stack remains one of the largest sources of NDJSON traffic on the internet.
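Assembling such a bulk body client-side takes only a few lines of Python; the index name and documents below are made up for illustration:

```python
import json

def bulk_body(index, docs):
    """Build an Elasticsearch _bulk body: an action line before each document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the final line to end with a newline.
    return "\n".join(lines) + "\n"

body = bulk_body("logs", [
    {"level": "error", "message": "disk full"},
    {"level": "info", "message": "recovered"},
])
# POST to <cluster>/_bulk with Content-Type: application/x-ndjson
```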

MongoDB, BigQuery, and the cloud ecosystem

MongoDB's mongoexport tool writes one JSON document per line by default — NDJSON format. mongoimport reads it back. Extended JSON encoding preserves BSON types (ObjectId, Date, Decimal128) that standard JSON cannot represent, but the line-per-document structure follows NDJSON rules exactly.

Google BigQuery accepts what it calls "newline-delimited JSON (ndJSON)" for data loading from Cloud Storage, and its documentation states explicitly that this is "the same format as the JSON Lines format." Each JSON object occupies one line; gzip-compressed files are limited to 4 GB and cannot be read in parallel (uncompressed files load faster because BigQuery parallelizes across file splits).

AWS services produce and consume NDJSON across the stack: CloudWatch Logs stores structured log entries as JSON objects, Kinesis Data Firehose delivers NDJSON to S3, and Athena queries NDJSON files with standard SQL. Apache Spark's spark.read.json() method reads NDJSON by default — standard JSON arrays require the explicit multiLine option.

MIME type and identification

NDJSON has no IANA-registered MIME type. The x- prefix in application/x-ndjson marks it as unregistered. Three MIME types circulate in practice:

  • application/x-ndjson — used by Elasticsearch, most HTTP APIs, and the NDJSON spec
  • application/jsonl — suggested by the JSON Lines spec, not widely adopted in HTTP headers
  • application/json-seq — belongs to RFC 7464 (JSON Text Sequences), a different format that uses the ASCII Record Separator character (0x1E) as delimiter

Because NDJSON is plain text with no magic bytes, file identification relies entirely on the extension (.ndjson or .jsonl) or the Content-Type header. A file containing one JSON object per line but saved with a .json extension is ambiguous — tools cannot distinguish it from a standard JSON file without reading its contents.
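For an ambiguous .json file, one practical heuristic: if the whole text parses as a single JSON document, treat it as standard JSON; if that fails but every line parses on its own, treat it as NDJSON. A Python sketch (the function name is illustrative):

```python
import json

def looks_like_ndjson(text):
    """True if the text parses as NDJSON but not as a single JSON document."""
    try:
        json.loads(text)
        return False            # a whole-file parse succeeded: standard JSON
    except json.JSONDecodeError:
        pass
    lines = [l for l in text.splitlines() if l.strip()]
    if not lines:
        return False
    try:
        for l in lines:
            json.loads(l)       # every line must be standalone JSON
    except json.JSONDecodeError:
        return False
    return True
```

Note that a one-record file is genuinely ambiguous and is classified here as standard JSON.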

Comparison with alternatives

NDJSON vs. JSON: A JSON array is a single valid document; NDJSON is a sequence of independent documents. JSON supports pretty-printing across lines; NDJSON requires each record on exactly one line. JSON is better for configuration files and API responses; NDJSON is better for logs, data feeds, and anything that grows over time.

NDJSON vs. CSV: CSV uses a fixed column header and stores flat tabular data. NDJSON supports nested objects, arrays, mixed types, and schema evolution (different fields on different lines). CSV is more compact for purely tabular data and universally supported by spreadsheet software. NDJSON preserves data types that CSV loses (numbers vs. strings, nulls vs. empty strings, nested structures).
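The type-preservation difference is visible in a small converter. The sketch below lifts CSV rows to NDJSON and, as a heuristic, re-parses each cell as JSON so numbers and booleans come back typed; a real pipeline would use an explicit schema instead:

```python
import csv
import io
import json

def csv_to_ndjson(csv_text):
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec = {}
        for key, value in row.items():
            try:
                rec[key] = json.loads(value)   # "42" -> 42, "true" -> True
            except json.JSONDecodeError:
                rec[key] = value               # anything else stays a string
        out.append(json.dumps(rec))
    return "\n".join(out) + "\n"

print(csv_to_ndjson("id,name,score\n1,Alice,9.5\n2,Bob,8\n"))
```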

NDJSON vs. Parquet: Parquet is a binary columnar format optimized for analytical queries — it compresses 5-10x better than NDJSON and supports predicate pushdown for fast column-specific reads. NDJSON is human-readable, append-friendly, and works with text tools. Parquet is better for data warehouse storage; NDJSON is better for data transit and logging.

Encoding and corruption

Both specifications require UTF-8. The most common corruption scenarios in NDJSON files:

  • Broken JSON on a line: A truncated write (crash during append) leaves an incomplete JSON object on the last line. Parsers that skip invalid lines recover; strict parsers halt.
  • Unescaped newlines in strings: A JSON string containing a literal newline instead of the escape sequence \n splits one record across two lines, breaking every downstream tool.
  • BOM insertion: Some Windows editors insert a UTF-8 BOM (0xEF 0xBB 0xBF) at the start of the file. JSON parsers that do not strip BOM will fail on the first line.
  • Mixed line endings: A file containing both \n and \r\n lines is valid per spec, but a poorly written parser that splits only on \n will leave trailing \r characters in parsed values.
  • Trailing commas: Developers familiar with JSON arrays sometimes add commas at the end of NDJSON lines. These produce invalid JSON per RFC 8259.
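A defensive reader can absorb most of these scenarios at once. This Python sketch strips a BOM via the utf-8-sig codec, tolerates both line endings and blank lines, and collects the line numbers of broken records instead of halting:

```python
import json

def read_ndjson_tolerant(raw_bytes):
    """Return (records, bad_line_numbers) from raw NDJSON bytes."""
    text = raw_bytes.decode("utf-8-sig")      # strips a leading BOM if present
    records, bad = [], []
    # splitlines() handles both \n and \r\n endings.
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:                          # spec: parsers MAY skip empty lines
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append(lineno)                # e.g. a truncated final record
    return records, bad
```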

Validation is straightforward: run jq empty file.ndjson — jq exits with a non-zero status and reports the line number of the first invalid JSON value.

.NDJSON compared to alternative formats

.NDJSON vs .JSON (streaming support): NDJSON processes one line at a time with constant memory. JSON arrays require parsing the entire structure — the closing bracket must be reached before the array is considered complete. Winner: NDJSON.

.NDJSON vs .JSON (append operations): Appending a record to NDJSON is a single file-append syscall. Appending to a JSON array requires reading the file, finding the closing bracket, inserting before it, and rewriting. Winner: NDJSON.

.NDJSON vs .CSV (schema flexibility): Each NDJSON line can contain different fields, nested objects, and arrays. CSV requires a fixed column header and cannot represent nested structures without flattening conventions. Winner: NDJSON.

.NDJSON vs .CSV (spreadsheet compatibility): CSV opens natively in Excel, Google Sheets, and every spreadsheet application. NDJSON requires conversion to CSV or loading through a programming language or SQL tool like DuckDB. Winner: CSV.

.NDJSON vs .PARQUET (compression ratio): Parquet's columnar storage with dictionary and run-length encoding compresses 5-10x smaller than NDJSON text. A 10 GB NDJSON log file may compress to 1-2 GB as Parquet. Winner: PARQUET.

.NDJSON vs .PARQUET (human readability): NDJSON is plain UTF-8 text readable in any editor. Parquet is binary and requires specialized tools (DuckDB, pyarrow, parquet-tools) to inspect its contents. Winner: NDJSON.

.NDJSON vs .JSON LINES (format compatibility): NDJSON (.ndjson) and JSON Lines (.jsonl) are the same format under different names. Rename a .jsonl file to .ndjson and every compliant parser processes it identically. Winner: Draw.

Technical reference

MIME Type: application/x-ndjson
Developer: Community standard
Year Introduced: 2013
Open Standard: Yes (specification at github.com/ndjson/ndjson-spec)

Binary Structure

NDJSON is a UTF-8 text format with no binary structure, no magic bytes, and no file header. Each line contains exactly one valid JSON value per RFC 8259, terminated by a newline character (0x0A). The newline is the sole record delimiter — no commas, brackets, or framing bytes exist between records. JSON string values must escape literal newlines as the two-character sequence \n; an unescaped newline inside a string would split the record across lines and break parsing.

The file has no outer structure: no opening bracket, no closing bracket. An empty file is valid (zero records). A trailing newline after the last record is recommended by both specs and required by Elasticsearch's bulk API. The MIME type application/x-ndjson is the de facto standard, though it lacks formal IANA registration. The byte-order mark (BOM, U+FEFF) is explicitly prohibited by the JSON Lines specification.

2006: Line-delimited JSON patterns appear in Unix log processing pipelines, predating any formal specification
2013: JSON Lines specification published at jsonlines.org, defining the .jsonl extension and UTF-8/newline rules
2014: NDJSON specification published on GitHub (ndjson/ndjson-spec), defining the .ndjson extension and the application/x-ndjson MIME type; Elasticsearch adopts NDJSON for its bulk API
2014: RFC 7464 published for JSON Text Sequences (application/json-seq), a related but distinct format using ASCII Record Separator delimiters
2017: Elasticsearch issues #25673 and #25718 formalize application/x-ndjson as the preferred Content-Type for bulk endpoints, deprecating plain application/json for NDJSON payloads
2022: DuckDB adds native NDJSON/JSON Lines reading via read_json_auto(), enabling SQL queries directly on NDJSON files without import
2023: OpenAI adopts JSONL as the required format for fine-tuning training data uploads, bringing NDJSON/JSON Lines to the machine learning mainstream
2024: Observability platforms (Datadog, Grafana Loki, Fluentd) standardize on NDJSON as the structured log interchange format across vendor boundaries
Pretty-print every record in an NDJSON file
jq '.' input.ndjson

jq processes each line as a separate JSON value and outputs it with indentation. No flags needed — jq natively handles NDJSON input by treating each line independently.

Count records in an NDJSON file
wc -l data.ndjson

Each NDJSON record occupies exactly one line, so the line count equals the record count. Assumes the file ends with a trailing newline (standard convention).

Filter NDJSON lines by field value
jq 'select(.level == "error")' logs.ndjson

Outputs only lines where the 'level' field equals 'error'. Each line is evaluated independently — no memory accumulation. Useful for log analysis on multi-GB files.

Convert NDJSON to a JSON array
jq -s '.' input.ndjson > output.json

The --slurp flag reads all lines into memory and wraps them in a JSON array. Output is a single valid JSON document. Caution: loads entire file into RAM.

Query NDJSON with SQL using DuckDB
duckdb -c "SELECT level, count(*) FROM 'logs.ndjson' GROUP BY level;"

DuckDB auto-detects NDJSON format and executes SQL queries directly on the file without import. Handles out-of-core processing for files larger than available memory.

Security risk: LOW

Attack Vectors

  • JSON parsing bombs: deeply nested objects or extremely long string values on a single line can exhaust parser memory or stack depth, causing denial-of-service in consuming applications
  • Injection via unescaped content: NDJSON lines containing unescaped control characters or crafted Unicode sequences can confuse downstream parsers that concatenate lines into secondary formats (SQL, HTML, shell commands)
  • Memory exhaustion: a single NDJSON line containing a multi-GB JSON object has no format-level size limit, and parsers without line-length guards will attempt to load it entirely into memory
  • Data exfiltration via oversized fields: log pipelines that ingest NDJSON without field-size validation can be abused to smuggle large payloads through logging infrastructure
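A first-line defense against the memory-exhaustion vector is a line-length cap applied while streaming. A Python sketch: the one-megabyte limit is an arbitrary illustration, and since readline still buffers one full line, a stricter guard would read fixed-size chunks instead:

```python
import json

MAX_LINE_BYTES = 1_000_000   # illustrative cap; tune per pipeline

def guarded_records(fh):
    """Stream records from a binary file object, rejecting oversized
    lines before they are parsed and retained as JSON."""
    for lineno, line in enumerate(fh, start=1):
        if len(line) > MAX_LINE_BYTES:
            raise ValueError(f"line {lineno} exceeds {MAX_LINE_BYTES} bytes")
        if line.strip():
            yield json.loads(line)
```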

Mitigation: FileDex does not parse, execute, or process NDJSON content server-side. This page provides format reference only. No user-uploaded data is evaluated as JSON.

jq tool
Command-line JSON processor that handles NDJSON natively — each input line is processed as an independent JSON value without any special flags
DuckDB tool
Analytical SQL engine with native NDJSON reading via read_json_auto() and direct SELECT * FROM 'file.ndjson' syntax, supporting out-of-core processing
Elasticsearch service
Search and analytics engine that requires NDJSON as the request body format for its _bulk and _msearch APIs — the largest single driver of NDJSON adoption
mongoexport / mongoimport tools
mongoexport produces NDJSON by default (one document per line) and mongoimport consumes it, using Extended JSON to preserve BSON types
pandas library
Python data analysis library that reads NDJSON with pd.read_json(path, lines=True) and writes it with df.to_json(path, orient='records', lines=True)
ndjson (npm) library
Node.js streaming parser and serializer for NDJSON, designed for Transform stream pipelines