.avro

Apache Avro

Apache Avro is a row-based data serialization format that embeds its JSON schema in the file header. It supports schema evolution, compact binary encoding, and is the default format for Apache Kafka. This is a reference page only.

Learn more ↓

Data layout

Header schema

Records structured data

Row-BasedBinarySchema EvolutionApache Hadoop2009

By FileDex

Not convertible

Binary data serialization format. Schema-dependent conversion requires runtime deserialization not available in browser.

Common questions

What is an Avro file?

An Avro file is a binary data container that stores records in row-based format alongside a JSON schema in the file header. It supports schema evolution, meaning you can add or remove fields without breaking existing readers. Avro is widely used in Apache Kafka and Hadoop pipelines.

How do I open an Avro file?

Use avro-tools (java -jar avro-tools.jar tojson file.avro) to convert to readable JSON, or Python's fastavro library to read programmatically. VS Code with the Avro Viewer extension can display Avro contents visually.

What is the difference between Avro and Parquet?

Avro is row-based (good for streaming, writing, and full-record access), while Parquet is columnar (good for analytical queries that read few columns from many rows). Avro embeds its schema in the file; Parquet stores schema in footer metadata. Use Avro for Kafka messages and data ingestion, Parquet for data warehouse queries.

Can I convert Avro to CSV?

Yes, but with limitations. Avro supports nested records, arrays, maps, and union types that CSV cannot represent. Flat Avro records convert cleanly; nested structures require manual flattening. Use fastavro in Python or Apache Spark for the conversion.

What makes .AVRO special

What is an Avro file?

Apache Avro is a row-based data serialization system that stores data alongside its schema in JSON format. Developed within the Hadoop ecosystem, Avro supports schema evolution (adding/removing fields without breaking readers), compact binary encoding, and RPC. It is widely used in streaming data platforms.

Continue reading — full technical deep dive

How to open Avro files

avro-tools — java -jar avro-tools.jar tojson file.avro
Python fastavro — pip install fastavro for reading
Apache Spark — Distributed processing
Avro Viewer (VS Code extension) — Visual inspection

Technical specifications

Property	Value
Storage	Row-based
Encoding	Binary or JSON
Schema	JSON (embedded in file header)
Schema Evolution	Forward and backward compatible
Compression	Snappy, Deflate, Bzip2, Zstd

Common use cases

Apache Kafka: Default serialization format for messages.
Data pipelines: Hadoop and Spark data processing.
Schema registry: Confluent Schema Registry integration.
Event sourcing: Serializing domain events.

.AVRO compared to alternatives

.AVRO compared to alternative formats
Formats	Criteria	Winner
.AVRO vs .PARQUET	Query performance on analytical workloads Parquet's columnar storage allows reading only needed columns, while Avro must deserialize entire rows. Parquet is 5-50x faster for SELECT-few-columns queries.	PARQUET wins
.AVRO vs .PARQUET	Write speed and streaming suitability Avro's row-based layout allows appending individual records efficiently. Parquet requires buffering entire row groups before writing, making it unsuitable for streaming.	AVRO wins
.AVRO vs .PROTOCOL BUFFERS	Schema evolution flexibility Avro supports full schema evolution (add/remove/rename fields) with reader and writer schemas resolved at runtime. Protobuf requires field numbers and is less flexible with renames.	AVRO wins
.AVRO vs .JSON	File size and parsing speed Avro binary encoding is 5-10x smaller than equivalent JSON and parses faster because field names are not repeated in every record — the schema is stored once in the header.	AVRO wins

Technical reference

Specs CLI Conversions Security Ecosystem

MIME Type: application/avro
Magic Bytes: 4F 62 6A 01 Obj followed by version 01.
Developer: Apache Software Foundation
Year Introduced: 2009
Open Standard: Yes

000000004F626A01 Obj.

Obj followed by version 01.

Binary Structure

Avro files begin with a 4-byte magic sequence (4F 62 6A 01 — ASCII 'Obj' followed by version 0x01), followed by a file header containing the schema as a JSON string and a sync marker (16-byte random token). Data is stored in blocks — each block has a count of objects, the byte size of serialized data, the compressed data bytes, and a copy of the 16-byte sync marker. The sync marker allows readers to recover from corruption by scanning forward to the next valid block boundary. Blocks can be independently compressed using null, deflate, snappy, zstd, or bzip2 codecs.

Offset	Length	Field	Example	Description
`0x00`	`4 bytes`	Magic	`4F 62 6A 01`	ASCII 'Obj' + version byte 0x01 — identifies file as Avro Object Container.
`0x04`	`variable`	File metadata	`(map of string->bytes)`	Avro map containing 'avro.schema' (JSON string) and 'avro.codec' (compression codec name). Encoded as Avro long-prefixed key-value pairs.
`variable`	`16 bytes`	Sync marker	`(random 16-byte token)`	Randomly generated sync marker unique to this file. Repeated at the end of every data block for block boundary detection and corruption recovery.

2009Apache Avro created by Doug Cutting as part of the Hadoop ecosystem for efficient data serialization2011Avro becomes a top-level Apache project with stable 1.x specification2014Confluent Schema Registry launched, using Avro as the primary schema format for Kafka2018Avro 1.9 adds Zstandard (zstd) compression codec support for higher compression ratios2023Avro 1.11.3 released with improved logical type support and Java 17 compatibility

Convert Avro to JSON other

java -jar avro-tools.jar tojson data.avro

Deserializes Avro binary records and prints them as one JSON object per line. Requires the avro-tools JAR from the Apache Avro distribution.

Read Avro file with Python fastavro other

python -c "import fastavro; [print(r) for r in fastavro.reader(open('data.avro','rb'))]"

Uses the fastavro library (pip install fastavro) to read and print all records from an Avro file. Much faster than the official avro Python library.

Get Avro file schema other

java -jar avro-tools.jar getschema data.avro

Extracts and prints the JSON schema embedded in the Avro file header without reading any data blocks.

Convert JSON to Avro other

java -jar avro-tools.jar fromjson --schema-file schema.avsc data.json > data.avro

Encodes JSON records into Avro binary format using the specified schema file (.avsc). Each line of the JSON input becomes one Avro record.

AVRO → JSON export lossless Avro binary data can be exported to human-readable JSON using avro-tools or fastavro. This is the most common conversion for inspection and debugging.

AVRO → PARQUET transcode lossless Parquet is columnar while Avro is row-based. Converting Avro to Parquet enables faster analytical queries on large datasets at the cost of slower row-level access.

AVRO → CSV export lossy CSV export flattens Avro records for use in spreadsheets and simple data tools. Nested fields and complex types are lost or require manual flattening.

LOW

Attack Vectors

Maliciously crafted schema JSON in the file header could exploit JSON parser vulnerabilities
Extremely large block sizes declared in header could cause out-of-memory conditions during deserialization

Mitigation: FileDex does not open, execute, or parse these files. Reference page only.

Apache Kafka service

Distributed streaming platform that uses Avro as default message serialization format

fastavro library

Fast Avro serialization and deserialization library for Python

Confluent Schema Registry service

Central schema management for Kafka that stores and validates Avro schemas

Apache Spark tool

Distributed compute engine with native Avro read/write support via spark-avro

avro-tools tool

Official CLI for inspecting, converting, and validating Avro files

Apache Avro

Common questions

What makes .AVRO special

What is an Avro file?

How to open Avro files

Technical specifications

Common use cases

.AVRO compared to alternatives

Related Formats

Technical reference

Binary Structure