┌─ FILE ANALYSIS ──────────────────────────────────────┐
│ DEVELOPER   : Apache Software Foundation
│ CATEGORY    : Data
│ MIME TYPE   : application/avro
│ MAGIC BYTES : 4F 62 6A 01 ("Obj" followed by 0x01)
└──────────────────────────────────────────────────────┘
What is an Avro file?
Apache Avro is a row-based data serialization system that stores data together with the schema that describes it. The schema is written in JSON and embedded in the file header, while the records themselves use a compact binary encoding. Developed within the Hadoop ecosystem, Avro supports schema evolution (fields can be added or removed without breaking existing readers) as well as RPC, and it is widely used in streaming data platforms.
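To make the layout concrete, here is a minimal write sketch using the fastavro library (covered in the next section); the `User` record, its fields, and the `users.avro` filename are illustrative assumptions, not part of the format.

```python
# Minimal write sketch using fastavro (pip install fastavro).
# The "User" schema and "users.avro" filename are made up for illustration.
from fastavro import writer, parse_schema

schema = parse_schema({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
})

records = [
    {"name": "Ada", "age": 36},
    {"name": "Linus", "age": 54},
]

# The JSON schema is embedded in the file header; the records that follow
# are stored row by row in the compact binary encoding.
with open("users.avro", "wb") as out:
    writer(out, schema, records)
```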
How to open Avro files
- avro-tools: `java -jar avro-tools.jar tojson file.avro`
- Python fastavro: `pip install fastavro` for reading (see the reading sketch after this list)
- Apache Spark: Distributed processing
- Avro Viewer (VS Code extension): Visual inspection
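As a companion to the fastavro entry above, a reading sketch might look like the following; it assumes a `users.avro` file such as the one written in the earlier example.

```python
# Read sketch using fastavro; assumes "users.avro" exists
# (for example, written by the earlier snippet).
from fastavro import reader

with open("users.avro", "rb") as fo:
    avro_reader = reader(fo)
    # The schema embedded in the file header is available on the reader.
    print(avro_reader.writer_schema)
    # Records are yielded as plain Python dicts.
    for record in avro_reader:
        print(record)
```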
Technical specifications
| Property | Value |
|---|---|
| Storage | Row-based |
| Encoding | Binary or JSON |
| Schema | JSON (embedded in file header) |
| Schema Evolution | Forward and backward compatible |
| Compression | Snappy, Deflate, Bzip2, Zstd |
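The schema-evolution and compression rows can be illustrated together in a short fastavro sketch: data written with a v1 schema and Deflate-compressed blocks is read back through a v2 reader schema that adds a defaulted field. The `email` field and its default value are assumptions made for the example.

```python
# Sketch of schema evolution plus block compression with fastavro.
import io
from fastavro import writer, reader, parse_schema

# v1: the schema the data was written with.
v1 = parse_schema({
    "type": "record", "name": "User",
    "fields": [{"name": "name", "type": "string"}],
})

# v2: a newer reader schema adding an "email" field with a default,
# which keeps it backward compatible with v1 data.
v2 = parse_schema({
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string", "default": "unknown"},
    ],
})

buf = io.BytesIO()
writer(buf, v1, [{"name": "Ada"}], codec="deflate")  # Deflate-compressed blocks
buf.seek(0)

# Schema resolution fills the missing field from the default.
for record in reader(buf, reader_schema=v2):
    print(record)  # {'name': 'Ada', 'email': 'unknown'}
```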
Common use cases
- Apache Kafka: De facto standard serialization format for messages, typically paired with a schema registry (see the message-encoding sketch after this list).
- Data pipelines: Hadoop and Spark data processing.
- Schema registry: Confluent Schema Registry integration.
- Event sourcing: Serializing domain events.
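For the Kafka and event-sourcing cases, individual messages are usually encoded without the container-file header. The sketch below uses fastavro's schemaless writer/reader pair to show the idea; the `OrderPlaced` event is hypothetical, and real deployments typically prepend a schema-registry ID to each payload.

```python
# Sketch of encoding a single domain event as header-less Avro binary,
# as is common for Kafka messages. The "OrderPlaced" schema is made up.
import io
from fastavro import schemaless_writer, schemaless_reader, parse_schema

event_schema = parse_schema({
    "type": "record", "name": "OrderPlaced",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount_cents", "type": "long"},
    ],
})

buf = io.BytesIO()
schemaless_writer(buf, event_schema, {"order_id": "o-123", "amount_cents": 4999})
payload = buf.getvalue()  # bytes to publish; no schema is embedded here

# The consumer must already know the writer's schema to decode the payload,
# which is why a schema registry is usually involved.
event = schemaless_reader(io.BytesIO(payload), event_schema)
print(event)
```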