┌─ FILE ANALYSIS ──────────────────────────────────────┐
│ DEVELOPER   : Apache Software Foundation
│ CATEGORY    : Data
│ MIME TYPE   : application/avro
│ MAGIC BYTES : 4F 62 6A 01 ("Obj" followed by 0x01)
└──────────────────────────────────────────────────────┘
What is an Avro file?
Apache Avro is a row-based data serialization system that stores data together with the schema that describes it. The schema is written in JSON and embedded in the file header, while the records themselves use a compact binary encoding. Developed within the Hadoop ecosystem, Avro supports schema evolution (fields can be added or removed without breaking existing readers) as well as RPC, and it is widely used in streaming data platforms.
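To make the layout concrete, here is a minimal write sketch using the fastavro library (covered in the next section); the `User` record, its fields, and the `users.avro` filename are illustrative assumptions, not part of the format.

```python
# Minimal write sketch using fastavro (pip install fastavro).
# The "User" schema and "users.avro" filename are made up for illustration.
from fastavro import writer, parse_schema

schema = parse_schema({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
})

records = [
    {"name": "Ada", "age": 36},
    {"name": "Linus", "age": 54},
]

# The JSON schema is embedded in the file header; the records that follow
# are stored row by row in the compact binary encoding.
with open("users.avro", "wb") as out:
    writer(out, schema, records)
```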
How to open Avro files
- avro-tools: `java -jar avro-tools.jar tojson file.avro`
- Python fastavro: `pip install fastavro` for reading (see the reading sketch after this list)
- Apache Spark: Distributed processing
- Avro Viewer (VS Code extension): Visual inspection
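As a companion to the fastavro entry above, a reading sketch might look like the following; it assumes a `users.avro` file such as the one written in the earlier example.

```python
# Read sketch using fastavro; assumes "users.avro" exists
# (for example, written by the earlier snippet).
from fastavro import reader

with open("users.avro", "rb") as fo:
    avro_reader = reader(fo)
    # The schema embedded in the file header is available on the reader.
    print(avro_reader.writer_schema)
    # Records are yielded as plain Python dicts.
    for record in avro_reader:
        print(record)
```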
Technical specifications
| Property | Value |
|---|---|
| Storage | Row-based |
| Encoding | Binary or JSON |
| Schema | JSON (embedded in file header) |
| Schema Evolution | Forward and backward compatible |
| Compression | Snappy, Deflate, Bzip2, Zstd |
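The schema-evolution and compression rows can be illustrated together in a short fastavro sketch: data written with a v1 schema and Deflate-compressed blocks is read back through a v2 reader schema that adds a defaulted field. The `email` field and its default value are assumptions made for the example.

```python
# Sketch of schema evolution plus block compression with fastavro.
import io
from fastavro import writer, reader, parse_schema

# v1: the schema the data was written with.
v1 = parse_schema({
    "type": "record", "name": "User",
    "fields": [{"name": "name", "type": "string"}],
})

# v2: a newer reader schema adding an "email" field with a default,
# which keeps it backward compatible with v1 data.
v2 = parse_schema({
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string", "default": "unknown"},
    ],
})

buf = io.BytesIO()
writer(buf, v1, [{"name": "Ada"}], codec="deflate")  # Deflate-compressed blocks
buf.seek(0)

# Schema resolution fills the missing field from the default.
for record in reader(buf, reader_schema=v2):
    print(record)  # {'name': 'Ada', 'email': 'unknown'}
```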
Common use cases
- Apache Kafka: De facto standard serialization format for messages, typically paired with a schema registry (see the message-encoding sketch after this list).
- Data pipelines: Hadoop and Spark data processing.
- Schema registry: Confluent Schema Registry integration.
- Event sourcing: Serializing domain events.
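For the Kafka and event-sourcing cases, individual messages are usually encoded without the container-file header. The sketch below uses fastavro's schemaless writer/reader pair to show the idea; the `OrderPlaced` event is hypothetical, and real deployments typically prepend a schema-registry ID to each payload.

```python
# Sketch of encoding a single domain event as header-less Avro binary,
# as is common for Kafka messages. The "OrderPlaced" schema is made up.
import io
from fastavro import schemaless_writer, schemaless_reader, parse_schema

event_schema = parse_schema({
    "type": "record", "name": "OrderPlaced",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "amount_cents", "type": "long"},
    ],
})

buf = io.BytesIO()
schemaless_writer(buf, event_schema, {"order_id": "o-123", "amount_cents": 4999})
payload = buf.getvalue()  # bytes to publish; no schema is embedded here

# The consumer must already know the writer's schema to decode the payload,
# which is why a schema registry is usually involved.
event = schemaless_reader(io.BytesIO(payload), event_schema)
print(event)
```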