Attribute | Apache Avro | Apache Hudi | Apache Iceberg | Apache ORC | Apache Parquet | CSV | Delta Lake |
---|---|---|---|---|---|---|---|
Description | Apache Avro is a compact, schema-based, row-oriented serialization format for record data, widely used in streaming data pipelines. | Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. It stores data in Parquet or ORC files. | Apache Iceberg is a high-performance format for huge analytic tables. It stores data in Parquet, Avro, or ORC files. | Apache ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads. | Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. | Comma-Separated Values (CSV) is a plain-text format in which each line is a record and commas separate the fields. | Delta Lake is an open-source storage framework that enables building a Lakehouse architecture. |
License | Apache License 2.0 | Apache License 2.0 | Apache License 2.0 | Apache License 2.0 | Apache License 2.0 | N/A | Apache License 2.0 |
Source code | https://github.com/apache/avro | https://github.com/apache/hudi | https://github.com/apache/iceberg | https://github.com/apache/orc | https://github.com/apache/parquet-format | N/A | https://github.com/delta-io/delta |
Website | https://avro.apache.org/ | https://hudi.apache.org/ | https://iceberg.apache.org/ | https://orc.apache.org/ | https://parquet.apache.org/ | https://www.rfc-editor.org/rfc/rfc4180.html | https://delta.io/ |
Year created | 2009 | 2016 | 2017 | 2013 | 2013 | N/A (RFC 4180 published 2005) | 2019 |
Company | Apache | Uber | Netflix | Hortonworks, Facebook | Twitter, Cloudera | N/A | Databricks |
Language support | java, c++, c#, c, python, javascript, perl, ruby, php, rust | java, scala, c++, python | java, scala, c++, python, r, php | java, scala, c++, python, r, php, go | scala, java, python, rust | ||
Use cases | Stream processing, Analytics, Efficient data exchange | Incremental data processing, Data upserts, Change Data Capture (CDC), ACID transactions | Write once read many, Analytics, Efficient storage, ACID transactions | Write once read many, Analytics, Efficient storage, ACID transactions | Write once read many, Analytics, Efficient storage, Column-based queries | | Write once read many, Analytics, Efficient storage, ACID transactions |
Is human readable | |||||||
Orientation | row | column or row | column or row | column | column | row | column |
Has type system | |||||||
Has nested structure support | |||||||
Has native compression | |||||||
Has encoding support | |||||||
Has constraint support | |||||||
Has acid support | |||||||
Has metadata | |||||||
Has encryption support | |||||||
Data processing framework support | Apache Flink, Apache Gobblin, Apache NiFi, Apache Pig, Apache Spark | Apache Spark, Apache Flink | Apache Drill, Apache Flink, Apache Gobblin, Apache Pig, Apache Spark | Apache Flink, Apache Gobblin, Apache Hadoop, Apache NiFi, Apache Pig, Apache Spark | Apache Beam, Apache Drill, Apache Flink, Apache Spark | Apache Beam, Apache Drill, Apache Flink, Apache Gobblin, Apache Hive, Apache NiFi, Apache Pig, Apache Spark | Apache Drill, Apache Flink, Apache Spark |
Analytics query support | Apache Impala, Apache Druid, Apache Hive, Apache Pinot, AWS Athena, BigQuery, ClickHouse, Firebolt | Apache Hive, Apache Impala, AWS Athena, BigQuery, ClickHouse, Presto, Trino | Apache Impala, Apache Druid, Apache Hive, AWS Athena, BigQuery, ClickHouse, Dremio, DuckDB, Presto, Trino | Apache Impala, Apache Druid, Apache Hive, Apache Pinot, AWS Athena, BigQuery, ClickHouse, Firebolt, Presto, Trino | Apache Hive, Apache Impala, Apache Druid, Apache Pinot, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, DuckDB, Firebolt | Apache Impala, Apache Druid, Apache Pinot, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, DuckDB, Firebolt | Apache Hive, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, Presto, Trino |
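The Orientation row is the most important physical difference in the table: row-oriented formats such as Avro and CSV keep all fields of a record together, while column-oriented formats such as Parquet keep all values of a column together. The sketch below illustrates the idea with plain Python lists and a few hypothetical records. It is an assumption-laden model of the layout only, not the real byte format of any file type listed above, and the function names are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Hypothetical sample records, used only for illustration.
COLUMNS = ["id", "city", "temp_c"]
records = [
    {"id": 1, "city": "Oslo", "temp_c": 4.5},
    {"id": 2, "city": "Lima", "temp_c": 19.0},
    {"id": 3, "city": "Pune", "temp_c": 31.2},
]


def row_layout(rows: list[dict[str, Any]], columns: list[str]) -> list[Any]:
    """Row orientation: all fields of record 0, then all fields of record 1, and so on."""
    return [row[c] for row in rows for c in columns]


def column_layout(rows: list[dict[str, Any]], columns: list[str]) -> dict[str, list[Any]]:
    """Column orientation: all values of one column are stored next to each other."""
    return {c: [row[c] for row in rows] for c in columns}


def column_avg_from_row_store(flat: list[Any], n_cols: int, col_idx: int) -> float:
    # Values of one column are interleaved with the other fields,
    # so we have to stride across the whole flat buffer.
    values = flat[col_idx::n_cols]
    return sum(values) / len(values)


def column_avg_from_column_store(store: dict[str, list[Any]], column: str) -> float:
    # The column is already one contiguous list; other columns are never touched.
    values = store[column]
    return sum(values) / len(values)


row_store = row_layout(records, COLUMNS)
col_store = column_layout(records, COLUMNS)
print(row_store)  # [1, 'Oslo', 4.5, 2, 'Lima', 19.0, 3, 'Pune', 31.2]
print(col_store["temp_c"])  # [4.5, 19.0, 31.2]
```

Both functions return the same average, but in the column layout an analytic query like "average temperature" reads one contiguous run of values. This is roughly why the column-oriented formats are listed for analytics and column-based queries, while row-oriented formats suit record-at-a-time streaming and data exchange.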
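CSV has no native type system or compression, but the RFC 4180 document linked in the table does define how fields that contain commas or double quotes are written: such fields are enclosed in double quotes, embedded quotes are doubled, and records end with CRLF. A minimal sketch using Python's standard `csv` module, whose default `"excel"` dialect follows these conventions closely (treat it as close to RFC 4180, not a certified implementation); the sample rows are hypothetical:

```python
import csv
import io

# Hypothetical rows; the second and third contain characters that need quoting.
rows = [
    ["name", "comment"],
    ["Ada", "likes commas, quotes"],
    ["Bob", 'said "hi"'],
]

buf = io.StringIO(newline="")
# The default "excel" dialect uses ',' as delimiter and '"' as quote character,
# doubles embedded quotes, quotes only fields that need it, and ends rows with \r\n.
csv.writer(buf).writerows(rows)
text = buf.getvalue()
print(repr(text))
# 'name,comment\r\nAda,"likes commas, quotes"\r\nBob,"said ""hi"""\r\n'

# Reading the text back recovers the original fields exactly.
parsed = list(csv.reader(io.StringIO(text, newline="")))
print(parsed == rows)  # True
```

Note that every value comes back as a string: the reader has no way to know that a field was meant to be a number or a date, which is what the "Has type system" distinction refers to.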
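Part of what makes Avro compact for record data is its binary encoding: the Avro specification writes `int` and `long` values using variable-length zig-zag coding, so small magnitudes (positive or negative) take a single byte. The sketch below implements that scheme for a 64-bit `long` in plain Python. It is an illustration written against the spec's description, not code from the official Avro libraries, and the function names are hypothetical.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def zigzag_encode(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3."""
    if not INT64_MIN <= n <= INT64_MAX:
        raise ValueError("Avro long must fit in 64 bits")
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Write 7 bits per byte, least-significant group first; the high bit means 'more follows'."""
    z = zigzag_encode(n)
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(data: bytes) -> int:
    """Reverse of encode_long: reassemble the varint, then undo the zig-zag mapping."""
    z = 0
    shift = 0
    for b in data:
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    else:
        raise ValueError("truncated varint")
    return (z >> 1) ^ -(z & 1)


for value in (0, -1, 1, -2, 2, -64, 64):
    print(value, encode_long(value).hex())
# 0 00, -1 01, 1 02, -2 03, 2 04, -64 7f, 64 8001
```

The printed values match the example table in the Avro specification's binary encoding section, where 64 is the first value that needs a second byte.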