| Name | Apache Parquet | CSV |
|---|---|---|
| Description | Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. | Comma-Separated Values (CSV) is a plain-text file format that uses commas to separate values. |
| License | Apache License 2.0 | N/A |
| Source code | https://github.com/apache/parquet-format | N/A |
| Website | https://parquet.apache.org/ | https://www.rfc-editor.org/rfc/rfc4180.html |
| Year created | 2013 | N/A (documented in RFC 4180, 2005) |
| Company | Twitter, Cloudera | N/A |
| Language support | Java, Scala, C++, Python, R, PHP | Java, Scala, C++, Python, R, PHP, Go |
| Use cases | Write once, read many; analytics; efficient storage; column-based queries | |
| Is human readable | No | Yes |
| Orientation | Column | Row |
| Has type system | Yes | No |
| Has nested structure support | Yes | No |
| Has native compression | Yes | No |
| Has encoding support | Yes | No |
| Has constraint support | No | No |
| Has ACID support | No | No |
| Has metadata | Yes | No |
| Has encryption support | Yes | No |
| Data processing framework support | Apache Beam, Apache Drill, Apache Flink, Apache Spark | Apache Beam, Apache Drill, Apache Flink, Apache Gobblin, Apache Hive, Apache NiFi, Apache Pig, Apache Spark |
| Analytics query support | Apache Hive, Apache Impala, Apache Druid, Apache Pinot, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, DuckDB, Firebolt | Apache Impala, Apache Druid, Apache Pinot, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, DuckDB, Firebolt |
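The "Orientation" and "Has type system" rows can be illustrated with a minimal Python sketch that uses only the standard library. It parses a small, hypothetical CSV sample row by row, pivots it into one list per column, and then applies a hand-written schema. The sample data and the `SCHEMA` mapping are illustrative assumptions, and the pivot only mimics the idea of a columnar layout; it is not Parquet's on-disk format.

```python
import csv
import io

# Hypothetical sample data, used only for illustration.
CSV_TEXT = "id,name,score\n1,alice,9.5\n2,bob,7.0\n3,carol,8.25\n"

# Hypothetical schema: CSV itself carries no types, so they are declared here by hand.
SCHEMA = {"id": int, "name": str, "score": float}


def read_rows(text):
    """Parse CSV text into a list of row dicts (row-oriented: one record at a time)."""
    return list(csv.DictReader(io.StringIO(text)))


def to_columns(rows):
    """Pivot row dicts into one list of values per column (column-oriented)."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def apply_schema(columns, schema):
    """Convert each column's string values using the declared type for that column."""
    return {name: [schema[name](v) for v in values] for name, values in columns.items()}


rows = read_rows(CSV_TEXT)
columns = to_columns(rows)
typed = apply_schema(columns, SCHEMA)

# Every value parsed from CSV is a string until a schema is applied.
print(type(rows[0]["score"]).__name__)  # str
# Reading one column from the columnar layout touches only that column's values.
print(columns["score"])                 # ['9.5', '7.0', '8.25']
print(typed["score"])                   # [9.5, 7.0, 8.25]
```

In Parquet the column types are stored in the file's own metadata, so a reader does not need an external schema like the hypothetical `SCHEMA` above.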
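The "Has encoding support" row refers to column encodings such as dictionary encoding and run-length encoding, both of which Parquet can apply to column data. The sketch below illustrates those two ideas in plain Python, assuming a hypothetical column of repeated country codes. It is a simplified illustration only, not Parquet's actual RLE/bit-packing hybrid encoding or its byte layout.

```python
def dictionary_encode(values):
    """Replace each value with an index into a list of distinct values."""
    dictionary = []
    positions = {}
    indices = []
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indices.append(positions[v])
    return dictionary, indices


def run_length_encode(indices):
    """Collapse consecutive repeated indices into (index, run_length) pairs."""
    runs = []
    for i in indices:
        if runs and runs[-1][0] == i:
            runs[-1][1] += 1
        else:
            runs.append([i, 1])
    return [tuple(r) for r in runs]


def decode(dictionary, runs):
    """Expand (index, run_length) pairs back into the original values."""
    out = []
    for idx, length in runs:
        out.extend([dictionary[idx]] * length)
    return out


# Hypothetical column with many repeated values, as is common in analytics data.
country = ["US", "US", "US", "DE", "DE", "US", "FR", "FR", "FR", "FR"]
dictionary, indices = dictionary_encode(country)
runs = run_length_encode(indices)

print(dictionary)  # ['US', 'DE', 'FR']
print(runs)        # [(0, 3), (1, 2), (0, 1), (2, 4)]
```

Ten string values shrink to three dictionary entries plus four runs, which is why a column-oriented layout, where similar values sit next to each other, benefits from these encodings more than a row-oriented text format like CSV.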