Name
Apache Parquet
Apache Hudi
Description
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval.
Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. It stores table data in either Parquet or ORC files.
License
Apache License 2.0
Apache License 2.0
Source code
https://github.com/apache/parquet-format
https://github.com/apache/hudi
Website
https://parquet.apache.org/
https://hudi.apache.org/
Year created
2013
2016
Company
Twitter, Cloudera
Uber
Language support
Java, Scala, C++, Python, R, PHP
Use cases
Write once read many, Analytics, Efficient storage, Column-based queries
Incremental data processing, Data upserts, Change Data Capture (CDC), ACID transactions
Is human readable
no
no
Orientation
column
column or row
Has type system
yes
yes
Has nested structure support
yes
yes
Has native compression
yes
yes
Has encoding support
yes
yes
Has constraint support
no
yes
Has ACID support
no
yes
Has metadata
yes
yes
Has encryption support
yes
maybe
Data processing framework support
Apache Beam, Apache Drill, Apache Flink, Apache Spark
Apache Spark, Apache Flink
Analytics query support
Apache Hive, Apache Impala, Apache Druid, Apache Pinot, AWS Athena, Azure Synapse, BigQuery, ClickHouse, Dremio, DuckDB, Firebolt
Apache Hive, Apache Impala, AWS Athena, BigQuery, ClickHouse, Presto, Trino
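The Orientation and encoding rows of this comparison are easier to picture with a toy example. The sketch below is a simplified illustration, not Parquet's actual on-disk format: it pivots row-oriented records into one list per column, shows that an aggregate over one column touches only that column, and applies a basic dictionary encoding to a repetitive column. The helper names `to_columns` and `dictionary_encode` and the sample data are hypothetical.

```python
# Simplified illustration of row vs column layout and dictionary encoding.
# Plain Python lists stand in for storage; this is NOT Parquet's byte format.
from __future__ import annotations

from typing import Any


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list per column.

    Assumes every row has the same field names (a simplification).
    """
    columns: dict[str, list[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def dictionary_encode(values: list[Any]) -> tuple[list[Any], list[int]]:
    """Replace each value with an index into a list of distinct values."""
    dictionary: list[Any] = []
    positions: dict[Any, int] = {}
    indices: list[int] = []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


rows = [
    {"id": 1, "city": "Oslo", "amount": 12.5},
    {"id": 2, "city": "Lima", "amount": 7.0},
    {"id": 3, "city": "Oslo", "amount": 3.25},
]
columns = to_columns(rows)

# An aggregate over one column reads only that column's values.
total_amount = sum(columns["amount"])

# Repetitive values compress well once a column's values are stored together.
city_dictionary, city_indices = dictionary_encode(columns["city"])
```

Parquet does offer dictionary encoding among its column encodings, but it writes the result into binary pages and can compress them further, which this sketch does not attempt.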
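Hudi's listed use cases of data upserts and incremental data processing both depend on keyed records plus an ordered record of commits. The sketch below is a toy in-memory model built on those assumptions: each record carries a key (assumed here to be an `id` field), an upsert overwrites any stored record with the same key, and an incremental read returns only records written after a given commit. `ToyTable`, `upsert`, `snapshot` and `incremental_read` are hypothetical names, not Hudi's API. Real Hudi tables keep records in files on storage and track commits on a timeline.

```python
# Hypothetical in-memory model of record-key upserts and incremental reads.
# ToyTable and its methods are illustrative names, not Hudi APIs.
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyTable:
    key_field: str = "id"
    # record key -> (commit number that last wrote it, latest record)
    records: dict[Any, tuple[int, dict[str, Any]]] = field(default_factory=dict)
    # commit numbers in the order they completed
    timeline: list[int] = field(default_factory=list)

    def upsert(self, commit: int, batch: list[dict[str, Any]]) -> None:
        """Insert unseen keys, overwrite existing keys, then log the commit."""
        for record in batch:
            self.records[record[self.key_field]] = (commit, record)
        self.timeline.append(commit)

    def snapshot(self) -> list[dict[str, Any]]:
        """Return the latest version of every record, ordered by key."""
        return [self.records[key][1] for key in sorted(self.records)]

    def incremental_read(self, since: int) -> list[dict[str, Any]]:
        """Return only records last written by a commit after `since`."""
        return [
            record
            for _, (commit, record) in sorted(self.records.items(), key=lambda item: item[0])
            if commit > since
        ]


table = ToyTable()
table.upsert(1, [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}])
table.upsert(2, [{"id": 2, "v": "b2"}, {"id": 3, "v": "c"}])
```

After the second commit, `table.snapshot()` holds the updated value for key 2, and `table.incremental_read(1)` returns only keys 2 and 3. Keeping just the latest version per key is similar in spirit to a copy-on-write snapshot; the sketch omits partitioning, deletes and concurrent-writer handling.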