Name
Apache Hudi
Apache ORC
Description
Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. It stores table data in either Parquet or ORC files.
Apache ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads.
Source code
https://github.com/apache/hudi
https://github.com/apache/orc
Website
https://hudi.apache.org/
https://orc.apache.org/
License
Apache License 2.0
Apache License 2.0
Year created
2016
2013
Company
Uber
Hortonworks, Facebook
Use cases
Incremental data processing, Data upserts, Change Data Capture (CDC), ACID transactions
Write once read many, Analytics, Efficient storage, ACID transactions
Language support
Java, Scala, C++, Python
Is human readable
no
no
Orientation
column or row
column
Has type system
yes
yes
Has nested structure support
yes
yes
Has native compression
yes
yes
Has encoding support
yes
yes
Has constraint support
yes
no
Has ACID support
yes
yes
Has metadata
yes
yes
Has encryption support
maybe
yes
Data processing framework support
Apache Spark, Apache Flink
Apache Flink, Apache Gobblin, Apache Hadoop, Apache NiFi, Apache Pig, Apache Spark
Analytics query support
Apache Hive, Apache Impala, AWS Athena, BigQuery, ClickHouse, Presto, Trino
Apache Impala, Apache Druid, Apache Hive, Apache Pinot, AWS Athena, BigQuery, ClickHouse, Firebolt, Presto, Trino
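
Orientation example
The Orientation attribute above describes whether values are laid out record by record (row) or field by field (column). The following is a minimal sketch in plain Python, not the Hudi or ORC API: the sample records, their field names (id, city, fare) and the helper to_columns are all hypothetical and exist only to illustrate the two layouts.

```python
# Illustrative only: plain-Python stand-ins for row and column layouts.
# The record fields (id, city, fare) are hypothetical sample data.

rows = [
    {"id": 1, "city": "Oslo", "fare": 12.5},
    {"id": 2, "city": "Lima", "fare": 7.0},
    {"id": 3, "city": "Oslo", "fare": 9.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: an aggregate over one field still walks every full record.
total_from_rows = sum(record["fare"] for record in rows)

# Column layout: the same aggregate reads a single list of values.
total_from_columns = sum(columns["fare"])
```

Both totals are the same; the difference is how much data has to be touched to compute them. That is why columnar formats tend to suit analytics and write-once-read-many workloads, while row layouts tend to suit record-at-a-time writes.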