I'm really impressed with Apache Arrow. It’s a real game changer.
We are currently experimenting with Dremio on AWS, and it looks very promising. We are now comparing its cost against our previous architecture.
Our stack is basically Parquet, Dremio/Arrow, and Hudi on top of S3, with Spark as the compute engine.
Is Impala + Kudu still relevant today? I know these tools from my earlier experience in banking with Cloudera.
Also, why didn’t you mention Hudi in your previous talk? How does it compare with Iceberg and with Delta Lake from Databricks?