apps/www/_blog/2025-05-29-building-on-open-table-formats.mdx
Open table formats are specifications that define how to store and manage large datasets in a structured manner on distributed storage systems. They provide a layer of abstraction over raw data files, enabling features such as ACID transactions, schema evolution, and time travel. This abstraction allows multiple processing engines to interact with the data consistently and reliably.
The primary open table formats in use today are Apache Iceberg, Delta Lake, and Apache Hudi. Each offers capabilities tailored to specific use cases.
These three formats emerged to solve distinct challenges. Iceberg shines in analytics scenarios where you need consistency, flexibility, and compatibility with many data engines. Delta Lake is the natural fit when you are in a Spark environment (e.g., Databricks). And Hudi is well suited to streaming-centric workloads and database change data capture (CDC).
Iceberg was designed to solve the challenges of managing large analytical datasets stored in object storage systems like Supabase Storage, Amazon S3, Google Cloud Storage, and Azure Blob Storage. Iceberg brings database-like capabilities to distributed file systems, enabling reliable, consistent access to data that would otherwise be locked in raw files.
Open table formats define how data and metadata are organized. They sit on top of Parquet files (or other files with data) and they make large datasets queryable across multiple engines without sacrificing consistency or performance. Iceberg’s design allows multiple systems to write to and query the same dataset safely. This makes it an essential component for modern data platforms that need to scale.
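The layering described above can be sketched in a few lines: immutable data files at the bottom, and a versioned log of snapshots on top. The classes and field names below are illustrative only, not Iceberg's actual metadata schema (which involves manifest files and manifest lists); the point is that a commit records a new snapshot rather than rewriting old files, which is what makes consistent multi-engine reads and time travel possible.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Snapshot:
    """One committed version of the table: an immutable list of data files."""
    snapshot_id: int
    data_files: tuple  # e.g. paths to Parquet files

@dataclass
class Table:
    """Toy table: an append-only log of snapshots; the latest is 'current'."""
    snapshots: list = field(default_factory=list)

    def commit(self, new_files):
        # A commit never rewrites old files; it records a new snapshot
        # whose file list is derived from the previous one.
        prev = self.snapshots[-1].data_files if self.snapshots else ()
        snap = Snapshot(len(self.snapshots), prev + tuple(new_files))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def scan(self, snapshot_id=None):
        # Time travel: read any historical snapshot by id.
        snap = self.snapshots[snapshot_id if snapshot_id is not None else -1]
        return snap.data_files

t = Table()
t.commit(["data/file-0.parquet"])
t.commit(["data/file-1.parquet"])
print(t.scan())               # latest snapshot: both files
print(t.scan(snapshot_id=0))  # time travel: only the first file
```

Because readers always resolve a specific snapshot, a query running against snapshot 0 is unaffected by a concurrent commit that produces snapshot 1.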
Iceberg offers several key features, including ACID transactions, schema evolution, hidden partitioning, and time travel to historical snapshots.
Iceberg addresses several trends prevalent in our industry today, turning raw object storage into a usable, consistent, and vendor-neutral data layer.
Moreover, teams do not want to be locked into proprietary systems. Data is meant to be free, not stored in formats that create artificial advantages and perverse incentives for vendors. The idea of the lakehouse (combining the scalability and cost efficiency of data lakes with the transactional guarantees of data warehouses) remains popular, and Iceberg makes it practical: the data stays open while companies compete on compute engines.
It takes two to tango. The combination of Iceberg and Amazon S3 is a potent alternative to traditional proprietary data lakes and data warehouses. It's thanks to the significant evolution of Amazon S3 that much of Iceberg's promise has come to fruition.
The ETL industry was built around a fundamental problem: moving data from one system to another, transforming it along the way to make it usable for different purposes. For decades, this meant extracting data from operational databases, cleaning and reshaping it through a series of batch processes, and loading it into a data warehouse for analysis. This pipeline was slow, fragile, and expensive. However, it was necessary, because storage and compute were tightly coupled, and operational systems could not support large-scale analytics directly.
What’s happening now with S3 + Iceberg is a paradigm shift. Open table formats like Iceberg turn object storage into a queryable, versioned, and structured data layer. At the same time, innovations like S3 Express, Conditional Writes, and S3 Tables make it possible to write directly into object storage at scale with transactional guarantees and low latency.
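The role conditional writes play here can be sketched as a compare-and-swap on the table's current-metadata pointer. The class below is a toy in-memory model, not the S3 API (S3 expresses the same idea with ETag-based preconditions on `PutObject`): a commit succeeds only if no other writer has changed the pointer since it was read, which is the primitive that gives object storage transactional commit semantics.

```python
class ConditionalStore:
    """Toy object store with put-if-unchanged, modeling S3 conditional writes."""

    def __init__(self):
        self._objects = {}  # key -> (etag, value)
        self._etag = 0

    def get(self, key):
        # Returns (etag, value); (None, None) if the key does not exist yet.
        return self._objects.get(key, (None, None))

    def put_if_match(self, key, value, expected_etag):
        # Succeeds only if the object hasn't changed since we read it.
        current_etag, _ = self.get(key)
        if current_etag != expected_etag:
            return False  # lost the race: another writer committed first
        self._etag += 1
        self._objects[key] = (self._etag, value)
        return True

store = ConditionalStore()
etag, _ = store.get("table/metadata.json")

# First writer commits a new snapshot pointer atomically.
assert store.put_if_match("table/metadata.json", "snapshot-1", etag)

# A concurrent writer holding the stale etag is rejected instead of
# silently overwriting the committed metadata, so it must retry on top
# of the new snapshot.
assert not store.put_if_match("table/metadata.json", "snapshot-2", etag)
```

Before conditional writes existed, table formats on S3 needed an external lock service (such as a DynamoDB table) to get this guarantee; natively supporting the precondition removes that dependency.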
This means the traditional ETL model—extract, transform, and load—starts to break down. Instead of lifting data out of one system, transforming it, and depositing it into another, teams can write once into Iceberg tables on S3 and access the same dataset across multiple engines. Transformation can happen in place, not as a separate pipeline. The data is already where it needs to be.
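"Transformation in place" can be pictured as a copy-on-write commit: a transform rewrites only the affected files, and the next snapshot's metadata swaps them in, rather than the dataset being exported to another system. The function below is a toy model of that idea, not Iceberg's actual rewrite API.

```python
def rewrite_files(current_files, replaced, added):
    """Copy-on-write transform: build the next snapshot's file list by
    swapping rewritten files for their transformed replacements.
    Untouched files carry over unchanged; nothing is copied elsewhere."""
    kept = [f for f in current_files if f not in set(replaced)]
    return tuple(kept) + tuple(added)

snapshot_1 = ("raw/a.parquet", "raw/b.parquet")

# Clean or re-partition a.parquet in place and commit the result:
snapshot_2 = rewrite_files(
    snapshot_1,
    replaced=["raw/a.parquet"],
    added=["clean/a.parquet"],
)
print(snapshot_2)  # ('raw/b.parquet', 'clean/a.parquet')
```

Readers on the old snapshot still see the original files; readers on the new snapshot see the transformed ones. The "pipeline" collapses into a sequence of commits against one dataset.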
Supabase has always been more than just a Postgres host. We are the platform for building modern applications. Supabase starts with a Postgres database and includes products for Authentication, Storage, Edge Functions, Realtime, Vectors, and more. As the industry moves toward open table formats like Iceberg and S3 as the default storage layer, Supabase’s role evolves with it to be less about a database, and more about data.
Postgres remains the core of the Supabase platform: the system of record for operational data. As we’ve written, we are also building first-class support for OpenTelemetry across our services, enabling developers to collect observability data (logs, metrics, and traces) without managing additional infrastructure. And with Supabase ETL, we will provide a lightweight, Postgres-native way to move data into S3 and more, where it can be queried at scale using Iceberg and your choice of analytics engines.
Our goal is to make Supabase the developer’s data cloud: Postgres for transactions, OpenTelemetry for observability, and Iceberg for analytics, all connected by simple, open tools. To do so, we remain focused on what developers need: a backend that starts simple, grows with your product, and keeps your data open and portable at every stage.