protocol_rfcs/materialize-partition-columns.md
Associated Github issue for discussions: https://github.com/delta-io/delta/issues/5555
Currently, Delta tables store partition column values primarily in the table metadata (specifically in the partitionValues field of AddFile actions), and by default these columns are not physically written into the Parquet data files themselves.
This RFC proposes a new writer-only table feature called materializePartitionColumns. When supported, this feature requires partition columns to be physically materialized in Parquet data files alongside the data columns.
This feature provides a mechanism to require partition column materialization at the protocol level, ensuring all writers to the table comply with this requirement during the period when the feature is supported.
Materializing partition columns enhances compatibility with Parquet readers that access Parquet files directly and do not interpret Delta’s AddFile metadata, as well as with Iceberg readers, which expect partition columns to be stored within the data files.
Additionally, having partition information embedded in the data files themselves enables more flexible data reorganization strategies. The same parquet files could be linked in future versions of a table that do not have the same (or any) partition columns.
New Section after Identity Columns section
When this feature is supported, partition columns are physically written to Parquet files alongside the data columns. To support this feature:
materializePartitionColumns must exist in the table protocol's writerFeatures.When supported:
materializePartitionColumns is set in the protocol, writers must materialize partition columns into any newly created data file, placing them after the data columns in the parquet
schema. This mimics the same partition column materialization requirement from IcebergCompatV1
and
IcebergCompatV2. As such, the materializePartitionColumns feature can be seen as a subset of the requirements imposed by those features, providing the partition column materialization guarantee independently without requiring full
Iceberg compatibility.materializePartitionColumns is not set in the table protocol, writers are not required to write partition columns to data files. Note that other features might still require materialization of partition values, such as IcebergCompatV1This feature does not impose any requirements on readers. All Delta readers must be able to read the table regardless of whether partition columns are materialized in the data files. If partition values are present in both parquet and AddFile metadata, Delta readers should continue to read partition values from AddFile metadata.