doc/developer/design/20251015_builtin_schema_migration.md
Materialize has a number of builtin storage collections (tables and sources), defined in the code. Over time, the schema of these collections can change, as we add new features or remove cruft. When a schema change happens, existing collections created by previous Materialize versions need to be migrated to the new schema.
The existing mechanism for this, called "builtin item migration", is flawed. When invoked in read-only mode, it can make changes to durable state that poison said state for any processes at previous versions. As a result, after an in-progress upgrade to some version Y was aborted, it is not always guaranteed that a subsequent upgrade to a version X < Y is still possible. This is a surprising and annoying limitation, especially for our self-managed offering where we don't control which upgrade paths users choose.
Another issue with the current builtin item migration mechanism is that it has no controls or safeguards. It automatically either evolves the schema of persist shards, or replaces them, depending on whether or not a schema change is backward compatible. This introduces the risk of accidentally losing data by making an incompatible schema change that causes the replacement of a builtin collection shard. We would like these migrations to be performed in a more intentional fashion, so that they don't happen by accident.
These criteria are required to make builtin schema migration robust in self-managed deployments:
Additionally, in anticipation of the use-case-isolation work:
The existing migration mechanism supports two approaches to schema migration:
Both approaches have their merits, the first allowing schema changes without data loss, the second allowing breaking schema changes (at the cost of data loss). They are, however, problematic in how they are implemented:
The proposed solution keeps both migration approaches but avoids the problematic implementation details.
A read-only process performing builtin schema migration by schema evolution skips registering the new schema with the persist shard. Instead, it only performs a sanity check ensuring that the new schema is indeed backward compatible with the current shard schema. Doing so requires no writes to durable state, and therefore doesn't interfere with other processes in any way.
In the subsequent read-only bootstrap phase, the process creates persist read and write handles using the new schema. Read handles perform transparent migration of any data updates that flow through them, so dataflow hydration can proceed using the new schema. Write handles only require a matching registered shard schema when writing batches, which is something a read-only process doesn't do.
Once the read-only process gets promoted to a leader, and runs the builtin schema migration mechanism again, it this time registers the new schema with the persist shard. Doing so fences out any processes that planned to evolve the schema to an earlier version. It does not interfere with processes planning to evolve the schema to a later version, assuming that the later version's schema is backward compatible with all previous schemas.
The general approach to shard replacement matches the existing implementation, but takes care to not needlessly interfere with processes at other versions.
To support schema migration by shard replacement, we need a place to store new shard IDs across restarts, and we will keep using the migration shard for this.
The migration shard contains entries of the form (GlobalId, Version, DeployGeneration) -> ShardId.
A read-only process performing shard replacement reads the migration shard to collect all existing entries at its version and deploy generation. Notably, it ignores any entries at different versions or deploy generations, to avoid any interference with concurrent in-progress upgrades. Depending on the existing migration shard entries, the process either decides to use the existing replacement shard, or to create the replacement shard and write its ID into the migration shard, at the current version. It sets the new shard ID as the migrated collection's shard in its in-memory catalog and commences bootstrapping using the replacement shard.
A leader process performing shard replacement performs the same steps as in read-only mode. Additionally, it cleans up durable state written by earlier versions and/or deploy generations by:
Note that the deploy generation is included in the migration shard key to support the scenario of two different upgrades to the same version running at the same time. Without the deploy generation in the key, these upgrades would interfere in unintended ways. While the scenario is not one we expect to occur during normal operation, we cannot rule out that it will ever occur in a self-managed context.
To prepare for use-case-isolation, which requires running multiple adapter processes concurrently, we need to make sure that builtin schema migration doesn't fail when processes of the same version race to perform it. We achieve this by reacting to the various compare-and-append failures by retrying the migration process.
During schema evolution, if the compare_and_evolve_schema call fails, the process compares the expected schema with its new schema.
compare_and_evolve_schema again.During shard replacement, if the compare_and_append call on the migration shard fails, the process restarts the shard replacement mechanism, re-reading the migration shard and acting according to its contents.
Note that here we do nothing to fence out older versions when a new leader has taken over.
We instead expect either one of the catalog fencing mechanisms or the upgrade orchestration to eventually take care of that.
To avoid data loss and other surprises caused by automatic builtin schema migrations, we introduce the concept of explicit migration instructions. A migration instruction instructs the process which builtin collection to migrate at which version, and which mechanism to use.
Migration instructions are kept in a hard-coded list:
const MIGRATIONS: &[(Version, SystemObjectDescription, Mechanism)] = &[
(
V123,
SystemObjectDescription {
schema_name: "mz_catalog".into(),
object_name: "mz_tables".into(),
object_type: CatalogItemType::Table,
},
Mechanism::Evolution,
),
...
];
enum Mechanism {
Evolution,
Replacement,
}
A migration from version X to version Y involves the following steps:
MIGRATIONS list all entries with versions > X and <= Y.Mechanism values:
Mechanism::Replacement, select shard replacement.Note that merging Evolution migrations like this is sound because persist requires that each shard schema is backward compatible with any previous schema registered with the shard (i.e. schema compatibility is transitive).
Which means schema evolution will succeed even if we skip intermediary schemas.
We safeguard against schema changes that don't have an associated MIGRATIONS entry by using the existing builtin fingerprint mechanism.
Each builtin collection has a fingerprint, derived from its schema definition, stored in the catalog.
By comparing the stored fingerprint with the fingerprint of the object definition in the code, we can detect schema changes that require migration.
If we detect a schema change but no corresponding MIGRATIONS entry, we immediately abort the process.
For the migration instructions scheme to be sound, we require that later versions do not remove migrations added in earlier versions (unless they are before the last unskippable version). The scenario where this is most likely to come up is when a migration gets backported into a patch release that isn't present in the already existing next minor release. For example, consider the case there both v0.10.0 and v0.11.0 have been released, and a new v0.10.1 is cut with a new builtin scheme migration. In that scenario, upgrading to v0.11.0 through v0.10.1 will fail because v0.11.0 expects a schema different from the one produced by the new migration in v0.10.1. To ensure this doesn't happen, we must disallow backporting of builtin schema migrations into prior releases. We can make sure we catch instances of such disallowed backports by running exhaustive upgrade tests: Whenever a new patch release is introduced, we test that upgrades from that release to all existing later releases succeed. In the example, we would test the upgrade from v0.10.1 to v0.11.0 (and any later versions), which would alert us about the issue.
As an alternative to schema evolution, we can consider a migration scheme that creates a new shard and copies over all existing data from the old shard, performing the migration in the process. Doing so would enable us to perform arbitrary rewrites of the data, as well as breaking schema changes without loss of historical data. This approach would be very powerful, but isn't straightforward to implement. In particular, it is not sufficient to perform the copy-and-migrate step once in read-only mode, since the leader environment might write more data to the old shard before the cutover. After the cutover the new leader has to make sure to copy over the missed updates, but it is not clear how to do that without negatively affecting the environment startup time. Persist schema evolution is easier to use and sufficiently powerful for now.
As an alternative to shard replacement, we could require that all schema changes to builtin collections need to be backward compatible and therefore eligible for schema evolution.
This seems like an unpractical constraint though, given that during development of new database features we often end up making breaking changes to mz_internal relations in response to feedback received.