doc/developer/design/20260209_simplified_rollout_triggers_and_crd.md
Manual rollout triggering
Users must manually set requestRollout to a new UUID value to trigger a rollout, even when they've already made meaningful changes to the spec.
Additionally, the current system is difficult to automate in response to node evictions.
Automatic rollout detection: The system should automatically detect when a rollout is needed based on spec changes, without requiring users to manually set a UUID.
Seamless version migration: Existing v1alpha1 resources should continue to work, with automatic conversion to v1alpha2 as needed.
Terraform compatibility: Configuration must not fight with infrastructure as code tools such as Terraform.
SaaS compatibility: Updating orchestratord should not trigger a rollout if the Materialize CR is unchanged.
Does not prevent implementation of node eviction handling: Automated handling of node eviction is not required now, but the implementation should not prevent us from adding support for it later without additional major CRD changes.
Introduce a new v1alpha2 version of the Materialize CRD with the following changes:
Spec changes:
- requestRollout (Uuid) - Removed. Rollouts are now triggered automatically when the spec hash changes.
- inPlaceRollout (bool) - Removed. This has been deprecated/ignored for a while, and is replaced with MaterializeRolloutStrategy::ImmediatelyPromoteCausingDowntime.
- environmentdIamRoleArn (Option<String>) - Removed. This has been deprecated for a while, and is replaced with setting "eks.amazonaws.com/role-arn" in serviceAccountAnnotations instead. The conversion webhook should move this if it exists, with any conflicting value already present in serviceAccountAnnotations taking precedence.
- forcePromote changes from Uuid to Option<String> - Instead of triggering promotion when matching the UUID of requestRollout, it triggers promotion when matching the hash stored in status.requestedRolloutHash.

Status changes:
- lastCompletedRolloutRequest (Uuid) is replaced with lastCompletedRolloutHash (Option<String>) - Stores the spec hash of the last successful rollout. Will be None if first deploying.
- resourcesHash (String) is replaced with requestedRolloutHash (Option<String>) - Stores the spec hash of the currently requested rollout. Will be None when migrating from v1alpha1 while already in mid-rollout and in "promoting" status.

Important Note!!!
We must not update requestedRolloutHash if a rollout has reached the "promoting" state. At this point we have committed to promoting the current rollout, and do not want to trigger another one until it is complete. After the existing rollout has successfully promoted, another reconciliation will be triggered at which point we will update the requestedRolloutHash and trigger a new rollout.
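This guard can be sketched as follows. This is a minimal std-only sketch; the type and field names are illustrative, not the actual orchestratord types:

```rust
#[derive(PartialEq)]
enum RolloutPhase {
    Idle,
    Promoting,
}

struct Status {
    requested_rollout_hash: Option<String>,
    phase: RolloutPhase,
}

/// Returns true if a new rollout was requested. Once a rollout reaches
/// Promoting we are committed to it: leave requestedRolloutHash alone and
/// pick the new spec hash up on the next reconciliation after promotion.
fn maybe_request_rollout(status: &mut Status, spec_hash: &str) -> bool {
    if status.phase == RolloutPhase::Promoting {
        return false;
    }
    if status.requested_rollout_hash.as_deref() == Some(spec_hash) {
        return false;
    }
    status.requested_rollout_hash = Some(spec_hash.to_owned());
    true
}
```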
A new generate_rollout_hash() method computes a SHA256 hash of the spec fields that affect rollouts:
```rust
pub fn generate_rollout_hash(&self) -> String {
    let mut hasher = Sha256::new();
    // Hash only fields that should trigger a rollout.
    // Excludes: balancerd/console resources, forcePromote, certificates.
    // Exclusions are omitted here for brevity.
    let mut value = serde_json::to_value(&self.spec).unwrap();
    value.sort_all_objects();
    hasher.update(serde_json::to_vec(&value).unwrap());
    // Include the force_rollout annotation for manual triggers.
    // This is future planning so we can trigger rollouts without conflicting
    // with terraform-managed fields.
    if let Some(annotation) = self
        .metadata
        .annotations
        .as_ref()
        .and_then(|a| a.get(FORCE_ROLLOUT_ANNOTATION))
    {
        hasher.update(annotation);
    }
    format!("{:x}", hasher.finalize())
}
```
Fields excluded from the hash (changes don't trigger rollout):
- balancerdExternalCertificateSpec
- balancerdReplicas
- balancerdResourceRequirements
- consoleExternalCertificateSpec
- consoleReplicas
- consoleResourceRequirements
- forcePromote

The balancerd and console fields are excluded since they are applied on every reconciliation, without a rollout.
The forcePromote field is excluded, since it is used to promote the existing generation, and we don't want to tear that down every time it changes.
Fields included in the hash (changes trigger rollout):
- environmentdImageRef
- environmentdExtraArgs
- environmentdExtraEnv
- environmentdConnectionRoleArn
- environmentdResourceRequirements
- environmentdScratchVolumeStorageRequirement
- serviceAccountName
- serviceAccountAnnotations
- serviceAccountLabels
- podAnnotations
- podLabels
- forceRollout
- rolloutStrategy
- backendSecretName
- authenticatorKind
- enableRbac
- environmentId
- systemParameterConfigmapName
- internalCertificateSpec
- the materialize.cloud/force-rollout annotation

That is, all other spec fields, plus our force-rollout annotation.
Some of these are applied without requiring a rollout, but may require a rollout for some of their effects.
For example, serviceAccountAnnotations may be used to configure the AWS IAM role ARN, but it is unclear if that gets applied to existing pods.
A new HTTPS webhook server handles CRD version conversion:
Endpoint: POST /convert
Supported conversions:
- v1alpha1 -> v1alpha2
- v1alpha2 -> v1alpha1*

*The API server seemed to want this; I don't know why. We can't reconcile these, so going back never makes sense.
Key conversion logic:
- forcePromote: Uuid becomes forcePromote: Option<String> (a nil UUID becomes None).
- requestRollout is removed.
- lastCompletedRolloutRequest and resourcesHash are removed.
- conditions are kept as-is.
- If status.lastCompletedRolloutRequest and spec.requestRollout match: lastCompletedRolloutHash and requestedRolloutHash should both be set to the calculated hash (after conversion). This should avoid triggering a rollout during the migration.
- If they do not match: lastCompletedRolloutHash should be set to None and requestedRolloutHash should be set to the calculated hash (after conversion). In this case, we likely have an in-progress rollout, which we will destroy and replace.

We need to include the lastCompletedRolloutHash from v1alpha2 in v1alpha1 as well. This is required for round tripping from v1alpha2 -> v1alpha1 -> v1alpha2, which may happen if a user applies a v1alpha1 change over a v1alpha2 object.
In the case there is an existing lastCompletedRolloutHash, it should be kept as-is through the round trip. As we never reconcile with v1alpha1, it should only change at v1alpha2, so this should be safe.
No attempt is made to support v1alpha1 beyond giving a valid v1alpha1 structure and supporting round tripping to v1alpha2. Fields that do not exist in v1alpha2 may have their nil value.
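The match/no-match status mapping described above can be sketched like this. This is an illustrative std-only sketch: the struct and field names are assumptions rather than the real CRD types, and UUIDs are represented as plain strings:

```rust
// Hypothetical, simplified shapes of the relevant v1alpha1 fields and the
// resulting v1alpha2 status; not the actual CRD types.
struct V1Alpha1 {
    request_rollout: String,                // spec.requestRollout (UUID)
    force_promote: String,                  // spec.forcePromote (UUID)
    last_completed_rollout_request: String, // status.lastCompletedRolloutRequest
}

struct V1Alpha2Status {
    last_completed_rollout_hash: Option<String>,
    requested_rollout_hash: Option<String>,
    force_promote: Option<String>,
}

const NIL_UUID: &str = "00000000-0000-0000-0000-000000000000";

fn convert_status(old: &V1Alpha1, calculated_hash: &str) -> V1Alpha2Status {
    // forcePromote: a nil UUID becomes None; anything else carries over.
    let force_promote = if old.force_promote == NIL_UUID {
        None
    } else {
        Some(old.force_promote.clone())
    };
    if old.request_rollout == old.last_completed_rollout_request {
        // The last requested rollout completed: set both hashes to the
        // calculated hash so the migration itself triggers no rollout.
        V1Alpha2Status {
            last_completed_rollout_hash: Some(calculated_hash.to_owned()),
            requested_rollout_hash: Some(calculated_hash.to_owned()),
            force_promote,
        }
    } else {
        // Likely mid-rollout: clear lastCompletedRolloutHash and request a
        // rollout at the calculated hash (the in-progress rollout will be
        // destroyed and replaced).
        V1Alpha2Status {
            last_completed_rollout_hash: None,
            requested_rollout_hash: Some(calculated_hash.to_owned()),
            force_promote,
        }
    }
}
```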
In these examples, we assume that orchestratord's attempt to update the stored version succeeds and that reconciliation is triggered after this update. This is only to simplify this document, and is not necessary for correctness. If orchestratord's attempt to update the stored version fails, or the reconciliation is triggered first, the conversion webhook is simply called at that time and we will reconcile the same v1alpha2 object.
A v1alpha1 resource exists with status.lastCompletedRolloutRequest and spec.requestRollout matching.
1. The conversion webhook sets status.lastCompletedRolloutHash and status.requestedRolloutHash to the same calculated hash after conversion.
2. Orchestratord uses replace to store the resource as v1alpha2.

At this point, the stored version is v1alpha2, and no rollout is triggered.
The user then applies a v1alpha1 resource with a changed spec field (e.g. spec.environmentd_image_ref). It may or may not include spec.requestRollout; that doesn't matter.
1. After conversion, status.lastCompletedRolloutHash and status.requestedRolloutHash still match each other, but do not match the calculated hash.
2. Orchestratord reconciles, updating status.requestedRolloutHash and triggering a rollout since it is different.

If the user had instead applied a v1alpha2 resource, no conversion would be needed and orchestratord would reconcile it directly.
A v1alpha1 resource exists with status.lastCompletedRolloutRequest and spec.requestRollout not matching. It is not in "promoting" status.
1. The conversion webhook sets status.lastCompletedRolloutHash to None and status.requestedRolloutHash to the calculated hash after conversion.
2. Orchestratord uses replace to store the resource as v1alpha2.

A v1alpha1 resource exists with status.lastCompletedRolloutRequest and spec.requestRollout not matching. It is in "promoting" status.
1. The conversion webhook sets status.lastCompletedRolloutHash to None and status.requestedRolloutHash to the calculated hash after conversion.
2. Orchestratord uses replace to store the resource as v1alpha2.
3. … (status.requestedRolloutHash set in the initial conversion)

Conversion webhooks only support HTTPS, so we need a certificate for orchestratord. This may also be useful in the future, for admission webhooks for example.
We plan to support two options:
1. cert-manager: an Issuer, and a Certificate with the service DNS name, using the Ed25519 algorithm, with rotation enabled.
2. A user-provided secret.

Which of these is used is determined by a new helm value operator.certificate.source, which can be either "cert-manager" (the default) or "secret".
If operator.certificate.source is set to "secret", the user must also set operator.certificate.secretName to the name of a secret in the operator's namespace.
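As a sketch, the resulting helm values might look like this. Only operator.certificate.source and operator.certificate.secretName are proposed here; the secret name is a made-up example:

```yaml
operator:
  certificate:
    # "cert-manager" (the default) or "secret"
    source: secret
    # Only required when source is "secret"; must name a secret in the
    # operator's namespace.
    secretName: orchestratord-webhook-tls
```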
Regardless of which they choose, the resulting secret will be mounted into the orchestratord pods.
Orchestratord will also get readiness probes so nothing tries to call this webhook before it is up.
The CRD is registered with:
```rust
mz_crd.spec.conversion = Some(CustomResourceConversion {
    strategy: "Webhook".to_owned(),
    webhook: Some(WebhookConversion {
        client_config: Some(WebhookClientConfig {
            ca_bundle: Some(ByteString(ca_bytes)),
            service: Some(ServiceReference {
                name: webhook_service_name,
                namespace: webhook_service_namespace,
                path: Some("/convert".to_owned()),
                port: Some(webhook_service_port.into()),
            }),
            url: None,
        }),
        conversion_review_versions: vec!["v1".to_owned()],
    }),
});
```
We have set v1alpha2 as the stored version, but that doesn't update existing resources. Those are only updated when they are reapplied.
During orchestratord startup, after waiting for the CRD to be established, we need to loop through all Materialize resources and replace them.
If it is possible to determine the stored version of these resources, we should only replace the ones at the older version.
I think it is OK for this to be best-effort, and only warn in case of failure. For backward compatibility reasons, we're going to have to support the old version for some time. Orchestratord is likely to get restarted/upgraded multiple times in that period, so it can try again. If the user ever writes an updated CR, it will also be stored in v1alpha2, so it isn't critical that this work immediately.
Our existing nightly orchestratord tests cover a lot, but we'll need to extend them to work with multiple CRD versions.
Still a work in progress: https://github.com/MaterializeInc/materialize/pull/34904
Continue using the hash of generated Kubernetes resources rather than the Materialize spec. Maybe changing this will be more problematic than our current issues with it.
Reasons not chosen: The resource hash includes implementation details that may change without user intent (e.g., pod labels, spec fields, additional services), causing unexpected rollouts. Hashing the spec directly is more predictable. Keeping the resource hash would also leave us with all the problems we have now.
Modify v1alpha1 directly rather than creating a new version.
Reasons not chosen: This would likely break existing deployments. Kubernetes CRD versioning with conversion webhooks is the standard approach for API evolution, and we need to be able to evolve our CRDs in backwards incompatible ways. I don't know of any backwards compatible schema that would solve our problems here. I'm open to suggestions if you know a way. Making many fields optional and deprecated gets confusing very quickly.
Mostly the same as this proposal, but changing the types of forcePromote and forceRollout to be either an i64 of seconds since the unix epoch, or a UTC ISO datetime string. This would allow us to set both of these to the same value and trigger a rollout and immediately force promotion. We'd also need to store the initial timestamp of the current rollout (currentRolloutTimestamp) and the last completed rollout (lastCompletedRolloutTimestamp) in the status (if they exist), for comparison.
When initially updating the status when starting a rollout, orchestratord would have logic like the following for determining which timestamp to use:
```
# treat all non-existent values as older
manuallyTriggeredTimestamp = max(spec.forceRollout, forceRolloutAnnotation)
if manuallyTriggeredTimestamp > lastCompletedRolloutTimestamp:
    if manuallyTriggeredTimestamp > currentRolloutTimestamp:
        return manuallyTriggeredTimestamp
return now()
```
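For concreteness, the timestamp selection above could be sketched in Rust like so (std-only; values are epoch seconds as Option<i64>, where None means the field does not exist). Rust's Option ordering (None < Some(_)) conveniently treats missing values as older:

```rust
// Sketch of the rejected timestamp-based trigger logic; the parameter names
// mirror the pseudocode, not any real orchestratord fields.
fn pick_rollout_timestamp(
    force_rollout: Option<i64>,
    force_rollout_annotation: Option<i64>,
    last_completed_rollout: Option<i64>,
    current_rollout: Option<i64>,
    now: i64,
) -> i64 {
    // max() on Option treats None (non-existent) as older than any Some.
    let manually_triggered = force_rollout.max(force_rollout_annotation);
    match manually_triggered {
        // Use the manual trigger only if it is newer than both the last
        // completed rollout and the current rollout (None compares as older).
        Some(ts) if Some(ts) > last_completed_rollout && Some(ts) > current_rollout => ts,
        _ => now,
    }
}
```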
Reasons not chosen: This timestamp logic is complex and likely to introduce errors. The gains of being able to predict the forcePromote value seem minimal.
Backwards compatibility period: How long should we support v1alpha1 before removing it? The conversion webhook enables indefinite support, but maintaining both versions has a cost.
Certificate rotation: The current implementation uses cert-manager's automatic rotation, but we haven't tested long-running deployments through certificate renewal cycles, or implemented reloading in orchestratord.
Is it OK to require setting forcePromote in a separate modification?
We can't easily know the rollout hash in advance, so users can't set forcePromote in the same apply as other changes.
I think this is probably fine, since forcePromote only really makes sense if the rollout is stuck, which won't be the case when initially updating the spec.
Do we need to optionally disable triggering updates on spec changes in SaaS? We may need a new field in the Materialize CRD spec controlling whether updates trigger immediately on spec changes, or only when the annotation triggers them. This would let us avoid triggering updates immediately in SaaS when the region-controller changes the Materialize CR. We don't change the region-controller often, and we can always update it to not apply changes to the Materialize resource until we want to trigger them, so this probably isn't necessary.
Interface changes for helm values: What I've got in this proposal probably works, but I'm not sure it's exactly what we want to lock ourselves into.
Should we introduce an intermediate generation object? A generation object (placeholder name) similar to ReplicaSets for Deployments could simplify some of our reconciliation logic. It seemed like a lot to bite off at the same time as these other changes, though. I'm not sure if it will be backwards compatible to add it later. I think it will, but am not certain. If we want generation objects, should we consider doing that first? This is probably fine either way. We can always do another CRD migration if it turns out to not be compatible with this one.