docs/adrs/ca-cert-rotation.md
Date: 2022-12-19
Status: Accepted
On the first startup of a new cluster, K3s currently autogenerates a number of self-signed cluster CAs and keys:
These CAs are all self-signed, without any cross-signing or common root or intermediates, and are valid for 10 years. When any of these CA certificates expires, every certificate it has issued becomes invalid, causing a significant cluster outage.
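This is a minimal sketch of why one expired CA takes down everything it issued. Chain validation requires every certificate in the chain to be inside its validity window, so a leaf fails as soon as its CA expires, even if the leaf's own window is still open. The Python below makes some assumptions. It treats "10 years" as 3650 days, it does not parse real certificates, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "valid for 10 years" is approximated as 3650 days.
CA_LIFETIME = timedelta(days=3650)

Window = tuple[datetime, datetime]  # (not_before, not_after)


def ca_window(created: datetime) -> Window:
    """Validity window of a self-signed cluster CA created at `created`."""
    return created, created + CA_LIFETIME


def chain_valid_at(now: datetime, *chain: Window) -> bool:
    """True only if every certificate in the chain is within its window.

    Each certificate in the chain is checked, so once the CA expires a leaf
    it issued is rejected even though the leaf itself has not expired.
    """
    return all(not_before <= now <= not_after for not_before, not_after in chain)


# Usage: a CA created on 2022-12-19 and a one-year leaf issued near its end.
utc = timezone.utc
ca = ca_window(datetime(2022, 12, 19, tzinfo=utc))
leaf = (datetime(2032, 6, 1, tzinfo=utc), datetime(2033, 6, 1, tzinfo=utc))
```

With these inputs the leaf validates in mid-2032. It stops validating once the CA window closes in December 2032, five months before the leaf's own expiry.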
The Cluster Server CA is used in node bootstrapping. The full K10-format token includes a SHA256 sum of the Cluster Server CA file's on-disk PEM representation. Nodes that join the cluster using a full token perform a set of checks when starting up:
/v1-k3s/cacert on the server they are joining.
K10 prefix in the token.
Realistically, this hash should instead have been derived from the DER encoding of the root certificate in that bundle, because the PEM format permits variation in padding, line length, and so on. Only the DER encoding is guaranteed to be stable, and hashing only the root of the chain would have allowed intermediate CAs to be rotated or renewed without breaking trust between cluster nodes.
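The contrast between pinning the PEM bytes and pinning the root's DER encoding can be sketched in Python. This is an illustration, not K3s code, and it rests on several assumptions:

- The full token has the form K10<sha256-hex>::<credentials>.
- The last certificate in a bundle is the root. A real implementation would identify the root by parsing the certificates.
- Placeholder bytes stand in for real DER certificates.
- All function names are hypothetical.

```python
import base64
import hashlib
import hmac
import re

TOKEN_PREFIX = "K10"  # assumed layout: K10<sha256-hex>::<credentials>
_PEM_BLOCK = re.compile(
    rb"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.S)


def parse_full_token(token: str) -> tuple[str, str]:
    """Split a full token into (ca_hash_hex, credentials)."""
    if not token.startswith(TOKEN_PREFIX) or "::" not in token:
        raise ValueError("not a full K10 token")
    ca_hash, credentials = token[len(TOKEN_PREFIX):].split("::", 1)
    return ca_hash, credentials


def pem_hash(bundle_pem: bytes) -> str:
    """Current scheme: SHA-256 over the exact PEM bytes of the bundle."""
    return hashlib.sha256(bundle_pem).hexdigest()


def verify_bundle(token: str, bundle_pem: bytes) -> bool:
    """Compare a fetched bundle against the hash pinned in the token."""
    expected, _ = parse_full_token(token)
    return hmac.compare_digest(expected, pem_hash(bundle_pem))


def der_certs(bundle_pem: bytes) -> list[bytes]:
    """Decode every certificate block in a PEM bundle, ignoring line wrapping."""
    return [base64.b64decode(b"".join(body.split()))
            for body in _PEM_BLOCK.findall(bundle_pem)]


def root_der_hash(bundle_pem: bytes) -> str:
    """Alternative scheme: SHA-256 over the root's DER bytes.

    ASSUMPTION: the root is the last certificate in the bundle.
    """
    return hashlib.sha256(der_certs(bundle_pem)[-1]).hexdigest()


def to_pem(der: bytes, width: int = 64) -> bytes:
    """Encode DER bytes as a PEM block wrapped at `width` characters."""
    b64 = base64.b64encode(der)
    lines = [b64[i:i + width] for i in range(0, len(b64), width)]
    return (b"-----BEGIN CERTIFICATE-----\n" + b"\n".join(lines)
            + b"\n-----END CERTIFICATE-----\n")


# Usage with placeholder (non-certificate) bytes.
intermediate = b"placeholder intermediate DER " * 4
root = b"placeholder root DER " * 4
bundle = to_pem(intermediate) + to_pem(root)
rewrapped = to_pem(intermediate, width=76) + to_pem(root, width=76)
renewed = to_pem(b"renewed intermediate DER " * 4) + to_pem(root)
token = TOKEN_PREFIX + pem_hash(bundle) + "::server:secret"  # placeholder credentials
```

Re-wrapping the same certificates at a different line width breaks the PEM hash pinned in the token. The root's DER hash is unchanged. Swapping in a renewed intermediate also leaves the root hash intact, which is the property the paragraph above describes.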
There is currently no way to write new certificates to the datastore. The certificates and keys are written to disk once, on initial startup, and from there to the cluster datastore. From that point on, the copies in the datastore are considered authoritative. Replacing the files on disk results either in the on-disk files being overwritten with the datastore copies or in an error, depending on whether the files on disk are newer than those in the datastore.
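The reconcile rule for a single file can be sketched as follows. This is a hypothetical illustration, and the direction is an assumption. It treats an older on-disk file as the case that is overwritten from the datastore, and a newer, different on-disk file as the error case. FileVersion, reconcile, and NewerOnDiskError are illustrative names, not K3s APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileVersion:
    content: bytes
    mtime: float  # modification time in seconds since the epoch


class NewerOnDiskError(Exception):
    """A file on disk differs from, and is newer than, the datastore copy."""


def reconcile(name: str, disk: Optional[FileVersion], datastore: FileVersion) -> bytes:
    """Return the bytes that should end up on disk for one bootstrap file.

    The datastore copy is authoritative: a missing, identical, or older file
    on disk is (re)written from the datastore, while a newer, different file
    on disk is rejected rather than uploaded.
    """
    if disk is None or disk.content == datastore.content:
        return datastore.content
    if disk.mtime > datastore.mtime:
        raise NewerOnDiskError(f"{name} on disk is newer than the datastore copy")
    return datastore.content
```

In neither branch does a replacement file on disk reach the datastore. That is why new certificates cannot currently be introduced by editing files.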
The secrets-encrypt subcommand does currently mutate the bootstrap data, but it only touches the secrets
encryption configuration, not the CA certs or keys.
For both of these reasons (hash pinning and the inability to rewrite the datastore), it is not currently possible to renew or replace the cluster CA certificates or keys.
Some users (particularly government or financial customers) attempt to implement the guidance from NIST SP 800-57 Part 1 Rev. 5. Following this guidance means signing the cluster CAs with a set of organizational root and intermediate certificates, and rotating both the intermediate and cluster CA certificates and keys at least once a year.
Although the ServiceAccount signing key is not signed by any CA, it must still be rotated carefully to avoid causing an outage. The apiserver and controller-manager must be updated to use a new key while continuing to trust the old key for a period of time. The old key can then be removed once all clients using tokens signed by it have received new tokens.
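The overlap period can be illustrated with a toy model in Python. HMAC-SHA256 is swapped in for the RSA/ECDSA signatures that real ServiceAccount tokens use. A single hypothetical TokenAuthority object plays both the signing and the verifying roles. Only the sequence is meant to carry over: sign with the new key, keep trusting the old key, then retire it.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


@dataclass
class TokenAuthority:
    """Toy signer/verifier; HMAC stands in for real token signatures."""
    signing_key: bytes
    trusted_keys: list[bytes] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.signing_key not in self.trusted_keys:
            self.trusted_keys.append(self.signing_key)

    def issue(self, subject: str) -> tuple[str, bytes]:
        """Sign a token for `subject` with the current signing key."""
        mac = hmac.new(self.signing_key, subject.encode(), hashlib.sha256).digest()
        return subject, mac

    def verify(self, token: tuple[str, bytes]) -> bool:
        """Accept a token signed by any currently trusted key."""
        subject, mac = token
        return any(
            hmac.compare_digest(mac, hmac.new(k, subject.encode(), hashlib.sha256).digest())
            for k in self.trusted_keys
        )

    def rotate(self, new_key: bytes) -> None:
        """Step 1: sign with the new key while still trusting the old one."""
        self.signing_key = new_key
        if new_key not in self.trusted_keys:
            self.trusted_keys.append(new_key)

    def retire(self, old_key: bytes) -> None:
        """Step 2: once all clients hold new tokens, stop trusting the old key."""
        if old_key == self.signing_key:
            raise ValueError("cannot retire the active signing key")
        self.trusted_keys.remove(old_key)
```

Between rotate and retire, tokens signed with either key verify. After retire, any client still holding an old-key token is rejected. This is the outage the careful sequencing is meant to avoid.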
This will require additional documentation, CLI subcommands, and QA work to validate the process steps.
Non-disruptive renewal requires no change to node configuration; the service only needs to be restarted.
Disruptive renewal requires changes to the K3s CLI flags, configuration file, or environment variables before the service is restarted. Additionally, the cluster may experience a temporary outage while the configuration change is being applied to all nodes, because the cluster nodes temporarily do not share a common root of trust.