docs/adrs/etcd-snapshot-cr.md
Date: 2023-07-27
Accepted
K3s currently stores a list of etcd snapshots and associated metadata in a ConfigMap. Other downstream projects and controllers consume the content of this ConfigMap in order to present cluster administrators with a list of snapshots that can be restored.
On clusters with more than a handful of nodes and reasonable snapshot intervals and retention periods, the snapshot list ConfigMap frequently reaches the maximum size allowed by Kubernetes (1048576 bytes, per the error below) and cannot store any additional entries. Snapshots are still created, but users cannot discover them, and tools that read the ConfigMap cannot access them.
When this occurs, the K3s service log shows errors such as:
level=error msg="failed to save local snapshot data to configmap: ConfigMap \"k3s-etcd-snapshots\" is invalid: []: Too long: must have at most 1048576 bytes"
A side effect is that snapshot metadata is lost whenever the ConfigMap cannot be updated, because the ConfigMap is the only place where that metadata is stored.
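To make the size limit concrete, the following Go sketch estimates how many per-snapshot entries fit before a ConfigMap reaches the 1048576-byte cap quoted in the error above. The `snapshotEntry` struct, its fields, and the sample values are hypothetical stand-ins rather than the actual K3s type, and the size accounting (key length plus value length per entry) is a simplifying assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxConfigMapBytes is the limit quoted in the K3s error message.
const maxConfigMapBytes = 1048576

// snapshotEntry is a hypothetical stand-in for per-snapshot metadata.
// The fields are assumptions chosen for illustration only.
type snapshotEntry struct {
	Name      string `json:"name"`
	NodeName  string `json:"nodeName"`
	Location  string `json:"location"`
	Size      int64  `json:"size"`
	CreatedAt string `json:"createdAt"`
}

// entriesThatFit estimates how many entries shaped like e fit within limit
// bytes, assuming each entry costs its key (the name) plus its JSON value.
func entriesThatFit(e snapshotEntry, limit int) (int, error) {
	value, err := json.Marshal(e)
	if err != nil {
		return 0, err
	}
	perEntry := len(e.Name) + len(value)
	return limit / perEntry, nil
}

func main() {
	// Sample values are made up; real entries may be larger or smaller.
	e := snapshotEntry{
		Name:      "etcd-snapshot-node-1-1690416000",
		NodeName:  "node-1",
		Location:  "file:///tmp/snapshots/etcd-snapshot-node-1-1690416000",
		Size:      4096000,
		CreatedAt: "2023-07-27T00:00:00Z",
	}
	n, err := entriesThatFit(e, maxConfigMapBytes)
	if err != nil {
		panic(err)
	}
	fmt.Printf("roughly %d entries of this shape fit in %d bytes\n", n, maxConfigMapBytes)
}
```

Because the budget is shared by every node's snapshots, multiplying the node count by the retention count gives a quick check of whether a given configuration is likely to approach the cap.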
Reference:
Rancher already has a rke.cattle.io/v1 ETCDSnapshot Custom Resource that contains the same information after it's been
imported by the management cluster:
It is unlikely that we would want to use this custom resource in its current package; we may be able to negotiate moving it into a neutral project for use by both projects.
snapshotFile type, K3s
will manage creation of a new Custom Resource Definition with similar fields.
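As a rough illustration of the one-object-per-snapshot idea, the Go sketch below models a custom resource instance with plain structs and prints it as JSON. Everything here is hypothetical: the `example.invalid/v1` group, the `SnapshotRecord` kind, and all field names are assumptions for illustration, not the schema K3s defines.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// snapshotRecord is a hypothetical custom resource shape holding the
// metadata for a single snapshot. Names are illustrative assumptions.
type snapshotRecord struct {
	APIVersion string         `json:"apiVersion"`
	Kind       string         `json:"kind"`
	Metadata   objectMeta     `json:"metadata"`
	Spec       snapshotSpec   `json:"spec"`
	Status     snapshotStatus `json:"status"`
}

// objectMeta carries only the object name in this sketch.
type objectMeta struct {
	Name string `json:"name"`
}

// snapshotSpec describes where the snapshot came from and where it lives.
type snapshotSpec struct {
	SnapshotName string `json:"snapshotName"`
	NodeName     string `json:"nodeName"`
	Location     string `json:"location"`
}

// snapshotStatus records facts observed after the snapshot was taken.
type snapshotStatus struct {
	SizeBytes int64  `json:"sizeBytes"`
	CreatedAt string `json:"createdAt"`
}

// newSnapshotRecord builds one record for one snapshot, so each snapshot's
// metadata is stored in its own object instead of a shared list.
func newSnapshotRecord(name, node, location string, size int64, createdAt string) snapshotRecord {
	return snapshotRecord{
		APIVersion: "example.invalid/v1", // hypothetical group/version
		Kind:       "SnapshotRecord",     // hypothetical kind
		Metadata:   objectMeta{Name: name},
		Spec:       snapshotSpec{SnapshotName: name, NodeName: node, Location: location},
		Status:     snapshotStatus{SizeBytes: size, CreatedAt: createdAt},
	}
}

func main() {
	r := newSnapshotRecord("snap-1", "node-1", "file:///tmp/snap-1", 1024, "2023-07-27T00:00:00Z")
	out, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With this layout, the per-object size limit applies to each snapshot's record separately, so the number of snapshots no longer competes for a single shared byte budget.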