rfd/0160-kubernetes-operator-resource-versioning.md
This RFD discusses how we can implement multiple version support for the same
resource in the Teleport Kubernetes Operator. For example, the operator currently
only supports RoleV5 while we have released RoleV6 and RoleV7 resources
with new capabilities.
Users want to manage their Teleport resources via the Teleport Kubernetes Operator. However, in its current state, the operator does not support newer versions of a resource when we introduce them. This blocks users from leveraging new Teleport features such as granular Kubernetes Access Control with RoleV7.
We currently version the Teleport CRs like the Teleport resources. While this makes sense from a user point of view, this is not compatible with the way Teleport manages versions. Teleport does not support exposing resources through separate per-version APIs. The version conversion happens on the Teleport side, when establishing new defaults.
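This means we end up with two sources of truth: Kubernetes, which versions the CRs through its own API versioning, and Teleport, which converts versions when setting defaults in CheckAndSetDefaults().
Both storages treat versions differently and do not agree on how to represent a resource in a given version.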
Put the Teleport resource version in the Kubernetes Kind and treat different versions as different resources.
This approach completely avoids any conversion problem by not doing conversions. This way, we don't have to deal with how Kubernetes does API versioning and the fact it is not compatible with how Teleport manages versions.
For example, to support roles v6 and v7 we would introduce:
kind: TeleportRoleV6
apiVersion: v1
---
kind: TeleportRoleV7
apiVersion: v1
All CRs could be managed by the same controller using the unstructured client, or multiple controllers if we need it.
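As a minimal sketch of how a single controller could tell the versioned kinds apart, the snippet below derives the Teleport resource name and version from the CR kind. The parse_kind helper and the LEGACY_KIND_VERSIONS table are hypothetical names introduced for illustration. The only mapping taken from this RFD is that the unversioned TeleportRole kind represents role v5.

```python
import re

# Hypothetical table: unversioned kinds and the Teleport version they represent.
# Only TeleportRole -> v5 comes from this RFD.
LEGACY_KIND_VERSIONS = {"TeleportRole": "v5"}

# "Teleport" + resource name + optional "V<number>" suffix.
_KIND_RE = re.compile(r"^Teleport(?P<resource>[A-Za-z]+?)(?:V(?P<version>\d+))?$")


def parse_kind(kind: str) -> tuple:
    """Split a CR kind into (Teleport resource name, Teleport version)."""
    match = _KIND_RE.match(kind)
    if match is None:
        raise ValueError(f"not a Teleport CR kind: {kind!r}")
    resource = match.group("resource").lower()
    version = match.group("version")
    if version is not None:
        return resource, f"v{version}"
    if kind in LEGACY_KIND_VERSIONS:
        return resource, LEGACY_KIND_VERSIONS[kind]
    raise ValueError(f"no known Teleport version for kind {kind!r}")
```

With this mapping, TeleportRoleV6 and TeleportRoleV7 share the same reconciliation code path and differ only in the version written to the Teleport resource.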
Migrating a role from v5 to v6 will take extra steps. One possible sequence, still to be confirmed: disable reconciliation of the v5 CR and remove its finalizers, then create the v6 CR and delete the v5 one.
TeleportRole vs TeleportRoleV7 can be a bit confusing, especially when using kubectl.
We can make a breaking change to the short names to make the CLI
experience more consistent if needed. For example:
# get roles v5
kubectl get teleportrolev5
# get roles v6
kubectl get teleportrolev6
# get roles v7
kubectl get teleportrolev7
# get roles v5, we could remove it but it would break
kubectl get teleportrole
Users can create multiple resources with the same name under different
versions (e.g., two CRs, TeleportRole and TeleportRoleV7, with the same
name). This would cause non-deterministic behaviour. We can mitigate this
risk by labeling the resource with the CR kind/version. This risk already
exists if you run two operators against the same cluster.
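The labeling mitigation could look like the sketch below, where the operator records the owning CR kind on the Teleport resource and refuses to reconcile from any other kind. The label key and the can_reconcile helper are hypothetical, since this RFD does not name them. Refusing to touch unlabeled resources is also an assumption of the sketch, not a decision made here.

```python
# Hypothetical label key; this RFD does not specify the label name.
OWNER_LABEL = "teleport.dev/operator-cr-kind"


def can_reconcile(cr_kind, existing_labels):
    """Return True if a CR of kind cr_kind may write the Teleport resource.

    existing_labels is None when the Teleport resource does not exist yet,
    otherwise it is the dict of labels found on the existing resource.
    """
    if existing_labels is None:
        # Nothing to conflict with: this CR creates the resource.
        return True
    # Only the kind that labeled the resource may keep reconciling it.
    # Unlabeled resources are refused (conservative assumption).
    return existing_labels.get(OWNER_LABEL) == cr_kind


def labels_for(cr_kind, labels=None):
    """Return a copy of labels with the ownership label set."""
    result = dict(labels or {})
    result[OWNER_LABEL] = cr_kind
    return result
```

With such a check, a TeleportRole and a TeleportRoleV7 sharing a name would no longer overwrite each other: whichever CR created the resource keeps it, and the other one fails reconciliation.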
This design changes little in the current Teleport Kubernetes Operator security model. The only risks are:
Users must kubectl get all versions of a resource to get a full view.
This design is future-proof as it will accommodate any new Teleport resource or version.
When writing this RFD, two alternatives were considered:
The Kubernetes-friendly approach would be to make the operator aware of how the resource is stored in Kubernetes, and do the conversions for every Kubernetes CR API call via webhooks.
Handling resource conversion at the operator level requires the operator to validate the resource, set its defaults, and convert between versions. This causes several problems:
The operator must run CheckAndSetDefaults and handle conversion. This makes the operator a
client behaving differently, and blurs the responsibility between Teleport and
its clients.
Applying the same resource through the operator and through tctl will result in
different behaviors.
Spec/Status separation in RFD 0153, but the existing resources are
flawed.
This approach can also cause additional friction:
Removing CheckAndSetDefaults() from the client and consolidating
defaults injection and resource conversion server-side is not compatible
with the conversion hooks pattern, as we'd
need to run CheckAndSetDefaults in the operator and duplicate the logic.
We can break the relation between the CRD version and the Teleport resource, and
specify the version in the CR spec. This means users would use a single version
of resources.teleport.dev for all their resources.
Before, an admin would create a RoleV5 by creating a TeleportRole through the API
resources.teleport.dev/v5 and a UserV2 through the API resources.teleport.dev/v2.
apiVersion: resources.teleport.dev/v5
kind: TeleportRole
metadata:
  name: myrole
spec:
  allow:
    rules:
      - resources: ['user', 'role']
        verbs: ['list','create','read','update','delete']
---
apiVersion: resources.teleport.dev/v2
kind: TeleportUser
metadata:
  name: myuser
spec:
  roles: ['myrole']
With this approach, both TeleportRole and TeleportUser resources would be created
through the resources.teleport.dev/vX API. The Teleport resource version would
be specified in a separate field: teleportResourceVersion.
apiVersion: resources.teleport.dev/vX
teleportResourceVersion: "v5"
kind: TeleportRole
metadata:
  name: myrole
spec:
  allow:
    rules:
      - resources: ['user', 'role']
        verbs: ['list','create','read','update','delete']
---
apiVersion: resources.teleport.dev/vX
teleportResourceVersion: "v2"
kind: TeleportUser
metadata:
  name: myuser
spec:
  roles: ['myrole']
The version vX in resources.teleport.dev/vX needs to be higher than the highest
current version (TeleportRole is served under v6). We can set it to the current
Teleport version (v15 or v16 depending on the timing).
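As a rough illustration of how the operator could consume the teleportResourceVersion field, the sketch below builds a Teleport resource from a CR. The SUPPORTED_VERSIONS table and the to_teleport_resource helper are hypothetical names, and the kind/version/metadata/spec output layout is an assumption about what would be sent to Teleport.

```python
# Hypothetical table of Teleport versions the operator would accept per kind.
SUPPORTED_VERSIONS = {
    "TeleportRole": {"v5", "v6", "v7"},
    "TeleportUser": {"v2"},
}


def to_teleport_resource(cr: dict) -> dict:
    """Build a Teleport resource from a CR carrying teleportResourceVersion."""
    kind = cr["kind"]
    version = cr.get("teleportResourceVersion")
    if version not in SUPPORTED_VERSIONS.get(kind, set()):
        raise ValueError(f"unsupported version {version!r} for kind {kind}")
    return {
        # "TeleportRole" -> "role"; assumes every kind starts with "Teleport".
        "kind": kind[len("Teleport"):].lower(),
        "version": version,
        "metadata": {"name": cr["metadata"]["name"]},
        "spec": cr["spec"],
    }
```

The Kubernetes API version (vX) plays no role in this conversion. Only the field value decides which Teleport version is written, which is what decouples the two versioning schemes.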
This approach has the following limitations:
vX can be confusing. See the API evolution section.
This is a variant of
the "putting version in the CR" approach, but
instead of using resources.teleport.dev/vX with vX being the Teleport
version when this was implemented, we introduce a new v1 API.
For example: operator.teleport.dev/v1.
This is cleaner and semantically easier to understand than using vX. However, this does
not provide easy upgrade paths from existing resources under resources.teleport.dev to
the new API.
One workaround would be to write lightweight controllers reconciling resources
under resources.teleport.dev with operator.teleport.dev, but this would add complexity
that the mostly cosmetic benefits might not justify.
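To give an idea of what such a lightweight controller would do for each object, here is a minimal sketch of the mapping step. It assumes that the version segment of the old apiVersion equals the Teleport resource version, as is the case today, and that the new API carries a teleportResourceVersion field like the vX variant. The to_new_api helper is a hypothetical name.

```python
import copy

OLD_GROUP = "resources.teleport.dev"
NEW_API_VERSION = "operator.teleport.dev/v1"


def to_new_api(old_cr: dict) -> dict:
    """Map a CR from resources.teleport.dev/<version> to operator.teleport.dev/v1."""
    group, _, version = old_cr.get("apiVersion", "").partition("/")
    if group != OLD_GROUP or not version:
        raise ValueError(f"unexpected apiVersion: {old_cr.get('apiVersion')!r}")
    # Work on a copy so the object read from the old API stays untouched.
    new_cr = copy.deepcopy(old_cr)
    new_cr["apiVersion"] = NEW_API_VERSION
    # The old API version becomes the explicit Teleport resource version.
    new_cr["teleportResourceVersion"] = version
    return new_cr
```

A real controller would also have to handle server-populated metadata, deletions, and conflicts between the two APIs, which is where most of the added complexity would come from.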
The test plan is the following:
TeleportRoleV6 and TeleportRoleV7 once this is implemented