Back to Cert Manager

Improved and extensible Certificate controller

design/20200326.extensible-certificate-controller.md

1.20.217.0 KB
Original Source

Improved and extensible Certificate controller

<!-- toc --> <!-- /toc -->

Summary

The Certificate controller is one of the most commonly used controllers in the project. It represents the 'full lifecycle' of an x509 private key and certificate, including private key management and renewal.

Internally, the controller is implemented in a fairly straightforward way. We have a single controller which is responsible for:

  • ensuring the stored private key matches the 'requirements' on the Certificate resource
  • ensuring the stored certificate matches the 'requirements' on the Certificate resource
  • handling renewals when a certificate is nearing expiry (as per spec.renewBefore)
  • managing the lifecycle of a CertificateRequest resource in order to 'issue' a certificate
  • exposing Prometheus metrics

The above list is non-exhaustive.

This document proposes an alternate way of structuring this controller, to improve reliability, testability and extensibility.

Motivation

As the project is maturing, more requirements around this controller are starting to become apparent.

We have outstanding feature requests that are currently difficult to implement with the existing design:

  • Allow private key rotation when renewing certificates #2402
  • Allowing alternative Secret output formats (e.g., single .pem file priv/cert output) #843
  • Add support for JKS, PKCS12 and PEM files #586
  • Make certificate renewal easier to test #2578

This proposal aims to facilitate the above features, as well as make it easier to develop individual areas of the controller over time and continue to make improvements.

Goals

  • Make it easier to maintain the Certificates controller
  • Make it easier to extend the Certificates controller
  • Make it possible to 'hook in' to the state of the controller (e.g., manually triggering renewal)

Non-goals

  • We must not make backwards incompatible changes to the CRD schema

Proposal

As noted above, the existing logic for the Certificates controller is a single loop which is responsible for reconciling all of the aforementioned areas of the Certificate resource.

Instead, the Certificates controller will split into a number of distinct controllers, each with their own well-defined responsibilities, that communicate via the Certificate resource's status field.

  • keymanager - generates and stores private keys when an issuance is required. Manages the status.nextPrivateKeySecretName field.
  • requestmanager - creates and manages CertificateRequest resources for Certificates when an issuance is required.
  • issuing - issues the signed x509 certificate and 'next private key' into the spec.secretName when the CertificateRequest is valid. Manages the status.revision field. Responsible for removing the Issuing condition.
  • trigger - monitors the Secret resource and certificate.spec and adds the Issuing condition when issuance is required.

API changes

In order to facilitate communication and cooperation between these controllers, some API changes are required to contain computed state to be consumed by other controllers. These additional fields will be encompassed in the certificate.status stanza.

go
package v1alpha3

type CertificateStatus struct {
    // EXISTING FIELDS HERE

    // ADDITIONAL FIELDS

    // The current 'revision' of the certificate as issued.
    //
    // When a CertificateRequest resource is created, it will have the
    // `cert-manager.io/certificate-revision` set to one greater than the
    // current value of this field.
    //
    // Upon issuance, this field will be set to the value of the annotation
    // on the CertificateRequest resource used to issue the certificate.
    //
    // Persisting the value on the CertificateRequest resource allows the
    // certificates controller to know whether a request is part of an old
    // issuance or if it is part of the ongoing revision's issuance by
    // checking if the revision value in the annotation is greater than this
    // field.
    //
    // A CertificateRequest with no annotation that is owned by a Certificate
    // resource will be automatically deleted in order to not complicate the
    // 'renewal required' issuance logic. This means that any requests in user
    // clusters will be deleted upon upgrade, and in cases where a re-issuance
    // is required, another will be created with the appropriate `revision`.
    //
    // +optional
    Revision *int `json:"revision,omitempty"`

    // The name of the Secret resource containing the private key to be used
    // for the next certificate iteration.
    // The keymanager controller will automatically set this field if the
    // `Issuing` condition is set to `True`.
    // It will automatically unset this field when Issuing is not set or False.
    // +optional
    NextPrivateKeySecretName *string `json:"nextPrivateKeySecretName,omitempty"`
}

type CertificateCondition string

var (
    // A condition added to Certificate resources when an issuance is required.
    // This condition will be automatically added and set to true if:
    //   * No keypair data exists in the target Secret
    //   * The data stored in the Secret cannot be decoded
    //   * The private key and certificate do not have matching public keys
    //   * If a CertificateRequest for the current revision exists and the
    //     certificate data stored in the Secret does not match the
    //    `status.certificate` on the CertificateRequest.
    //   * If no CertificateRequest resource exists for the current revision,
    //     the options on the Certificate resource are compared against the
    //     x509 data in the Secret, similar to what's done in earlier versions.
    //     If there is a mismatch, an issuance is triggered.
    //
    // The final case above where no CertificateRequest resource exists is
    // essential for backwards compatibility, as older CertificateRequest
    // resources will not have a revision assigned so we cannot compare against
    // them. In these cases, we fall back to the behaviour we use today of
    // comparing the Certificate spec to the issued x509 certificate in the
    // Secret.
    //
    // This condition may also be added by external API consumers to trigger
    // a re-issuance manually for any other reason.
    //
    // It will be removed by the 'issuing' controller upon complete issuance.
    CertificateConditionIssuing CertificateCondition = "Issuing"
)

At the core of this proposal is the addition of the status.iteration field (an integer). This field indicates the current 'version' of the certificate. When a certificate is first created, the iteration is set to 1.

Keymanager API fields

We will add a new field to the CertificateStatus structure:

go
package v1alpha3

type CertificateStatus struct {
    ...

    // The name of the Secret resource 
    // +optional
    NextPrivateKeySecretName string `json:"nextPrivateKeySecretName,omitempty"`
}

Controller behaviour

Implementing this proposal will require a complete replacement of the current certificates controller. Almost all areas of code will be replaced.

The 'compare a certificate.spec with an x509 certificate' logic can and should be preserved to help maintain some semblance of backward compatibility for users upgrading from previous releases.

keymanager

The keymanager controller will be responsible for maintaining the status.nextPrivateKeySecretName field and any 'next private key' Secret resources that are owned by Certificates.

  • If the Issuing condition is True:

    • If the status.nextPrivateKeySecretName field is not set:
      • Check for existing 'next private key' Secret resources:
        • If one matches, set the status.nextPrivateKeySecretName field. This handles cache inconsistencies when we observe Secret creation before updating status.nextPrivateKeySecretName.
        • Otherwise, generate a new private key according to spec.privateKey and store it in a new Secret resource. Persist the name of the Secret as status.nextPrivateKeySecretName.
    • If the status.nextPrivateKeySecretName field is set:
      • If the Secret contains key data that matches spec.privateKey:
        • do nothing
      • If the Secret is not 'owned' by the Certificate, do nothing and log an Event to inform the user.
      • If the Secret is not labelled as a 'next private key', do nothing and log an Event to inform the user.
      • If the Secret does not exist:
        • Generate a new private key according to spec.privateKey and create a Secret resource with the given name.
      • If the secret contains key data that does not match spec.privateKey:
        • Generate a new private key according to spec.privateKey and store it in the Secret resource.
  • If the Issuing condition is False or not set:

    • Delete all owned Secret resources with the cert-manager.io/next-private-key: "true"
    • Ensure status.nextPrivateKeySecretName is unset - we may want to consider not doing this in case a user has manually specified this field and pointed it at an 'un-owned' Secret. This depends on whether we want to support this as a mode of operation.

When creating a 'next private key' Secret resource, the cert-manager.io/next-private-key: "true" annotation is added as well as an OwnerReference to the Certificate resource.

Private keys generated by the key manager will be encoded in PKCS#8 format for ease of interoperability. The issuing controller will encode the resulting key-pair into the format requested by the user once the request has been completed.

The introduction of this dedicated controller also means we can implement private key rotation when a certificate is re-issued/renewed. This is a welcome new feature, but in some cases a user may want to pin the private key used for a key-pair (as is the default and only supported behaviour prior to implementing this design). To continue to enable this, the spec.privateKey.rotationPolicy field controls how the next private key should be sourced. It supports two values:

  • Always: a new private key will be generated on every re-issuance. This includes renewals as well as changes to the spec.privateKey and other spec fields.
  • Never: private keys will never been regenerated and must be provided by the user.

TODO: We may be better to have spec.privateKey.pinnedSecretName instead, to name a Secret that contains the private key to use. We could then require this key to be provided in a specific format. With the rotationPolicy policy design, a user must pre-create the Secret resource containing their private key ahead of time, which creates a conflict of ownership and confusion.

If the spec.privateKey options change during an issuance, the key will be regenerated and the same Secret resource will be reused to store the updated private key. Consumers reading this Secret (i.e. the requestmanager) must compare the public key of the named 'next private key' and if the public key does not match with the CertificateRequest being managed, the CertificateRequest should be recreated.

requestmanager

The requestmanager is responsible for managing a CertificateRequest resource for a Certificate. If a Certificate has the Issuing condition, it will ensure a CertificateRequest signed by the status.nextPrivateKeySecretName exists and matches the specification for the Certificate in certificate.spec.

This is the controller that most closely resembles the bulk of the logic in the existing certificates controller.

It will behave as follows:

  • If the Issuing condition is True:
    • If the status.nextPrivateKeySecretName field is not set:
      • Do nothing
    • List all owned CertificateRequest resources, delete all that have a 'revision' not equal to status.revision + 1 (or 1 if status.revision is not set).
      • If multiple have the same current revision, delete any that do not 'match' as per the two verification steps below. If multiple still exist, log an event and delete all but one.
    • If no CertificateRequest for the current revision exists: Create a CertificateRequest based on the certificate.spec.
    • Verify the public key of the certificate request matches the public key of the stored status.nextPrivateKeySecretName
      • If not, delete the CertificateRequest & log an Event
    • Verify the CSR options match what is requested in certificate.spec
      • If not, delete the CertificateRequest & log an Event
  • If the Issuing condition is False or not set:
    • Delete any CertificateRequest resources that do not have a 'revision' equal to status.revision.

issuing

This controller will copy the status.certificate and status.ca fields from valid CertificateRequest resources and the tls.key from the status.nextPrivateKeySecretName Secret resource into the spec.secretName.

It is responsible for encoding the private key and certificate data into the appropriate format.

Once the key-pair has been written to spec.secretName, it will set the status.revision field to that of the CertificateRequest and remove the Issuing status condition (in the same Update call).

It will behave as follows:

  • If the 'Issuing' condition is True:
    • If the status.nextPrivateKeySecretName field is not set:
      • Do nothing
    • Find the CertificateRequest for status.revision + 1
      • If none exist, do nothing.
      • If multiple exist, do nothing (requestmanager will handle this)
    • If the CertificateRequest is not in a 'final' state, do nothing.
    • If it is 'valid':
      • Verify the public key of the certificate request matches the public key of the stored status.nextPrivateKeySecretName
        • If not, do nothing (requestmanager will handle this)
      • Verify the CSR options match what is requested in certificate.spec
        • If not, do nothing (requestmanager will handle this)
      • Encode and issue the key-pair (store it in the Secret)
      • Set status.revision to revision of the CertificateRequest
      • Remove Issuing status condition
      • Clear status.lastFailureTime (if set)
      • Log an Event
    • If not 'valid':
      • Set status.lastFailureTime (if not equal to CertificateRequest)
      • Set Issuing status condition to False with reason explaining when the request will be retried
  • If the 'Issuing' condition is False or not set:
    • Do nothing

trigger

The trigger plugin is responsible for observing the state of the currently issued spec.secretName and the rest of the certificate.spec fields to determine whether a re-issuance is required. It triggers re-issuance by adding the Issuing status condition when a new certificate is required.

These conditions will cause a request to be triggered:

  • spec.secretName does not exist, or contains data that cannot be decoded.
  • Issuer name/kind annotations on Secret resource do not match current spec.issuerRef.
  • The public key of the tls.key and tls.crt do not match.
  • A CertificateRequest for the current status.revision exists and the current requested certificate.spec does not match the options on the CSR.
  • If the stored x509 certificate's NotAfter is within spec.renewBefore of the current time.

Additionally, the controller is responsible for implement a 'back-off' if CertificateRequest resources persistently fail to complete. For now, this will be a continuation of the current behaviour of backing off by 1 hour after a request fails. Even if any of the above conditions are true, a request will not be triggered unless the current time is at least 1 hour after the status.lastFailureTime, if set. In future we can extend this logic to back-off exponentially by storing a longer history of CertificateRequest resource's failure times, but this is out of scope of this proposal.

Another actor may choose to manually trigger an issuance by setting the Issuing condition themselves (i.e. with a cert-manager CLI tool, or their own controller). The trigger controller will not interfere in this case and will take no extra action.