Proposal: Certificate Renewal Control (windows + disable)

design/20250920.certificate-renewal-control.md

Author(s):

  • Erik (draft)
  • Hemant Joshi

Status: Draft

Date: 2025-09-20

Summary

Add a small, backward-compatible extension to the Certificate API that allows users to:

  1. Define renewal windows: time ranges during which cert-manager is allowed to attempt automatic renewals.
  2. Disable automatic renewal entirely for a given Certificate resource.

The goal is to give cluster operators and application owners better operational control over when certificate renewals happen (for example, to keep renewals out of business hours, confine them to maintenance windows, or avoid periods of restricted network availability), while keeping the behavior explicit, discoverable in status, and safe by default.


Motivation

Current cert-manager behavior: certificates are renewed automatically based on duration and renewBefore. There are valid real-world situations where users want to control when renewal attempts are performed:

  • Renewals that contact external ACME endpoints should be scheduled during off-peak windows to limit the impact on rate limits or network egress costs.
  • Stateful applications may want to coordinate rolling restarts with certificate replacement; ops teams may only want renewals during maintenance windows.
  • For test environments the user might want to disable renewal entirely to test expiry behavior.

Providing a first-class API for these requirements improves transparency and reduces reliance on out-of-band tooling (cronjobs, custom controllers) to gate renewals.


Goals

  • Minimal, intuitive API extension to Certificate that is easy to reason about.
  • Backwards compatible: absence of fields implies existing behavior.
  • Clear observability: status.conditions show why a renewal is deferred or disabled.
  • Safe defaults: do not cause unexpected certificate expiries silently.
  • Update documentation for the Certificate CRD renewal field with examples of windows and recommended guidelines on how to configure them.

Non-Goals

  • Replace complex external scheduling systems.
  • Implement full calendar/scheduling language.
  • Introduce a shared RenewalPolicy CRD. Once the extension is stable, it could be migrated to such a CRD so that policies can be shared across certificates, but that is out of scope here.

Proposed API

Add a new renewal block to CertificateSpec with two child fields: policy and windows.

CRD snippet

```yaml
spec:
  renewal:
    # Type of policy to use for renewal.
    # Default: RenewBefore
    policy: RenewBefore # RenewBefore | EarliestWindow | Disabled

    # Optional. If provided, renewal may only happen during one of the listed windows.
    # If empty or omitted, renewals may occur at any time.
    windows:
      - cron: ["0 23 * * 1-5"] # Window is 11 pm - 5 am from Monday - Friday
        duration: "6h"
        timeZone: "America/Denver"
      - cron: ["0 10 * * 6,0"] # Window is 10 am - 6 pm on Sat and Sunday
        duration: "8h"
        timeZone: "America/Denver"
```

Field definitions

  • renewal.policy (string, optional): one of RenewBefore (the default), EarliestWindow, or Disabled. With RenewBefore or EarliestWindow and no windows, cert-manager keeps the existing behavior and uses renewBefore to decide when to renew.

    • If windows are set along with RenewBefore, cert-manager looks for the latest time inside an allowed window that is at or before the renewBefore time. If no such time exists, it uses the next (forward) time that falls inside a window. A status is added to the Certificate when the renewBefore time falls outside every window, and also when the chosen renewalTime is after the expiration date, i.e. after notAfter.
    • If windows are set along with EarliestWindow, cert-manager ignores renewBefore and picks the earliest renewalTime that falls inside a window. If the renewalTime ends up outside a compliant window, that is recorded as a status on the Certificate.
    • If set to Disabled, the certificate is not renewed automatically.
  • renewal.windows (array of RenewWindow, optional): defines one or more allowed renewal windows. If omitted, renewal can happen at any time (existing behavior).

    • cron: a cron expression giving the start time and the days on which a renewal window opens.
    • duration: how long the window stays open after each start time.
    • timeZone: the time zone in which the cron expression is interpreted. This must be a valid IANA time zone name.

Note: renewal.policy=RenewBefore and renewal.policy=EarliestWindow without windows behave the same way as today: the renewalTime is derived from renewBefore.
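The field definitions above could translate into Go API types roughly as follows. This is a minimal sketch, not the proposed implementation: the type names, constant names, and the EffectivePolicy helper are assumptions made for illustration, and only the JSON field names and the RenewBefore default come from the CRD snippet.

```go
package main

import "fmt"

// RenewalPolicyType enumerates the three policy values from the proposal.
// The Go identifiers are hypothetical.
type RenewalPolicyType string

const (
	RenewBeforePolicy    RenewalPolicyType = "RenewBefore"
	EarliestWindowPolicy RenewalPolicyType = "EarliestWindow"
	DisabledPolicy       RenewalPolicyType = "Disabled"
)

// RenewalWindow mirrors one entry under spec.renewal.windows.
type RenewalWindow struct {
	Cron     []string `json:"cron"`
	Duration string   `json:"duration"`
	TimeZone string   `json:"timeZone,omitempty"`
}

// CertificateRenewal mirrors the spec.renewal block.
type CertificateRenewal struct {
	Policy  RenewalPolicyType `json:"policy,omitempty"`
	Windows []RenewalWindow   `json:"windows,omitempty"`
}

// EffectivePolicy applies the documented default: an absent block or an
// empty policy behaves as RenewBefore, which keeps old Certificates unchanged.
func (r *CertificateRenewal) EffectivePolicy() RenewalPolicyType {
	if r == nil || r.Policy == "" {
		return RenewBeforePolicy
	}
	return r.Policy
}

func main() {
	var unset *CertificateRenewal // Certificate without a renewal block
	fmt.Println(unset.EffectivePolicy())

	disabled := &CertificateRenewal{Policy: DisabledPolicy}
	fmt.Println(disabled.EffectivePolicy())
}
```

Making the helper nil-safe is one way to keep the "absence of fields implies existing behavior" goal in a single place rather than scattering nil checks through the reconciler.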

High level diagram

| renewal.policy | Windows defined? | RenewalTime Decision | Statuses Added to Certificate |
| --- | --- | --- | --- |
| RenewBefore | No | Same as existing behavior: choose renewBefore (i.e. NotAfter - X). | None (normal behavior). |
| RenewBefore | Yes | (1) Try to find the latest time within the allowed windows that is ≤ renewBefore. (2) If no such time exists in any prior window, pick the next allowed window (the next forward slot). | Add appropriate conditions to the cert status. |
| EarliestWindow | No | Same as existing behavior: behaves like RenewBefore (i.e., respect renewBefore as today). | None (normal behavior). |
| EarliestWindow | Yes | Ignore renewBefore for window selection. Find the earliest allowed time within the configured windows (the earliest window slot). | Add appropriate conditions to the cert status. |
| Disabled | n/a | Certificate renewal is disabled (no renewal scheduling). | None (renewal is not performed). |

Tip: When renewal.windows are omitted, both RenewBefore and EarliestWindow fall back to the existing renew-before behavior, so the table's "No windows" rows reflect current behavior.

Timezones

For the sake of uniformity, the controller treats all window definitions as UTC internally. renewal.windows entries accept IANA time zones so that users can express windows in local time.


Controller logic

Current behavior

Cert-manager typically schedules a renewal event when the certificate's NotAfter minus renewBefore is reached (or earlier, depending on internal jitter and queueing). The controller reconciles Certificates and triggers issuers.

Updated behavior

  1. On reconcile, compute desiredRenewalTime using existing logic (expiry - renewBefore).
  2. If renewal.policy == Disabled:
    • Do not schedule renewal operations.
    • Set a RenewalDisabled condition in status with a helpful message and the observedGeneration when it was last observed.
  3. Else if renewal.policy == RenewBefore:
    • Keep the existing cert-manager behavior of using renewBefore. If windows are provided, find a renewalTime that fits in a window. If the renewBefore time does not fit a window, add a status message; also add a message if the window-compliant renewalTime falls after expiration.
  4. Else if windows is provided and renewal.policy == EarliestWindow:
    • Find the earliest renewalTime that is compliant with a window. If no compliant time exists before expiration, post a status on the Certificate.
  5. Otherwise (EarliestWindow without windows), use desiredRenewalTime as today.

Edge cases

  • If windows are misconfigured (invalid timezone, invalid cron expression or duration), set status.renewal.windows.valid to "False", surface a RenewalConfigInvalid condition, and do not schedule renewals until the configuration is corrected.
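A runtime check for a misconfigured window could be sketched with the standard library alone. The function name validateWindow is hypothetical, and the cron check here only counts fields; a real implementation would parse the expression fully with a cron library.

```go
package main

import (
	"fmt"
	"strings"
	"time"
	_ "time/tzdata" // embed the IANA database for hosts without zoneinfo
)

// validateWindow performs shallow checks on one renewal window.
// It returns nil when the window looks well-formed.
func validateWindow(cron, duration, timeZone string) error {
	// Shallow check only: a standard cron expression has five fields.
	if n := len(strings.Fields(cron)); n != 5 {
		return fmt.Errorf("cron %q: expected 5 fields, got %d", cron, n)
	}
	d, err := time.ParseDuration(duration)
	if err != nil {
		return fmt.Errorf("duration %q: %w", duration, err)
	}
	if d <= 0 {
		return fmt.Errorf("duration %q: must be positive", duration)
	}
	if _, err := time.LoadLocation(timeZone); err != nil {
		return fmt.Errorf("timeZone %q: %w", timeZone, err)
	}
	return nil
}

func main() {
	fmt.Println(validateWindow("0 23 * * 1-5", "6h", "America/Denver"))
	fmt.Println(validateWindow("0 23 * * 1-5", "1d", "America/Denver")) // "d" is not a Go duration unit
	fmt.Println(validateWindow("0 23 * * 1-5", "6h", "Mars/Olympus"))
}
```

Any non-nil error from such a check would be the natural message text for the invalid-configuration status described above.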

Status changes and Conditions

Add the following renewal field to the status and also update some existing fields accordingly:

```yaml
status:
  renewalTime: "" # Update this according to the controller logic. Existing field.
  lastFailureTime: "" # Again this exists and probably doesn't need to be touched.
  renewal:
    policy: RenewBefore
    windows: # This will only be set if windows is set.
      valid: "True"
  # Every renewal check would add a condition and mark the cert as ready or not along with proper reasons
  conditions: []
```

When a renewal is attempted or completed, existing issuance conditions (e.g., Issuing, Ready) still apply.


API examples

1) Disable renewal completely

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: no-renewal-cert
spec:
  secretName: no-renewal
  dnsNames: ["test.example.com"]
  renewal:
    policy: Disabled
```

Interactions with other features

  • ACME rate limits: By allowing windows, users may unintentionally bunch renewal attempts into smaller time periods. Document guidance about rate limits and encourage staggered windows for many Certificates.
  • Certificate controllers / Reloader / Pod restarts: If another controller watches secret updates and triggers restarts, users should ensure windows align with maintenance windows. This feature intentionally provides that control.
  • Manual renewal: kubectl cert-manager renew (or similar manual actions) should continue to work regardless of the Disabled policy or configured windows, because those are meant to affect automatic renewal only. Blocking manual operations on explicit user request is not proposed here.

Safety and UX considerations

  • Make disabled explicit and require no special RBAC.
  • For any configuration parsing error, surface the problem in a RenewalConfigInvalid status to avoid silent misbehavior.
  • Provide helpful CLI/kubectl hints in messages where appropriate (e.g. To renew manually: kubectl cert-manager renew certificate/no-renewal-cert).

Implementation plan (rough)

  1. Update API types (Go structs) and CRD YAML. Add unit tests for validation parsing (window, timezone correctness).
  2. Add status condition types and helper methods.
  3. Extend the Certificate reconciler to evaluate renewal:
    • Validate renewal early in reconcile.
    • Compute next allowed window.
    • Requeue reconcile for next allowed start when necessary.
  4. Add e2e tests that simulate time progression (using fake clocks or test helpers) to verify:
    • Renewal occurs inside windows.
    • Renewal doesn't occur when policy is Disabled.
    • Also, simulate EarliestWindow.
  5. Documentation: user guide, examples, migration notes.

Validation and Admission

  • No admission webhook is required for the initial iteration. CRD validation should constrain the window field formats (cron, duration, timeZone) using regex, and any remaining invalid values will be caught at runtime and reported with RenewalConfigInvalid.

Metrics and Monitoring

Suggested metrics additions:

  • certmanager_certificate_renewal_deferred_total{reason="outside_window"}
  • certmanager_certificate_renewal_bypassed_total{reason="expiry_imminent"}

Document that operators should alert on many renewal_deferred events for certificates approaching expiry.


Alternatives considered

  1. Cron-like schedule field: More expressive (cron expression) but increases complexity for users and parsing.
  2. External scheduler integration: Leave renewal control to an external controller. This keeps core simpler but adds operational burden.
  3. Per-issuer scheduling: Instead of per-Certificate, allow Issuers to be configured with windows. This reduces per-certificate flexibility.

We opted for per-Certificate windows for fine-grained control and simplicity of the API.


Migration story

  • Old Certificates without renewal behave exactly as today.
  • Adding renewal is opt-in.
  • No existing certificates are changed.

Testing matrix

Standard unit and end-to-end tests, as cert-manager uses today, will verify the new behaviour. The current end-to-end tests for Certificate resources will also give a good signal for the renewal field.

  • Unit tests:
    • Parse windows; invalid times; rollover windows.
    • renewal.policy=Disabled path.
  • Integration tests / e2e:
    • Certificates with windows succeed only within windows.
    • Certificates approaching expiry trigger fail-safe.
    • Status conditions are correctly emitted in all cases.