Inferring TLS Hostnames From Gateway Routes

design/20230601.gateway-route-hostnames.md

Release Signoff Checklist

This checklist contains actions which must be completed before a PR implementing this design can be merged.

  • This design doc has been discussed and approved
  • Test plan has been agreed upon and the tests implemented
  • Feature gate status has been agreed upon (whether the new functionality will be placed behind a feature gate or not)
  • Graduation criteria are in place if required (if the new functionality is placed behind a feature gate, how will it graduate between stages)
  • User-facing documentation has been PR-ed against the release branch in [cert-manager/website]

Summary

For generating Gateway API certificates, use hostnames present in, e.g., GRPCRoute, HTTPRoute, and TLSRoute resources in addition to the Gateway listener hostnames. This reduces configuration duplication, and allows the cluster owner to delegate permission to site owners to add hostnames.

Motivation

Currently, the gateway-shim only looks at the hostname in Listener. This field is optional, and its purpose is to filter which hostnames routes are allowed to match. This double configuration allows the cluster owner to set allowed hostnames in GatewaySpec, while individual site owners update their HTTPRouteSpec. In cases where this permission model is unnecessary (either because all hostnames are allowed, or because the cluster and site owners are the same team), this leads to awkward duplication. As with any configuration duplication, it is easy to miss an update in one place, causing difficult-to-find bugs and requiring teams to maintain more internal documentation. Envoy Gateway, for example, already supports running a Gateway without hostnames in the Listener.

Another drawback inherent in using Listener.hostname is that it holds a single value. To add another hostname, the entire Listener object must be duplicated, including its port, protocol and tls fields. This adds yet another source of duplication.

Goals

  • To be compliant with the intention of the Gateway API.
  • To treat resources the same way as current Gateway API implementations, e.g., Envoy Gateway.
  • To remove duplicated configuration.

Non-Goals

N/A

Proposal

Today, every hostname needs its own Listener in the Gateway:

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: tls-basic
spec:
  gatewayClassName: acme-lb
  listeners:
  - name: https-1
    hostname: 1.example.com
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - name: default-cert
  - name: https-2
    hostname: 2.example.com
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - name: default-cert
---
# An HTTPRoute that uses the two hosts.
```

Compare this with the following HTTPRoute and Gateway:

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: example-gateway
spec:
  gatewayClassName: example-gateway-class
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: example-route
spec:
  parentRefs:
  - name: example-gateway
  hostnames:
  - "1.example.com"
  - "2.example.com"
  rules:
  - backendRefs:
    - name: example-svc
      port: 10080
```

Note that HTTPRouteSpec.hostnames is a list, which avoids the duplication. As long as the Listener sets no hostname, the route hostnames are served as if they had been set on the Listener. If the Listener does set a hostname, the spec says that the Listener only serves the intersection of its own hostname and the route hostnames.
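The intersection rule can be illustrated with a minimal Go sketch. It is not taken from cert-manager or from any Gateway API implementation: the function names `isWildcard`, `coveredBy` and `effectiveHostname` are hypothetical, and the wildcard handling is a simplified reading of the spec in which a `*.` prefix covers one or more leading labels.

```go
package main

import (
	"fmt"
	"strings"
)

// isWildcard reports whether h has the form "*.example.com".
func isWildcard(h string) bool {
	return strings.HasPrefix(h, "*.")
}

// coveredBy reports whether host falls under the wildcard pattern,
// e.g. "foo.test.example.com" is covered by "*.example.com",
// while the bare "example.com" is not.
func coveredBy(pattern, host string) bool {
	suffix := pattern[1:] // ".example.com"
	return len(host) > len(suffix) && strings.HasSuffix(host, suffix)
}

// effectiveHostname combines a listener hostname with a route hostname
// and reports whether they intersect. An empty string means "not set".
// Hypothetical helper for illustration only.
func effectiveHostname(listener, route string) (string, bool) {
	switch {
	case listener == "":
		return route, true // no filter on the listener
	case route == "" || listener == route:
		return listener, true
	case isWildcard(listener) && coveredBy(listener, route):
		return route, true // the route is the narrower of the two
	case isWildcard(route) && coveredBy(route, listener):
		return listener, true // the listener is the narrower of the two
	}
	return "", false
}

func main() {
	for _, route := range []string{"1.example.com", "*.example.com", "example.net"} {
		host, ok := effectiveHostname("1.example.com", route)
		fmt.Printf("listener=1.example.com route=%s -> %q %v\n", route, host, ok)
	}
}
```

With an empty listener hostname the route hostname passes through unchanged, which is the case this proposal relies on.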

Hostnames make more sense in Route resources than in Listeners, as a single route may be used for both HTTP and HTTPS.

See the Gateway API spec on GatewaySpec.listeners for more information.

User Stories

  1. Site owner creates an HTTPRoute with two new hostnames, but doesn't change the Gateway.
  2. cert-manager immediately picks them up and re-generates the certificate for the Gateway.

Risks and Mitigations

Ultimately, this is nothing new; it only follows the Gateway API spec.

  1. The gateway-shim needs to subscribe and react to all Route resources, which could add CPU/memory/API server load.
  2. If the cluster owner and site owners are separate, requiring the cluster owner to allow specific hostnames may be beneficial.

Design Details

This is based on the proof-of-concept in tommie/cert-manager.

The easiest way to implement this is to generate synthetic listeners early in gateway-shim, and let the main controller logic stay the same. Listeners with hostnames are not affected, since the intersection of routes and listeners determines the listener's capabilities. A listener without a hostname matches any hostname in its attached routes, so it can simply be copied once for each route hostname. That is, the second example under Proposal would be translated to the first.
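The transform described above could look roughly like the following Go sketch. It is not the proof-of-concept code: `Listener`, `Route` and `expandListeners` are simplified, hypothetical stand-ins for the Gateway API types and the gateway-shim logic, and the sketch assumes that every route passed in is already known to be attached to the Gateway. Keeping a hostname-less listener unchanged when no route supplies a hostname is also an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Listener and Route are simplified stand-ins for the Gateway API types.
type Listener struct {
	Name     string
	Hostname string // empty means the listener sets no hostname
	Port     int
}

type Route struct {
	Hostnames []string
}

// expandListeners returns synthetic listeners: a listener that sets a
// hostname is kept as-is, and a listener without one is copied once per
// distinct hostname found in the attached routes.
func expandListeners(listeners []Listener, routes []Route) []Listener {
	seen := map[string]bool{}
	var hostnames []string
	for _, r := range routes {
		for _, h := range r.Hostnames {
			if !seen[h] {
				seen[h] = true
				hostnames = append(hostnames, h)
			}
		}
	}
	sort.Strings(hostnames) // deterministic output order

	var out []Listener
	for _, l := range listeners {
		if l.Hostname != "" || len(hostnames) == 0 {
			out = append(out, l) // assumption: nothing to expand, keep unchanged
			continue
		}
		for _, h := range hostnames {
			c := l // copies the name, port and any other fields
			c.Hostname = h
			out = append(out, c)
		}
	}
	return out
}

func main() {
	listeners := []Listener{{Name: "https", Port: 443}}
	routes := []Route{{Hostnames: []string{"1.example.com", "2.example.com"}}}
	for _, l := range expandListeners(listeners, routes) {
		fmt.Printf("%s %s:%d\n", l.Name, l.Hostname, l.Port)
	}
}
```

Because the output has the same shape as an ordinary list of listeners, the rest of the controller would not need to know whether a hostname came from a Listener or from a Route.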

Some glue data types are needed to support all routes that can carry hostnames. At the moment, these are the route kinds that have a hostnames field, e.g., GRPCRoute, HTTPRoute, and TLSRoute.

Test Plan

Since this code deals with how cert-manager reacts to changes in CRDs, it is enough to focus on unit tests. For a given set of Gateways and Routes, the transform should generate the expected synthetic Gateway.

Graduation Criteria

N/A

Upgrade / Downgrade Strategy

Downgrading to a cert-manager version that does not look up hostnames in Routes may lead to unavailability, since hostnames that appear only in Routes would no longer be included in the Gateway's certificates.

The change is upgrade-compatible if all Gateways already specify hostnames in their Listeners. However, if a Gateway does not specify hostnames in its Listeners, upgrading may cause certificates to be issued for hostnames that cert-manager had not previously seen. In terms of security, this is not an issue, as the Routes have always existed; they simply didn't have a valid certificate.
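To make the upgrade effect concrete, here is a small Go sketch that estimates which hostnames would newly receive a certificate for one Gateway. The function `newlyCoveredHostnames` is hypothetical and not part of cert-manager. It treats hostnames as plain strings, ignores wildcard matching, and assumes the route hostnames come from routes attached to that Gateway.

```go
package main

import (
	"fmt"
	"sort"
)

// newlyCoveredHostnames returns the route hostnames that would gain a
// certificate after the upgrade. listenerHostnames holds one entry per
// listener ("" when unset); routeHostnames holds the hostnames of the
// routes attached to the same Gateway.
func newlyCoveredHostnames(listenerHostnames, routeHostnames []string) []string {
	known := map[string]bool{}
	open := false // true if any listener has no hostname
	for _, h := range listenerHostnames {
		if h == "" {
			open = true
		} else {
			known[h] = true
		}
	}
	if !open {
		return nil // every listener names its hostname: nothing changes
	}
	var added []string
	for _, h := range routeHostnames {
		if !known[h] {
			known[h] = true // also guards against duplicates
			added = append(added, h)
		}
	}
	sort.Strings(added)
	return added
}

func main() {
	added := newlyCoveredHostnames(
		[]string{"", "1.example.com"},
		[]string{"1.example.com", "2.example.com"},
	)
	fmt.Println(added)
}
```

An empty result corresponds to the upgrade-compatible case, where every Listener already specifies its hostname.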

Supported Versions

Production Readiness

How can this feature be enabled / disabled for an existing cert-manager installation?

Since this can be implemented as an input transform, that transform could be placed behind a feature flag. However, the entire Gateway API support is already behind a feature flag, ExperimentalGatewayAPISupport, so this feature probably does not need one of its own.

Does this feature depend on any specific services running in the cluster?

It requires Gateway API CRDs, but nothing more than is already required for gateway-shim.

Will enabling / using this feature result in new API calls (i.e. to Kubernetes apiserver or external services)?

It will subscribe to HTTPRoute, TLSRoute, and similar resources, beyond the previously subscribed Gateway resources.

Will enabling / using this feature result in increasing size or count of the existing API objects?

It will not write new objects. Enabling the feature may cause cert-manager to recognize hostnames it wasn't aware of before, and therefore issue new Certificates on upgrade.

Will enabling / using this feature result in significant increase of resource usage? (CPU, RAM...)

No. Route resources are small.

Drawbacks

None? cert-manager should follow the Gateway API spec.

Alternatives

N/A