rfd/0112-securing-docker-images.md
This RFD discusses structures and processes to increase the security of the OCI images we use to deliver Teleport to our customers.
One of our shipping artifacts is a collection of OCI images. As the provider of those OCI images, we are (at least partially) responsible for everything in them, not just Teleport.
We should not ship vulnerabilities to our clients, even if those vulnerabilities are not directly in our software.
For the sake of this RFD, I will define delivering a secure image as reliably producing an OCI image that is a priori unlikely to contain vulnerabilities in and of itself, in that it:
- has the smallest footprint reasonably possible,
- contains software of reasonably known provenance,
- has no fixable, high-risk warnings or vulnerabilities flagged when run through a reputable scanner at the time of creation,
- is monitored so that vulnerabilities discovered after creation, either in Teleport itself or in a dependency, are flagged, and
- is updated to resolve any discovered vulnerabilities in a timely fashion.
where:
- reasonably known provenance means that the software comes from either ourselves or a known, reputable source, and
- a timely fashion for updates is as per the Teleport Vulnerability Management Policy:
| Severity | Resolution time |
|---|---|
| Critical | 7 days |
| High | 30 days |
| Moderate | 90 days |
| Low | ~180 days |
This RFD will not discuss building images for and/or running Teleport on platforms other than Linux.
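This RFD describes a 2-pronged approach for meeting the above goals:

1. building the Teleport images on distroless base images, and
2. ongoing automated vulnerability scanning of the published images.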
Distroless images contain only an application and the minimal set of dependencies for it. Google offers several base images that contain a minimal Linux distribution that we can use as a starting point.
Switching to distroless images drastically reduces the number of software components we ship as part of a Teleport distribution. This both reduces the size of the potential attack surface, and reduces the potential for high-noise reports from automated scanners.
The Google-supplied distroless base images also provide a mechanism for verifying the provenance of a given image using cosign and a public key. Stronger, SLSA level 2 guarantees can be verified with additional tooling.
NOTE: We are already using Distroless images to distribute some Teleport plugins. This would extend their use to Teleport proper.
Using an ongoing automated scanner means that we do not just check for vulnerabilities at image creation time. Instead, we proactively and continually scan for any vulnerabilities that may be discovered until the image is either replaced with a newer version, or the support lifetime for the version of Teleport on that image expires (i.e. it falls out of our 3-version support window).
NOTE: This section assumes an understanding of RFD 73 "Public Image Registry".
What is the minimal set of requirements to run Teleport on Linux in a container?
Most Teleport dependencies are statically compiled into the `teleport` binary, giving us a smaller set of runtime dependencies than you might imagine:

1. Teleport itself,
2. `libc` and related runtime libraries (`GLIBC >= 2.17`, `libgcc`, etc.),
3. `dumb-init`, which is required for correct signal and child process handling inside a container,
4. `libpam` (and its transitive dependencies) for PAM support.

Requirement (1) (i.e. Teleport itself) is provided by our CI process. Requirements (2) and (5) are satisfied automatically by using the Google-provided base image `gcr.io/distroless/cc-debian11`, which is configured for "mostly statically compiled" languages that require libc.

Requirements (3) and (4) can be sourced either from the upstream Debian repository, or downloaded directly from their projects' source repositories. Sourcing `dumb-init`, `libpam` and so on from the Ubuntu or Debian package repositories implies some minimal curation and provenance checking by the Debian packaging tools, so we will prefer that to sourcing them elsewhere.
The distroless base image will be pulled and verified prior to constructing the
Teleport image, using the cosign tool as described in the distroless project's documentation.
Verifying the image signature will allow us to specify a floating tag for the base image (and thus automatically include the latest version of every package in the base image, with any security fixes, etc. included) while still validating the provenance of the base image itself.
NOTE: This approach sacrifices repeatability for convenience. That is, by always grabbing the latest revision of the image, we are at the mercy of the `distroless` team regarding changes to our base layer. Why choose this over a stable, repeatable build? Because the `distroless` build system automatically follows updates to the underlying Debian packages and automatically rebuilds the base image every time a PR is merged on a package in Debian. Following the floating tag means we automatically get upstream security updates.
It is technically possible for the image to be poisoned post-validation (e.g.
in a shared build environment, where a malicious peer could re-tag a malicious
image as the base). We can avoid this poisoning scenario because cosign returns
the hash of an image after verifying its signature. If we only refer to the
base image by the returned hash once it has been verified, any tag poisoning
attack will not be effective.
The image will be built from a multi-stage Dockerfile, using build stages to download and unpack the required Debian packages and copy them into place on the distroless image.
An example Dockerfile, assuming the Teleport Debian package is supplied by the CI system, might look something like:
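```dockerfile
# Stage 1: fetch dumb-init from the Debian repository and unpack it.
FROM debian:11 AS dumb-init
RUN apt-get update && apt-get download dumb-init && dpkg-deb -R dumb-init*.deb /opt/dumb-init

# Stage 2: unpack the Teleport package supplied by CI.
FROM debian:11 AS teleport
COPY teleport*.deb /tmp/
RUN dpkg-deb -R /tmp/teleport*.deb /opt/teleport

# NOTE: the CC image supplies libc, libgcc and a few basic runtime libraries for us
FROM gcr.io/distroless/cc-debian11
# The Debian dumb-init package installs its binary under /usr/bin
COPY --from=dumb-init /opt/dumb-init/usr/bin/dumb-init /bin/
# A trailing slash is required when a wildcard may match several files
COPY --from=teleport /opt/teleport/bin/* /bin/
ENTRYPOINT ["/bin/dumb-init", "teleport", "start", "-c", "/etc/teleport/teleport.yaml"]
```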
NOTE: This unpack-and-copy installation method is only appropriate for packages with no complex installation requirements, like post-install hooks.
Also note that for the sake of clarity I'm only including one dependency package. In the real distribution there would be multiple packages required.
As part of researching this RFD, I examined a couple of alternative ways to construct the Teleport image.
bazel, distroless and rules_docker:
Given that the underlying distroless images are built using bazel, it should be possible to construct a custom image for Teleport in the same way. After some experimentation, I found that:
- the Debian package installation technique used by rules_docker is essentially a tweaked version of the extract-and-copy approach used by the Dockerfile above, and
- there is a major chicken-or-egg problem in the build process when distroless is used as an external dependency in an enclosing bazel workspace, requiring manual intervention in the build to solve.

Using bazel does not resolve a major limitation of using a basic Dockerfile (i.e. the xcopy-style install) and introduces more complexity, in terms of both build process and tooling, so it was rejected in favour of the Dockerfile approach.
apko: apko is a tool for quickly building minimalist, reproducible Alpine Linux images using a declarative format. While I found the tool to be very neat, the images it generates are still closer to a "debug" distroless image.
Using apko would also require us to build an Alpine Linux package for Teleport
to integrate it into the build.
I seriously considered recommending apko, as it has some neat features (e.g.
automatically producing an SBOM as part of the construction process), but in the
end I rejected it because of the extra software included in the resulting images.
A smoke test is a simple, quick test to assert basic functionality. We simply want to find out if Teleport will even start in the environment contained by our container image. While this is not per se a security matter, using a distroless base image means we have very little padding if an unexpected dependency finds its way into Teleport.
We want to stop ourselves shipping garbage to our customers if at all possible.
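A minimal smoke test could simply run the Teleport binary inside the candidate image and check that it exits cleanly. If a shared library is missing, the dynamic loader fails before Teleport's own code runs. The Go sketch below shells out to `docker run`, replaces the `dumb-init` entrypoint using `--entrypoint`, and runs `teleport version`. It assumes the binary is at `/bin/teleport`, as in the example Dockerfile above, and that the Docker CLI is on the `PATH`. A fuller test might also start a Teleport process with a minimal configuration.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"time"
)

// smokeTestArgs builds a `docker run` invocation that bypasses the image's
// dumb-init entrypoint and asks the Teleport binary to print its version.
func smokeTestArgs(image string) []string {
	return []string{"run", "--rm", "--entrypoint", "/bin/teleport", image, "version"}
}

// SmokeTest runs the Teleport binary inside image and returns an error (with
// the container's output) if it fails to start or exits non-zero.
func SmokeTest(ctx context.Context, image string) error {
	ctx, cancel := context.WithTimeout(ctx, 2*time.Minute)
	defer cancel()
	out, err := exec.CommandContext(ctx, "docker", smokeTestArgs(image)...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("teleport failed to start in %s: %w\n%s", image, err, out)
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: smoke-test IMAGE_REF")
		os.Exit(2)
	}
	if err := SmokeTest(context.Background(), os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("smoke test passed")
}
```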
In order to allow our customers to validate our published Teleport images,
our images will be signed using the cosign tool, similarly to the Distroless
base images.
The cosign tool integrates well with GitHub Actions (GHA),
and is even included in the template "how to publish a docker image" example
workflow. It also provides "keyless" signing via OIDC, meaning that our build
process can use its GitHub identity to sign the image, obviating the need for
extra keys for us to manage.
After some experimentation it seems that the OIDC Keyless Signing doesn't work with our GitHub Organisation. For this reason, at least for the first iteration of image signing, we will be using the keyed option. This also resolves any question of how much using OIDC-based signing ties us to GitHub for identity.
The Teleport distroless images will be signed at creation time with an internal key, allowing us to ensure that any images we promote have not been tampered with between creation and promotion. Unfortunately, the image signature leaks the registry and repository names used for the temporary image storage between build and promotion. For this reason, once a candidate image's internal signature has been verified, that signature will not be copied to the release repository. Instead, the image will be re-signed with a separate production (release) key.
All Teleport images shall be scanned with trivy immediately after build, and the results will be uploaded to the GitHub Code Scanning service. From there, our Panther SIEM can observe any alerts and instigate corrective action.
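Below is a sketch of the build-time scan step. It runs trivy with SARIF output, which is the format GitHub Code Scanning ingests, and then prints a per-level tally of findings for the build log. The output file name `trivy-results.sarif` and the `scan-image` program name are assumptions. The upload itself would most likely use GitHub's `github/codeql-action/upload-sarif` action rather than custom code. The tally ignores rule-level default severities and applies SARIF's fallback level of `warning` to any result that has no level.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// trivyArgs builds trivy arguments that write findings for imageRef as SARIF.
func trivyArgs(imageRef, sarifPath string) []string {
	return []string{"image", "--format", "sarif", "--output", sarifPath, imageRef}
}

// sarifLog is the minimal subset of the SARIF 2.1.0 schema read here.
type sarifLog struct {
	Runs []struct {
		Results []struct {
			Level string `json:"level"`
		} `json:"results"`
	} `json:"runs"`
}

// CountByLevel tallies SARIF results by level ("error", "warning", ...).
func CountByLevel(sarif []byte) (map[string]int, error) {
	var doc sarifLog
	if err := json.Unmarshal(sarif, &doc); err != nil {
		return nil, fmt.Errorf("parsing SARIF: %w", err)
	}
	counts := map[string]int{}
	for _, run := range doc.Runs {
		for _, r := range run.Results {
			level := r.Level
			if level == "" {
				level = "warning" // SARIF's fallback when no level is given
			}
			counts[level]++
		}
	}
	return counts, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: scan-image IMAGE_REF")
		os.Exit(2)
	}
	const sarifPath = "trivy-results.sarif" // assumed output location
	cmd := exec.Command("trivy", trivyArgs(os.Args[1], sarifPath)...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "trivy failed:", err)
		os.Exit(1)
	}
	data, err := os.ReadFile(sarifPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	counts, err := CountByLevel(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("findings by level:", counts)
}
```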
We can use the Docker buildx tools to automatically generate and include an
SBOM at build time. Internally, docker buildx uses syft to generate an
SBOM and attach it to the resulting image. This means that any software
components added to the image must be automatically discoverable by syft (for
example, by making sure the package control file is included in /var/lib/dpkg/status.d
for Debian packages).
The following diagram describes how an image is built during a full Teleport release.
```mermaid
graph TD
pull_base[Pull distroless
base image]
signature_valid?{is cosign
signature valid?}
teleport_deb[/Teleport Debian Package
from CI/]
third_party_debs[/Third party Debian packages
from Debian package repository/]
df[/Dockerfile from
Teleport git repository/]
base[/Distroless base image/]
candidate_image[/Candidate Teleport image<br>with SBOM/]
smoke_test[Smoke test to see if
Teleport starts in container image]
smoke_test_pass?{Smoke test
passes?}
push_internal[Push Candidate Image
to private ECR]
trivy_scan[Scan Candidate Image
with trivy]
build[$ docker build ...]
upload_results[Upload scan results
to GitHub Code Scanning]
fail_build[Fail the build]
sign_image[Sign Candidate Image
with Internal key]
promote_build[/Release engineer
promotes build/]
promote_image[Candidate Image with SBOM
copied to Public ECR]
build_ok?{Does all go well
with the rest of
the build?}
sign_build[Sign Candidate Image
with Release key]
pull_base --> signature_valid?
signature_valid? -- yes --> base
signature_valid? -- no --> fail_build
teleport_deb --> build
third_party_debs --> build
df --> build
base --> build
build --> candidate_image
candidate_image --> push_internal
push_internal --> sign_image
sign_image --> trivy_scan
trivy_scan --> upload_results
upload_results --> smoke_test
smoke_test --> smoke_test_pass?
smoke_test_pass? -- yes --> build_ok?
smoke_test_pass? -- no --> fail_build
upload_results --> code_scanning_alerts[(GitHub Code
Scanning alerts)]
code_scanning_alerts --> panther[(Panther SIEM)]
build_ok? -- no --> fail_build
build_ok? -- yes --> promote_build
promote_build --> promote_image
promote_image --> sign_build
```
Troubleshooting a distroless image is hard, as there are no tools baked into the image to aid in debugging a deployment.
In addition to the main distroless image sets, the distroless team also supplies
a debug-tagged image that includes busybox (that is, a basic shell and some
utilities). In order to provide tooling for troubleshooting a Teleport installation,
we will create a parallel teleport-debug image, based on a distroless debug image.
While we should take as much care as possible when constructing and monitoring this image, use of the debug image should probably be considered "at your own risk".
We have clients relying on the existing behaviour (and contents) of our images. We should treat releasing these distroless images as a compatibility break, and make our customers aware of our intentions well in advance so that they can prepare.
We are already using the scanning tools built into AWS ECR, which periodically scan our images and feed the results into our Panther SIEM instance.
To increase the size and quality of the vulnerability database we scan with, we should
also use trivy for ongoing scans of our published (i.e. public) images, in addition
to the build-time scans described above.
Trivy's GitHub Actions integration should make it straightforward to run a workflow that scans our published images on a regular schedule.
The output of the scan will be injected into the GitHub Code Scanning system. From there, the alerts can be picked up by our Panther SIEM, which already integrates with our GitHub account.
Once the alert data is aggregated into Panther, we can configure events to lodge GitHub issues and/or alert the development team via Slack or e-mail. The development team will be expected to resolve the issues as per the Teleport Vulnerability Management Policy.
Our current process rebuilds the Docker images for the latest release in each major version series once a day, using the same sources and build artifacts used in the original published release. This ensures that any updates to the underlying base image are quickly and automatically integrated into the released image.
We should continue rebuilding these images daily, but incorporate the same
supply chain checks as the release builds described above (e.g. verifying
cosign signatures, generating SBOMs, etc.).