rfd/0166-tpm-joining.md
Introduce a new join method that allows a Bot or Agent running on a host with a TPM to securely join a Teleport cluster without using a shared secret.
Bootstrapping trust with a newly provisioned host in an on-premises environment is challenging. In many environments, this is done by transferring an initial shared secret to the host, which can then be used for authentication. However, this is difficult to do securely, especially at scale, because of the risk of impersonation and interception: an attacker who has sufficiently compromised the network can impersonate a newly provisioned host and receive the secret. In addition, a shared secret stored on the host is liable to exfiltration.
A TPM provides a secure, unique and persistent initial identity ideal for bootstrapping trust with a host. Even across reboots, or reconfigurations, the TPM identity remains the same. The guarantees provided by a compliant TPM mean that this identity cannot be exfiltrated. This makes it a strong candidate for bootstrapping trust.
Reference materials:
Existing code and logic from Device Trust TPM support will be reused where appropriate for this feature.
A compliant TPM provides certain guarantees that will be relevant to TPM joining:
It is important to note that these guarantees do not exist if the TPM is not compliant or the host has been compromised in such a way that some malicious software is intercepting commands intended for the TPM. This is mitigated by validating the EKPub and EKCert against the manufacturer's CA in ceremonies such as Credential Activation, proving that the TPM is legitimate and that you are talking directly to it.
Before joining is possible, the user must configure the Auth Server with a join token which allows that TPM to join. The Join Token will contain the EKPub hashes and EKCert serials which are allowed to join.
To assist in this process, a new command will be added, tbot tpm identify/teleport tpm identify:

1. Run tbot tpm identify on the host. This will query the EKPub and EKCert from the TPM attached to the host and output these values for the user.
2. Create or update a tpm join token to include the EKPub or EKCert serial of the TPM within the allow list. See the API Changes section for the structure of this.

The join process is roughly based on the TCG TPM 2.0 Keys for Device Identity and Attestation recommended "Identity Provisioning" flow. As per this document, the flow is designed to assure that:
sequenceDiagram
participant T as TPM
participant B as Bot
participant A as Auth Server
B->>T: Queries EK and EKCert
T-->>B: EK, EK handle, EKCert
B->>T: Request AK generation
T-->>B: AKPub, AK handle
B->>A: Starts RegisterUsingTPMMethod RPC stream
B->>A: Sends EKPub, EKCert and AKPub
opt TPM CA configured for Join Token
A->>A: EKCert signature validated against CA
end
A->>A: Validates EKPub hash, EKCert serial against allow rules in Join Token
A->>A: Generates Credential Activation challenge
A-->>B: Sends Credential Activation challenge
B->>T: ActivateCredential(EK Handle, AK Handle, Challenge)
T->>T: Solves challenge
T-->>B: Credential Activation solution
B-->>A: Credential Activation solution
A->>A: Validates submitted solution against the known solution
A->>B: Signed TLS and SSH Certificates
At the core of this process is the Credential Activation ceremony. This proves possession of the EK and AK to the Auth Server by decrypting an encrypted challenge. As long as the TPM is authentic (e.g. the EKCert is signed by a trusted manufacturer), this ceremony also proves that the EK and AK are resident in the same TPM and cannot be exported.
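The server-side bookkeeping for this ceremony can be sketched as follows. This is a minimal illustration rather than the Auth Server implementation: the class name ChallengeSession and its methods are hypothetical, and the step that wraps the secret for the TPM (producing the challenge from the EKPub and AK parameters) is deliberately left out. It only shows the server keeping the expected solution, comparing it in constant time, and accepting a single attempt per challenge.

```python
import hmac
import secrets


class ChallengeSession:
    """Tracks one Credential Activation challenge (hypothetical helper)."""

    def __init__(self, secret_len: int = 32) -> None:
        # Random value that only a TPM holding the EK and AK should recover.
        self.expected_solution = secrets.token_bytes(secret_len)
        self._consumed = False

    def verify(self, solution: bytes) -> bool:
        """Accept a single attempt; return True only if that attempt is correct."""
        if self._consumed:
            return False
        self._consumed = True
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(self.expected_solution, solution)
```

Limiting each challenge to one attempt is an assumption made for this sketch; it means a failed join must restart with a fresh challenge.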
As with all delegated join methods, the renewal will simply repeat the join process.
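The allow-rule check from the diagram above can be sketched roughly as follows. The Rule dataclass and first_matching_rule function are hypothetical names used for illustration only. The sketch assumes exact string comparison of the identifiers and that a TPM without an EKCert simply presents no serial.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """One allow rule: a description plus exactly one identifier (assumed shape)."""
    description: str = ""
    ekpub_hash: Optional[str] = None
    ekcert_serial: Optional[str] = None


def first_matching_rule(
    rules: Sequence[Rule],
    ekpub_hash: str,
    ekcert_serial: Optional[str] = None,
) -> Optional[Rule]:
    """Return the first rule matched by the presented identifiers, else None."""
    for rule in rules:
        if rule.ekpub_hash is not None and rule.ekpub_hash == ekpub_hash:
            return rule
        if (
            rule.ekcert_serial is not None
            and ekcert_serial is not None
            and rule.ekcert_serial == ekcert_serial
        ):
            return rule
    return None
```

A join would be rejected when this returns None, before any challenge is generated.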
A new JoinMethod will be added - tpm.
The existing JoinToken type will be extended to include additional fields for
TPM joining:
// ProvisionTokenSpecV2TPM contains the TPM-specific part of the
// ProvisionTokenSpecV2
message ProvisionTokenSpecV2TPM {
message Rule {
// Description is a human-readable hint for the rule. This does not impact
// joining but can be used for associating a business name with an EKPub or
// EKCert serial.
// Example: "build-server-100"
string Description = 1 [(gogoproto.jsontag) = "description,omitempty"];
oneof Identifier {
// EKPubHash is the SHA256 hash of the EKPub marshaled in PKIX format
// and encoded in hexadecimal.
// Example: d4b45864d9d6fabfc568d74f26c35ababde2105337d7af9a6605e1c56c891aa6
string EKPubHash = 2 [(gogoproto.jsontag) = "ekpub_hash,omitempty"];
// EKCertSerial is the serial number of the EKCert in hexadecimal with
// colon-separated bytes.
// Example: 73:df:dc:bd:af:ef:8a:d8:15:2e:96:71:7a:3e:7f:a4
string EKCertSerial = 3 [(gogoproto.jsontag) = "ekcert_serial,omitempty"];
}
}
// Allow is a list of Rules. A joining host must match at least one allow
// rule to use this token.
repeated Rule Allow = 1 [(gogoproto.jsontag) = "allow,omitempty"];
// EKCertAllowedCA is an EKCert CA in PEM format.
// If present, only TPM devices that present an EKCert that is signed by the
// CA specified here may join.
//
// If not present, then the CA of TPM EKCerts will not be checked during
// joining.
bytes EKCertAllowedCA = 2 [(gogoproto.jsontag) = "ekcert_allowed_ca,omitempty"];
}
In YAML, this would look like:
kind: token
version: v2
metadata:
  name: my-bot-token
spec:
  roles: ["Bot"]
  bot_name: my-bot
  join_method: tpm
  tpm:
    ekcert_allowed_ca: |
      -----BEGIN CERTIFICATE-----
      ... CA obtained from manufacturer ...
      -----END CERTIFICATE-----
    allow:
    - description: "kansas-build-server-100"
      ekpub_hash: d4b45864d9d6fabfc568d74f26c35ababde2105337d7af9a6605e1c56c891aa6
    - description: "london-build-server-42"
      ekcert_serial: 73:df:dc:bd:af:ef:8a:d8:15:2e:96:71:7a:3e:7f:a4
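The two identifier formats used in the allow rules can be illustrated with a short sketch. The function names are hypothetical. The inputs are assumed to be the PKIX DER encoding of the EKPub and the EKCert serial number as an integer, and padding an odd-length serial with a leading zero is an assumption of this sketch, not something the RFD specifies.

```python
import hashlib


def ekpub_hash(pkix_der: bytes) -> str:
    """SHA256 of the PKIX-encoded EKPub, hex encoded (64 characters)."""
    return hashlib.sha256(pkix_der).hexdigest()


def ekcert_serial(serial: int) -> str:
    """Format a certificate serial as colon-separated pairs of hex digits."""
    digits = format(serial, "x")
    if len(digits) % 2:
        # Assumption: pad to a whole number of bytes.
        digits = "0" + digits
    return ":".join(digits[i:i + 2] for i in range(0, len(digits), 2))


if __name__ == "__main__":
    print(ekcert_serial(0x73DFDCBDAFEF8AD8152E96717A3E7FA4))
```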
RegisterUsingTPMMethod RPC

A new streaming RPC will be added to the existing JoinService:
service JoinService {
// .. Existing Methods ..
// RegisterUsingTPMMethod is used to register a Bot or Agent using a TPM.
rpc RegisterUsingTPMMethod(stream RegisterUsingTPMMethodRequest) returns (stream RegisterUsingTPMMethodResponse);
}
// The enrollment challenge response containing the solution returned by
// calling the TPM2.0 `ActivateCredential` command on the client with the
// parameters provided in `RegisterUsingTPMMethodEnrollChallengeRequest`.
message RegisterUsingTPMMethodEnrollChallengeResponse {
// The client's solution to `TPMEncryptedCredential` included in
// `RegisterUsingTPMMethodEnrollChallengeRequest` using ActivateCredential.
bytes solution = 1;
}
// The attestation key and the parameters necessary to remotely verify it as
// related to the endorsement key.
// See https://pkg.go.dev/github.com/google/go-attestation/attest#AttestationParameters.
// This message excludes the `UseTCSDActivationFormat` field from the link above
// as it is TPM 1.x specific and always false.
message TPMAttestationParameters {
// The encoded TPMT_PUBLIC structure containing the attestation public key
// and signing parameters.
bytes public = 1;
// The properties of the attestation key, encoded as a TPMS_CREATION_DATA
// structure.
bytes create_data = 2;
// An assertion as to the details of the key, encoded as a TPMS_ATTEST
// structure.
bytes create_attestation = 3;
// A signature of create_attestation, encoded as a TPMT_SIGNATURE structure.
bytes create_signature = 4;
}
// The initial information sent from the client to the server.
message RegisterUsingTPMMethodInitialRequest {
// Holds the registration parameters shared by all join methods.
types.RegisterUsingTokenRequest join_request = 1;
oneof ek {
// The device's endorsement certificate in X509, ASN.1 DER form. This
// certificate contains the public key of the endorsement key. This is
// preferred to ek_key.
bytes ek_cert = 2;
// The device's public endorsement key in PKIX, ASN.1 DER form. This is
// used when a TPM does not contain any endorsement certificates.
bytes ek_key = 3;
}
// The attestation key and the parameters necessary to remotely verify it as
// related to the endorsement key.
TPMAttestationParameters attestation_params = 4;
}
message RegisterUsingTPMMethodRequest {
oneof payload {
// Initial information sent from the client to the server.
RegisterUsingTPMMethodInitialRequest init = 1;
// The challenge response required to complete the TPM join process. This is
// sent in response to the server's challenge.
RegisterUsingTPMMethodEnrollChallengeResponse challenge_response = 2;
}
}
// These values are used by the TPM2.0 `ActivateCredential` command to produce
// the solution which proves possession of the EK and AK.
//
// For a more in-depth description see:
// - https://pkg.go.dev/github.com/google/go-attestation/attest#EncryptedCredential
// - https://trustedcomputinggroup.org/wp-content/uploads/TCG_TPM2_r1p59_Part3_Commands_code_pub.pdf (Heading 12.5.1 "TPM2_ActivateCredential" "General Description")
// - https://github.com/google/go-attestation/blob/v0.4.3/attest/activation.go#L199
// - https://github.com/google/go-tpm/blob/v0.3.3/tpm2/credactivation/credential_activation.go#L61
message RegisterUsingTPMMethodEnrollChallengeRequest {
// The `credential_blob` parameter to be used with the `ActivateCredential`
// command. This is used with the decrypted value of `secret` in a
// cryptographic process to decrypt the solution.
bytes credential_blob = 1;
// The `secret` parameter to be used with `ActivateCredential`. This is a
// seed which can be decrypted with the EK. The decrypted seed is then used
// when decrypting `credential_blob`.
bytes secret = 2;
}
message RegisterUsingTPMMethodResponse {
oneof payload {
// The challenge required to complete the TPM join process. This is sent to
// the client in response to the initial request.
RegisterUsingTPMMethodEnrollChallengeRequest challenge_request = 1;
// The signed certificates resulting from the join process.
Certs certs = 2;
}
}
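The message ordering implied by the request and response oneofs above can be sketched as a small server-side handler. This is an illustration under stated assumptions, not the actual gRPC implementation: messages are modelled as (kind, payload) tuples named after the oneof fields, and handle_join_stream together with its three callbacks (make_challenge, check_solution, issue_certs) are hypothetical.

```python
from typing import Any, Callable, Iterable, Iterator, Tuple

Message = Tuple[str, Any]


def handle_join_stream(
    requests: Iterable[Message],
    make_challenge: Callable[[Any], Any],
    check_solution: Callable[[Any, Any], bool],
    issue_certs: Callable[[Any], Any],
) -> Iterator[Message]:
    """Yield server responses for one join attempt, enforcing message order."""
    stream = iter(requests)

    # 1. The first client message must be the initial request.
    first = next(stream, None)
    if first is None or first[0] != "init":
        raise ValueError("expected init as the first message")
    init = first[1]

    # 2. Reply with the Credential Activation challenge.
    challenge = make_challenge(init)
    yield ("challenge_request", challenge)

    # 3. The second client message must carry the solution.
    second = next(stream, None)
    if second is None or second[0] != "challenge_response":
        raise ValueError("expected challenge_response as the second message")
    if not check_solution(challenge, second[1]):
        raise PermissionError("credential activation solution did not match")

    # 4. Only after a valid solution are certificates issued.
    yield ("certs", issue_certs(init))
```

Modelling the handler as a generator keeps the four steps in the same order as the sequence diagram; any out-of-order message ends the attempt.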
vTPM (virtual TPM) refers to software which emulates the functionality of a TPM, without necessarily being backed by a physical TPM.
These come in multiple forms:
vTPMs expose the same interface as a physical TPM and for our purposes are effectively indistinguishable. In some cases it may be possible to identify a vTPM by the lack of an EKCert, or by an EKCert signed by a CA that does not belong to a TPM manufacturer, but this is difficult to do reliably, so it would be fruitless to attempt it.
vTPMs may not offer the same guarantees as a physical TPM; this varies depending on the implementation. For example, a simulator TPM intended for use in testing is likely to offer none of the guarantees. On the other hand, a vTPM offered by a hypervisor may offer the same guarantees as a physical TPM, unless the hypervisor has been compromised. Operators must fully understand the limitations of their chosen implementation.
Regardless, for the purposes of TPM joining, we do not need to distinguish or change behaviour based on the "realness" of the TPM. It is ultimately the responsibility of the operator to ensure that their TPM implementation meets their security requirements.
The TPM join method will be restricted to Enterprise/Cloud licensed clusters.
This should be enforced by the Auth Server at join time.
Eventually, we can use the TPM to perform a platform attestation to inspect the PCR values and event log during the join process. These values can be logged and rules can be put in place to block joins where a host's state sufficiently deviates from the known-good and expected state.
This work is deferred for now: platform attestation is a complex topic that would greatly increase the complexity of both implementing and using the feature. It can be revisited once there is a clearly defined use-case and demonstrated demand.
In addition to using the TPM for the join process, we can also use this to generate and store the Bot's private key material. As the TPM can be used to perform cryptographic operations without exposing the keys, this would provide a strong guarantee that the private key material cannot be exfiltrated.
See https://github.com/gravitational/teleport/issues/21555 for tracking of this improvement.
If the TPM manufacturer's CA is compromised, we can no longer trust that the TPM we are talking to is authentic during the enrolment process. These CAs are typically well protected, making this a complex attack.
In addition, merely compromising the CA is not all that useful alone. The attacker would also need to either:
If a compromise is noticed, the manufacturer can revoke the CA and distribute new EKCerts for affected TPMs. Users would need to update the configured EKCert CAs within their Teleport configuration, at which point the attacker would lose the ability to impersonate TPMs.
If the TPM design itself is compromised, the guarantees provided by the TPM are lost. An attacker with access to the host would be able to extract the sensitive materials from the TPM and impersonate it at will.
Realistically, this is an extremely complex attack. TPMs are designed, reviewed and tested to reduce the risk of this happening. In addition, the attacker would still need to compromise the host in order to extract the sensitive materials.
However, it's worth recognizing that this is not impossible. In 2023, CVE-2023-1017 and CVE-2023-1018 were published, describing vulnerabilities in a version of the reference implementation code provided by the TCG for TPM vendors. A firmware patch was released for the affected TPMs.
Using other Teleport features, such as access monitoring and IP pinning, can help mitigate this risk by allowing a compromise to be detected or by increasing the difficulty of the attack.
Once joined, the signed Bot certificates and private keys are stored on disk or in memory. Here, they can be exfiltrated by an attacker with sufficient access to the host.
This risk is reduced by using a short TTL for the certificates. This limits the amount of time an exfiltrated certificate can be used. In addition, preferring the in-memory storage further reduces the risk as higher privileges are typically required to read the memory of a process.
To mitigate this entirely, we should implement "Storing Bot private key material in the TPM" as described in the "Future Improvements" section.
See https://trustedcomputinggroup.org/wp-content/uploads/TCG_FIPS_140_Guidance_for_TPM2_0_v1r1_20170202.pdf for the official guidance on the compatibility of TPM 2.0 with FIPS 140.
The initial implementation of TPM joining will not be FIPS 140 compatible and we should ensure the documentation is clear on this.
To make TPM joining FIPS 140 compatible, the following must be completed:
Due to non-compliance, we will reject TPM joining when the Auth Server is configured in FIPS mode.
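The two join-time gates described in this RFD (the Enterprise/Cloud licence restriction and the FIPS mode rejection) could be checked up front, before any TPM material is processed. The following is a minimal sketch; JoinRejected and check_tpm_join_preconditions are hypothetical names, and the boolean inputs stand in for whatever the Auth Server uses to determine its licence and FIPS configuration.

```python
class JoinRejected(Exception):
    """Raised when a TPM join attempt fails a cluster-level precondition."""


def check_tpm_join_preconditions(*, enterprise_or_cloud: bool, fips_mode: bool) -> None:
    """Raise JoinRejected if TPM joining is not permitted on this cluster."""
    if not enterprise_or_cloud:
        raise JoinRejected("tpm joining requires an Enterprise or Cloud licence")
    if fips_mode:
        # The initial implementation is not FIPS 140 compatible.
        raise JoinRejected("tpm joining is not available in FIPS mode")
```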
One common theme across this RFD is the extent to which the operator is responsible for:
We should ensure that documentation is clear on these two points as a pre-requisite of using the vTPM in a production environment.
Whether the join was successful or not, the Auth Server should emit the
bot.join audit event for the join attempt. This should include, in addition
to the already standard fields, the following: