rfd/0192-oci-join.md
Add the ability for Teleport agents running on Oracle Cloud instances to join a cluster without a static token.
This feature removes the need to use a shared secret to establish trust between a Teleport cluster and an Oracle Cloud compute instance.
Suppose Alice is a system administrator with a Teleport cluster, and she wants
to add some Oracle Cloud compute instances to it. She
would first create the file token.yaml with the following contents:
# token.yaml
kind: token
version: v2
metadata:
name: oci-token
spec:
roles: [Node]
oracle:
allow:
- tenancy: "ocid1.tenancy.oc1..<unique ID>" # the OCID for Alice's tenancy
parent_compartments: ["ocid1.compartment.oc1..<unique ID>"] # the OCID for Alice's compartment
# If needed, Alice can further restrict the compartments and regions
# instances can join from.
She would then create the provision token:
$ tctl create token.yaml
Next, Alice would install, configure, and start Teleport on all of her instances.
If Alice has not yet created her instances, she can set user_data in each
instance's metadata to add an init script for
(cloud-init)[https://docs.oracle.com/en-us/iaas/Content/Compute/Tasks/launchinginstance.htm#].
Otherwise, she can run the following script locally to install Teleport on her
existing instances:
$ INSTANCE_IDS=$(oci compute instance list | jq -r '.data | map(.id) | join(" ")') # filter instances as needed
$ for INSTANCE_ID in $(echo $INSTANCE_IDS)
do
oci instance-agent command create \
--compartment-id <compartment-id> \
--content '{"source": {"sourceType": "TEXT", "text": "curl https://cdn.teleport.dev/install.sh | bash -s <Teleport version> && \
teleport node configure --token oci-token --join-method oracle --proxy example.com && \
sudo systemctl start teleport"}}' \
--target '{"instanceId": "$INSTANCE_ID"}'
done
She can confirm that the nodes have joined either in the web UI or by running tsh ls.
The provision token will be extended to include a new oracle section:
kind: token
version: v2
metadata:
name: example-oci-token
spec:
roles: [Node, Kube, Db]
oracle:
allow:
# OCID of the tenancy to allow instances to join from. Required.
- tenancy: "ocid1.tenancy.oc1..<unique ID>"
# OCIDs of compartments to allow instances to join from. Only the direct parent
# compartment applies; i.e. nested compartments are not taken into account.
# If compartments is empty,
# instances can join from any compartment in the tenancy.
parent_compartments: ["ocid1.compartment.oc1...<unique_ID>"]
# Regions to allow instances to join from. Both full names ("us-phoenix-1")
# and abbreviations ("phx") are allowed. If regions is empty, instances can join from any region.
regions: ["phx", "us-ashburn-1"]
# Add more entries as necessary.
- tenancy: "..."
parent_compartments: ["foo", "bar"]
regions: ["baz", "quux"]
# ...
Before the join process can begin, the nodes joining need permission to authenticate clients.
Allow dynamic-group <dynamic-group-name> to inspect authentication in tenancyAs long as the criteria for which instances can join doesn't change, the group and policy do not need to be updated for each new instance. We will recommend that users configure the dynamic group matching rules to match their provision token to limit unnecessary permissions.
When a node initiates the Oracle join method:
RegisterUsingOracleMethod grpc request to the auth server.http://127.0.0.1. The address doesn't matter as the node will only use the
signed headers and never make the request.https://auth.{region}.oraclecloud.com/v1/authentication/authenticateClient,
and include the signed headers from the previous request as the payload (the
authenticateClient route is not documented in the Oracle docs, but the
request
and response
types are).
The node will include and sign
the challenge from the auth server under the header x-teleport-challenge.authenticateClient URL (found in the keyID field of the
Authorization header, formatted as a JWT) and forwards the request to the
Oracle API and verifies that the request succeeds.opc-tenant, opc-compartment, and opc-instance
from the authenticateClient response to the instance's tenancy ID, compartment
ID, and instance ID respectively.If the auth server is ever
throttled by Oracle,
the TooManyRequests error will be propagated back to the node, which will try
RegisterUsingOracleMethod again with the exponential backoff recommended by
Oracle (maximum of 60 seconds).
The Oracle provision tokens will not support nested compartments, i.e. if
compartment foo has a child compartment bar and the provision token has
parent_compartments: ["foo"], this will not allow instances in container bar to
join. This is for simplicity's sake; Teleport would need to make several
requests to the Oracle Cloud API to walk up the compartment tree from the
compartment the instance is in, each of which would need to be signed. This
would require a complicated back-and-forth between the auth server and the
joining node to get signed requests for each compartment.
To mitigate SSRF, Teleport will verify that the region provided by the joining node is valid.
On top of the signed challenge, both Teleport and the Oracle API will
verify that the X-Date header in the signed request is
within 5 minutes
of their own clocks.
Add RegisterUsingOracleMethod rpc to the join service:
message RegisterUsingOracleMethodRequest {
types.RegisterUsingTokenRequest register_using_token_request = 1;
map<string,string> headers = 2;
}
message RegisterUsingOracleMethodResponse {
oneof Response {
string challenge = 1;
Certs certs = 2;
}
}
service JoinService {
// ...
rpc RegisterUsingOracleMethod(stream RegisterUsingOracleMethodRequest) returns (stream RegisterUsingOracleMethodResponse);
}
Extend provision tokens to include roles for joining Oracle instances:
message ProvisionTokenSpecV2 {
// Existing fields...
ProvisionTokenSpecV2Oracle Oracle = 17;
}
message ProvisionTokenSpecV2Oracle {
message Rule {
string Tenancy = 1;
repeated string ParentCompartments = 2;
repeated string Regions = 3;
}
repeated Rule Allow = 1;
}
Tokens created with the oracle join method and instances joining via Oracle
tokens will be captured by the existing ProvisionTokenCreate and InstanceJoin
events, respectively.
Suppose Oracle join is released in Teleport version X. The expected behavior of agents with mixed versions is as follows:
| Auth <X | Auth >=X | |
|---|---|---|
| Node <X | Irrelevant | Node will not launch with unrecognized join method |
| Node >=X | Join will be rejected for unrecognized join method | Join works |
Add an entry to the test plan to verify that the Oracle join method works as described in the docs, just like the other join methods.
Cluster admins with many Oracle Cloud compartments may wish to specify the
allowed compartments to join from by their tags, rather than having to
specify each by OCID. The oracle section of the provision token
spec could be expended with the compartment_tags field to allow filtering
by defined and/or freeform tags. Since Teleport would already fetch the compartment
from the Oracle API, no extra permissions would be required.
/* spell-checker: disable */
{
"sub": "ocid1.instance.oc1.phx.<random string>",
"opc-certtype": "instance",
"iss": "authService.oracle.com",
"fprint": "<fingerprint>",
"ptype": "instance",
"aud": "oci",
"opc-tag": "V3,ocid1.tenancy.oc1..<random string>,AAAAAQAAAAAAAACB,AAAAAQAAAAAAhy9d",
"ttype": "x509",
"opc-instance": "ocid1.instance.oc1.phx.<random string>",
"exp": 1732738022,
"opc-compartment": "ocid1.compartment.oc1..<random string>",
"iat": 1732736822,
"jti": "<jwt id>",
"tenant": "ocid1.tenancy.oc1..<random string>",
"jwk": "{\"kid\":\"<fingerprint>\",\"n\":\"0BOIi1uIrzoyQmNmfsew8aRv1DVNx979QqD6WoZ37QTDkFuNoGUPssk_mftatqQUGbkppKAtXutb9lXO1SsEnyOv2_tN1KxBhiahtMdRoha0wchla2GJQd7zxVxjSU70ousmuHfIAr29P6jdx3zQ15WYG-MMRcKfB8FtETzEcTBJH9ujjw00LkBmQ_CJsJIq2YFWjp4HW8DlX2YER_FYy7Apq98Rqno0Ze4lBBib-HeJP2x7q0mxJoHEJlsRBdMweMRKhsFL5oKJjWaul06TBp4wuEx7Czcr427d5RZJ-cSCYCDkf8bzMhZ4K5o2cpKV3gcqXEDuH81_B4odZ4-oLQ\",\"e\":\"AQAB\",\"kty\":\"RSA\",\"alg\":\"RS256\",\"use\":\"sig\"}",
"opc-tenant": "ocid1.tenancy.oc1..<random string>"
}
/* spell-checker: enable */