rfd/0238-delegating-access-to-ai-workloads.md
Enabling a user to "lend" their access to a remote AI workload, to allow it to autonomously take actions on the user's behalf. Attributing these actions to both the user and workload in the audit log.
The following are out of scope for this RFD:
Teleport currently provides a clear separation between human and non-human identity and access.
With human access, the user logs in and accesses resources (databases, remote desktops, and now MCP servers) from their own device, and actions are attributed to them personally. This suits local clients such as Claude Desktop or Cursor well, but is not an option for agents that are running anywhere other than the end-user's own device.
With non-human access, an automated process (bot) accesses resources from a remote server or CI/CD platform, and actions are attributed to the bot rather than any particular end-user. This suits fully-autonomous AI workloads, but isn't appropriate for workloads that can be controlled by an end-user as it creates a surface for privilege escalation attacks (e.g. talking the bot into doing something you don't personally have permission to do).
There is an emerging need for a third category: autonomous workloads, running remotely, taking actions on an end-user's behalf. Our design partner has such a use-case where they want employees to be able to create a "Digital Twin" to complete currently-manual "ClickOps" tasks for them. Employees would run a CLI command or click a button on a website and an ephemeral "Digital Twin" would be provisioned in their data center, taking on a subset of the employee's privileges to complete the given task.
Our design partner has shared the following constraints and guidelines:
Our design goals:
Users will be able to create a "Delegation Session" which allows either a Teleport Bot or (eventually) a SPIFFE Workload Identity to access certain chosen resources on their behalf, for a limited amount of time.
The session will not carry over the user's standing privileges. Instead, it will be a "blank slate" (deny all by default) to which you can add access to specific resources using the identifier format introduced by Resource Access Requests.
Example:
/test.teleport.sh/mcp/mcp-server-1/tools/read_user_profile
In the first instance, it will not be possible to create a session with privileges greater than the user has themselves, but in the future we could integrate with Access Requests to automatically request any missing privileges.
Bots or Workloads will prove their identity to Teleport and receive short-lived x509 certificates that can be used to authenticate with the Teleport Proxy. These certificates will contain the user identity, bot/workload identity, and the associated session identifier.
Users will have the option to terminate a session, and immediately remove the
bot/workload's ability to act on their behalf, using tctl or the Web UI. They
will also be able to see the audit log of actions taken on their behalf during
a given session.
Given the rapidly growing capabilities and changing architectural patterns in this space, it's likely that over time the access requirements of a "Digital Twin" will change, and its implementation will be split into many smaller workloads or "sub-agents".
It may therefore become too unwieldy for the client to have to enumerate all of the agent's required resources and workload identities (e.g. SPIFFE IDs) when creating a session.
Instead, we will allow the workload "owner" to maintain a "Delegation Profile", conceptually similar to registering an OAuth2 application with Google or GitHub. This profile will also give the workload owner an opportunity to customize the messaging on the consent screen.
Example:
kind: delegation_profile
version: v1
metadata:
  name: onboarding-agent
  labels:
    team: customer-operations
spec:
  required_resources:
    - "/test.teleport.sh/mcp/mcp-server-1/tools/*"
    - "/test.teleport.sh/mcp/mcp-server-2/tools/write_user_profile"
  authorized_users:
    - type: bot
      bot_name: onboarding-agent
    - type: bot
      bot_name: account-creator
    - type: spiffe-workload
      spiffe_id: "spiffe://test.teleport.sh/sub-agent/billing-scheduler"
  consent:
    title: "Onboarding Agent"
    description: "This agent is responsible for creating the user's account, scheduling billing, etc."
    allowed_redirect_urls:
      - "https://my.internal.site/digital-twin/callback"
  default_session_length: 8h
Delegation profiles will be little more than sets of pre-selected values for the "required resources" and "authorized users" properties of a session. They will not implement any access controls or give the user access to resources they cannot already access.
They will however still be subject to basic label-based RBAC, so a user will
need a role with matching allow.delegation_profile_labels in order to use a
profile, but the user could still delegate access to the same resources by
manually providing session parameters instead of using the profile.
# Allow the user to use the profile.
kind: role
version: v8
metadata:
  name: profile-user
spec:
  allow:
    delegation_profile_labels:
      team: customer-operations
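The label check described above can be sketched as follows. This is a simplified illustration, not Teleport's RBAC engine: it assumes a single string value per label key and treats `*` as a wildcard, and the function name `matchLabels` is hypothetical.

```go
package main

import "fmt"

// matchLabels reports whether every label required by the role is present on
// the profile with the same value. A role value of "*" matches any value.
// Simplification: real role labels may carry several values per key.
func matchLabels(roleLabels, profileLabels map[string]string) bool {
	if len(roleLabels) == 0 {
		return false // a role with no labels allows nothing
	}
	for key, want := range roleLabels {
		got, ok := profileLabels[key]
		if !ok {
			return false
		}
		if want != "*" && want != got {
			return false
		}
	}
	return true
}

func main() {
	role := map[string]string{"team": "customer-operations"}
	fmt.Println(matchLabels(role, map[string]string{"team": "customer-operations"}))
	fmt.Println(matchLabels(role, map[string]string{"team": "billing"}))
}
```

A match only decides whether the user may use the profile. Access to the underlying resources is still governed by the user's own roles.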
You will also be able to use standard RBAC resource rules to control who can manage the profile.
# Allow the workload owner to manage the profile.
kind: role
version: v8
metadata:
  name: workload-owner
spec:
  allow:
    rules:
      - resources: [delegation_profile]
        verbs: ['*']
        where: 'contains(user.spec.traits["teams"], labels["team"])'
We do not currently plan to restrict access to the delegation feature using roles. See "Abandoned Ideas" for more information.
This section details the experience an end-user will have when delegating their access to an AI workload.
tsh
In the simplest case, if the user has tsh installed and configured, they could
create a session like so:
$ tsh delegate-access \
--bot "onboarding-agent" \
--resource "/test.teleport.sh/mcp/mcp-server-1/tools/.*" \
--resource "/test.teleport.sh/mcp/mcp-server-2/tools/write_user_profile" \
--ttl 8h \
--output json
STDERR: Tap any security key to approve
{"session_id": "01188ff4-f686-4d43-87ae-1cd873472c01"}
Or:
$ tsh delegate-access \
--profile "onboarding-agent" \
--output json
tsh will call the auth server to create the session. It will require MFA and,
if Device Trust is configured, require tsh to be running on an enrolled device.
It will then print out the Session ID to be passed on to the workload.
The --output json flag will enable you to call tsh as a sub-process from a
CLI or shell script and easily parse the output.
If the user does not have tsh installed, or if they are delegating access from
a web application, we will expose a route in the Teleport Web UI for creating
sessions.
Our design partner's CLI or web application will redirect the user to a
well-known URL with querystring parameters including the profile or
bot/spiffe_id, resources, and ttl, as well as a redirect_url.
https://<TELEPORT PROXY>/web/delegation/new-session?profile=<PROFILE>&redirect_url=<REDIRECT URL>&state=<STATE>
Or:
https://<TELEPORT PROXY>/web/delegation/new-session?bot=<BOT NAME>&resources=<RESOURCES>&ttl=<TTL>&redirect_url=<REDIRECT URL>&state=<STATE>
This page will explicitly ask for the user's consent, enforce any MFA or Device
Trust requirements, create the session, and redirect the user back to the given
redirect_url. If the user confirmed their consent, the Session ID will be
passed back in the session_id querystring parameter.
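A client application might build the redirect and handle the callback roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the `state` value is echoed back on the callback alongside `session_id`, which this RFD implies but does not spell out.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// newSessionURL builds the consent-screen URL for a profile-based session.
func newSessionURL(proxyAddr, profile, redirectURL, state string) string {
	q := url.Values{}
	q.Set("profile", profile)
	q.Set("redirect_url", redirectURL)
	q.Set("state", state)
	u := url.URL{
		Scheme:   "https",
		Host:     proxyAddr,
		Path:     "/web/delegation/new-session",
		RawQuery: q.Encode(),
	}
	return u.String()
}

// sessionIDFromCallback checks the state value and extracts the session_id
// from the callback URL. An empty session_id is treated as declined consent.
func sessionIDFromCallback(callback, wantState string) (string, error) {
	u, err := url.Parse(callback)
	if err != nil {
		return "", err
	}
	q := u.Query()
	if q.Get("state") != wantState {
		return "", errors.New("state mismatch")
	}
	id := q.Get("session_id")
	if id == "" {
		return "", errors.New("no session_id in callback")
	}
	return id, nil
}

func main() {
	fmt.Println(newSessionURL(
		"test.teleport.sh",
		"onboarding-agent",
		"https://my.internal.site/digital-twin/callback",
		"abc123",
	))
}
```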
Note: Although the Session ID itself is not a secret (the workload must authenticate with Teleport in order to use it), for CLI or SPA clients, we may still need to implement something like PKCE to prevent an attacker from intercepting and using it to trick the (authenticated) workload into doing something unintended - see Security Considerations.
This section details the experience an agent developer will have when accessing MCP servers and other resources on behalf of the end-user.
tbot
RFD 0028e discusses how tbot could be extended to support proxying of MCP
traffic, allowing developers to use the remote MCP server as if it were running
on localhost without authentication. We will build on this to allow you to use
a delegation session by passing a Session ID in your tbot configuration:
services:
  - type: mcp-tunnel
    server: my-mcp-server
    listen: tcp://localhost:8080
    delegation:
      session_id: 01188ff4-f686-4d43-87ae-1cd873472c01
Or for workloads that perform MCP calls on behalf of many end-users, we will extend this proxy to support passing the Session ID as a header in each HTTP request instead.
In Kubernetes, you will deploy a tbot sidecar container in each Digital Twin
pod.
If you are already using Teleport's Workload Identity feature or another SPIFFE identity provider, we could allow your workloads to authenticate with Teleport using their x509 SVID as an mTLS client certificate.
This would provide a feature, conceptually similar to AWS Roles Anywhere, where you could in-effect "exchange" your workload identity for session credentials.
For workloads written in Go, we could have our SDK expose an http.RoundTripper
that does this transparently for you:
svidSource, err := workloadidentity.NewX509Source(ctx)
if err != nil {
	return err
}

transport, err := client.NewDelegatedAccessTransport(ctx, svidSource, client.DelegatedAccessConfig{
	ProxyAddress:    "test.teleport.sh:443",
	SessionID:       "01a74863-611f-401f-a391-0a5d108f2e1d",
	ApplicationName: "mcp-server-1",
})
if err != nil {
	return err
}
defer transport.Close()

httpClient := &http.Client{Transport: transport}
As we've found that many other languages offer poorer support for TLS features
such as ALPN, we could also provide a proxy binary that you can run as a sidecar
alongside your workload (similar to tbot) instead:
$ export SPIFFE_ENDPOINT_SOCKET=unix:///path/to/spiffe.sock
$ workload-mcp-proxy \
--listen "tcp://localhost:8080" \
--session-id "01188ff4-f686-4d43-87ae-1cd873472c01" \
--app mcp-server-1 \
--teleport-proxy-addr "test.teleport.sh:443"
As with the tbot example above, the Session ID here is provided as a command
line flag, so the entire proxy would be bound to a single session, but we could
make the proxy "multi tenant" by accepting the Session ID on each request as an
HTTP header.
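From the workload's side, the per-request variant could be a thin `http.RoundTripper` wrapper around whatever client talks to the local proxy. The sketch below is illustrative only: this RFD does not name the header, so `X-Delegation-Session-Id` is a placeholder, and `sessionTransport` is a hypothetical type. The demo uses a fake base transport so that it runs without a network.

```go
package main

import (
	"fmt"
	"net/http"
)

// sessionHeader is a placeholder; the RFD does not define the header name.
const sessionHeader = "X-Delegation-Session-Id"

// sessionTransport adds the Delegation Session ID to every outgoing request.
type sessionTransport struct {
	base      http.RoundTripper
	sessionID string
}

func (t *sessionTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// RoundTrippers must not mutate the caller's request, so clone it first.
	clone := req.Clone(req.Context())
	clone.Header.Set(sessionHeader, t.sessionID)
	return t.base.RoundTrip(clone)
}

// roundTripFunc lets a plain function act as a RoundTripper (for the demo).
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// headerSentFor sends one request through sessionTransport and returns the
// header value observed by the (fake) local proxy.
func headerSentFor(sessionID string) (string, error) {
	var seen string
	base := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get(sessionHeader)
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: r}, nil
	})
	client := &http.Client{Transport: &sessionTransport{base: base, sessionID: sessionID}}
	resp, err := client.Get("http://localhost:8080/mcp")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	return seen, nil
}

func main() {
	fmt.Println(headerSentFor("01188ff4-f686-4d43-87ae-1cd873472c01"))
}
```

A workload serving many end-users would construct one such transport per user request, or set the header directly, rather than sharing a single session across users.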
Although authentication and authorization will be handled entirely by Teleport,
it can be useful for the MCP server to know who the end-user is. As with
Application Access, we will send their username in all HTTP requests proxied to
the MCP server via the Teleport-Jwt-Assertion header.
It may be desirable to "pin" the session to a specific instance of a Bot. We could support this by either allowing you to specify the instance identifier when creating the session, or by automatically pinning to the first bot instance that attempts to use the session.
When starting a session by redirecting the user to the Teleport Web UI, it would
be prudent to implement a protection like PKCE to
prevent an attacker from intercepting the session_id querystring parameter and
using it to "trick" the authenticated workload into doing something unintended.
The session_id itself is not a secret, and can only be used by the workload or
bot explicitly bound to the session, after they have authenticated with Teleport.
However, if the attacker also has the ability to trigger workloads (e.g. is a
process running on the same host as the CLI with access to make API calls to the
"Digital Twin Orchestrator") they could theoretically use the session_id to
make the (authenticated) workload do something other than the user's intended action.
Following PKCE's example, we could have the client generate a secret key ("code verifier") which it hashes (to create a "code challenge") and includes as a querystring parameter when redirecting to Teleport. Then, the session would only be usable if you are also in possession of the original secret ("code verifier").
An attacker could still theoretically intercept the call to the orchestrator and
steal both the session_id and "code verifier" but it's far less likely.
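The verifier and challenge exchange could work as in the sketch below. It borrows PKCE's `S256` method (SHA-256, then unpadded base64url) as an assumption, since this RFD only says the secret is hashed. The function names are hypothetical.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"fmt"
)

// newVerifier generates a random secret that stays with the client.
func newVerifier() (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// challengeFor derives the value sent to Teleport in the redirect.
func challengeFor(verifier string) string {
	sum := sha256.Sum256([]byte(verifier))
	return base64.RawURLEncoding.EncodeToString(sum[:])
}

// verify is what the server side would run when certificates are requested:
// hash the presented verifier and compare it to the stored challenge.
func verify(storedChallenge, verifier string) bool {
	got := challengeFor(verifier)
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedChallenge)) == 1
}

func main() {
	verifier, err := newVerifier()
	if err != nil {
		panic(err)
	}
	challenge := challengeFor(verifier)
	fmt.Println("valid verifier accepted:", verify(challenge, verifier))
	fmt.Println("wrong verifier accepted:", verify(challenge, "guess"))
}
```

Because only the challenge travels through the browser redirect, an attacker who observes the `session_id` and challenge still cannot produce a verifier that hashes to it.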
sequenceDiagram
participant Workload
participant tbot
participant Auth as Teleport Proxy / Auth Server
participant MCP as MCP Server
Note over Workload, MCP: Obtaining Session Certificates
tbot->>Auth: GenerateDelegationCerts(session_id="...")
Auth->>Auth: Validate bot identity & session binding
Auth-->>tbot: Issue short-lived certificates
Note over Workload, MCP: MCP Server Access
Workload->>tbot: Make MCP request to localhost proxy
tbot->>MCP: Proxy request via Teleport Proxy
MCP-->>tbot: MCP response
tbot-->>Workload: Forward response
sequenceDiagram
actor User
participant App as CLI/Web App
participant Web as Teleport Web UI
participant Proxy as Teleport Proxy / Auth Server
participant Orchestrator as Workload Orchestrator
User->>App: Request Digital Twin
App-->>User: Redirect to Teleport
User->>Web: Visit consent screen
Web->>Proxy: Read profile details
Proxy-->>Web:
User->>Web: Give consent (and MFA, etc.)
Web->>Proxy: Create session
Web-->>User: Redirect with Session ID
User->>App: Visit callback URL
App->>Orchestrator: Create Digital Twin (with Session ID)
This section covers the implementation of the first version (delegating a subset of user privileges only). It does not go into detail about how we will later integrate with Access Requests to support privilege escalation.
Delegation will be built on the foundation established by Resource Access
Requests. This feature works by extending user certificates with a list of
AllowedResourceIDs which narrows down the access provided by roles to just
a specific set of resources.
Resource IDs are path-like strings in the form: /TELEPORT_CLUSTER/KIND/NAME/SUB_RESOURCE_NAME.
They typically refer to a single resource (e.g. node, database, or application)
but can refer to many Kubernetes resources using wildcards or regular
expressions.
Examples:
/test.teleport.sh/node/b1168402-9340-421a-a344-af66a6675738
/test.teleport.sh/app/my-app
/test.teleport.sh/kube:ns:pods/mycluster/*/^nginx-[a-z0-9-]+$
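To make the path format concrete, here is a small parser for the examples above. It is a sketch only: `ResourceID` and `ParseResourceID` are simplified, hypothetical stand-ins and not Teleport's own types.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ResourceID is a simplified, hypothetical representation of a Resource ID.
type ResourceID struct {
	Cluster     string
	Kind        string
	Name        string
	SubResource string // optional; may itself contain "/" separators
}

// ParseResourceID splits "/CLUSTER/KIND/NAME[/SUB_RESOURCE_NAME]" into parts.
func ParseResourceID(s string) (ResourceID, error) {
	if !strings.HasPrefix(s, "/") {
		return ResourceID{}, errors.New("resource ID must start with '/'")
	}
	// Split into at most four parts so the sub-resource keeps its own slashes.
	parts := strings.SplitN(s[1:], "/", 4)
	if len(parts) < 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return ResourceID{}, fmt.Errorf("resource ID %q must have the form /CLUSTER/KIND/NAME", s)
	}
	id := ResourceID{Cluster: parts[0], Kind: parts[1], Name: parts[2]}
	if len(parts) == 4 {
		id.SubResource = parts[3]
	}
	return id, nil
}

func main() {
	for _, s := range []string{
		"/test.teleport.sh/app/my-app",
		"/test.teleport.sh/kube:ns:pods/mycluster/*/^nginx-[a-z0-9-]+$",
	} {
		id, err := ParseResourceID(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", id)
	}
}
```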
On their own, AllowedResourceIDs do not give you access to the resources; they
simply act as a filter, removing unneeded ambient privileges provided by the
certificate's role set. When creating a Resource Access Request, the auth server
determines which roles, available in the user's search_as_roles, would provide
access to the resources, and should be requested.
Delegation session certificates will include all of the user's static roles, and
use the AllowedResourceIDs field to "jail" the session to permitted resources.
We do not strictly need to check the user has permission to access the requested resources when creating a delegation session, as this will be enforced at time-of-use by the RBAC engine. It's also difficult to know this ahead of time for Kubernetes resources, as the Resource IDs can contain wildcards and regular expressions. However, for a better user experience, we will check if the user at least has access to the Teleport resource (i.e. the Kubernetes cluster), so we can surface obvious permission issues while the end-user is still "in the loop".
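The "jail" semantics amount to a logical AND between the allowed list and the role set. The sketch below illustrates that idea for a delegation session using exact string matches. The `rolesAllow` callback is a hypothetical stand-in for the RBAC engine, and wildcard handling is omitted.

```go
package main

import "fmt"

// rolesAllow stands in for the RBAC engine's evaluation of the user's roles.
type rolesAllow func(resourceID string) bool

// canAccess grants access only if the resource is listed in the session's
// allowed resource IDs AND the user's roles allow it. An empty list denies
// everything, matching the "blank slate" behaviour of a delegation session.
func canAccess(allowedResourceIDs []string, roles rolesAllow, resourceID string) bool {
	listed := false
	for _, id := range allowedResourceIDs {
		if id == resourceID {
			listed = true
			break
		}
	}
	return listed && roles(resourceID)
}

func main() {
	allowed := []string{"/test.teleport.sh/app/my-app"}
	// Pretend the user's roles grant access to every app except "secret-app".
	roles := func(id string) bool { return id != "/test.teleport.sh/app/secret-app" }

	fmt.Println(canAccess(allowed, roles, "/test.teleport.sh/app/my-app"))    // listed and allowed by roles
	fmt.Println(canAccess(allowed, roles, "/test.teleport.sh/app/other-app")) // allowed by roles but not listed
}
```

Listing a resource the roles do not allow has no effect, which is why the creation-time permission check can remain best-effort.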
It is currently only possible to request access to an MCP Server "application"
as a whole, not specific tools or resources. We will either follow the example
set by Kubernetes resources by using the SUB_RESOURCE_NAME section of the
Resource ID to allow you to target tools and resources using a regular expression:
/test.teleport.sh/mcp/mcp-server-1/tools/read_(.*)
Or if the projects' respective timelines allow, we'll integrate with the scoping mechanism discussed in RFD 0228: Resource-Scoped Constraints in Access Requests.
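For the regular-expression option, tool matching might look like the sketch below. It assumes the expression must match the whole tool name, so the pattern is anchored at both ends. The exact matching rules are not settled in this RFD, and `matchTool` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchTool reports whether toolName matches pattern in full. The pattern is
// wrapped in ^(?:...)$ so that "read_(.*)" cannot match "unread_messages".
func matchTool(pattern, toolName string) (bool, error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false, fmt.Errorf("invalid tool pattern %q: %w", pattern, err)
	}
	return re.MatchString(toolName), nil
}

func main() {
	for _, tool := range []string{"read_user_profile", "write_user_profile"} {
		ok, err := matchTool("read_(.*)", tool)
		fmt.Println(tool, ok, err)
	}
}
```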
In order for the user to quickly revoke the AI agent's access if they discover it is misbehaving, we will allow them to "terminate" the session. Under the hood, this will create a lock which targets the Delegation Session ID.
Teleport already supports taking on somebody else's identity via
impersonation,
which is conceptually similar to the access delegation described in this RFD.
It's also the underlying mechanism by which tbot avoids its renewable internal
identity certificate having any privileges of its own, making it less useful to
an attacker.
While the core "plumbing" of impersonation, such as encoding the impersonator
into a certificate extension and including it in audit logs, is useful, we will
not reuse the actual implementation in the GenerateUserCerts RPC.
This RPC is both highly security-sensitive and already very complex because it
handles many other concerns related to generating user certificates, which are
not relevant to our use-case. Instead, we will introduce a new RPC which calls
the same lower-level Server.generateUserCert method directly.
The gRPC service for managing delegation profiles will contain the usual endpoints for creating, reading, updating, and deleting resources. For brevity, these are not discussed in detail here.
The GetDelegationProfile endpoint will accept an authorization_mode
parameter, which controls whether we authorize the caller as an end-user trying
to "use" the profile (i.e. check they have access to the profile's resources),
or as an administrator viewing the profile details.
CreateDelegationSession
This endpoint will be called when a user clicks the "Authorize" button on the
consent screen, or runs tsh delegate-access on the command line. It will be
exposed to the Web UI via HTTP at: POST /webapi/sites/:site/delegation/sessions.
Similar to the GetDelegationProfile endpoint, it will perform a best-effort
check that the user has access to the requested resources via their static roles,
to surface permission issues early.
Given the security-sensitive nature of delegation, we will use the
AuthorizeAdminAction method to prompt the user for MFA before creating the
session.
As well as the session parameters (i.e. profile name, or required resources)
this endpoint will accept the consent screen redirect_url, state, and (PKCE
inspired) challenge. It will validate the redirect_url against the profile's
allow-list and store the challenge so it can later be compared to the verifier
parameter in the GenerateDelegationCerts RPC.
message CreateDelegationSessionRequest {
  oneof from {
    // Profile the session will be created from.
    string profile_name = 1;
    // Parameters the session will be created from, when manually creating the
    // session using `tsh delegate-access --resource XXX`.
    DelegationSessionParameters parameters = 2;
  }
}

message DelegationSessionParameters {
  // Resources that are required by users of this session.
  repeated ResourceID required_resources = 1;
  // Users (i.e. bots or workloads) that will be allowed to use this session.
  repeated AllowedUser allowed_users = 2;
  // How long the session will last.
  google.protobuf.Duration session_length = 3;
  // Consent screen parameters (e.g. redirect_url).
  optional ConsentScreenParameters consent_screen = 4;
}

message ConsentScreenParameters {
  // Callback URL, where the user should be redirected after the session is
  // created. Will be checked against the profile's allow-list.
  string redirect_url = 1;
  // Used to pass state (e.g. CSRF tokens) between consent screen and callback.
  optional string state = 2;
  // Inspired by PKCE's `code_challenge`. Will be compared to the hash of the
  // `verifier` parameter passed to GenerateDelegationCerts.
  optional string challenge = 3;
}
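The `redirect_url` allow-list check mentioned above could be as simple as an exact string comparison. The sketch below assumes exact matching, which is the conservative choice but not something this RFD specifies. `validateRedirectURL` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// validateRedirectURL accepts redirectURL only if it exactly equals one of the
// profile's allowed URLs. Prefix or pattern matching is deliberately avoided,
// because it makes open-redirect mistakes easier.
func validateRedirectURL(allowList []string, redirectURL string) error {
	for _, allowed := range allowList {
		if redirectURL == allowed {
			return nil
		}
	}
	return errors.New("redirect_url is not in the profile's allow-list")
}

func main() {
	allowList := []string{"https://my.internal.site/digital-twin/callback"}
	fmt.Println(validateRedirectURL(allowList, "https://my.internal.site/digital-twin/callback"))
	fmt.Println(validateRedirectURL(allowList, "https://my.internal.site/digital-twin/callback/../evil"))
}
```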
We will log a delegation.session.create audit event after the session has
been created.
TerminateDelegationSession
This endpoint will simply check the given session belongs to the caller, and create a lock targeting the Delegation Session ID. The lock object will expire at the same time as (or shortly after) the session.
We will log a delegation.session.terminate audit event after the lock has
been created.
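The termination logic could be sketched as below. `Session`, `Lock`, and `terminateSession` are hypothetical, simplified types for illustration and do not reflect Teleport's actual lock resource.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Session is a simplified stand-in for a stored delegation session.
type Session struct {
	ID      string
	Owner   string
	Expires time.Time
}

// Lock is a simplified stand-in for a lock targeting a delegation session.
type Lock struct {
	TargetSessionID string
	Expires         time.Time
}

// terminateSession checks ownership, then returns a lock that targets the
// session and expires together with it.
func terminateSession(s Session, caller string) (Lock, error) {
	if s.Owner != caller {
		return Lock{}, errors.New("session does not belong to the caller")
	}
	return Lock{TargetSessionID: s.ID, Expires: s.Expires}, nil
}

func main() {
	s := Session{
		ID:      "01188ff4-f686-4d43-87ae-1cd873472c01",
		Owner:   "alice",
		Expires: time.Now().Add(8 * time.Hour),
	}
	lock, err := terminateSession(s, "alice")
	fmt.Println(lock.TargetSessionID, err)

	_, err = terminateSession(s, "mallory")
	fmt.Println(err)
}
```

Tying the lock's expiry to the session's keeps locks from accumulating once the sessions they target can no longer be used.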
GenerateDelegationCerts
This endpoint will be called by tbot (or perhaps eventually a different SPIFFE
authenticated client) to obtain x509 and SSH certificates that can be used to
access resources, on the user's behalf, via the Teleport proxy.
It will authorize the request by comparing the caller's bot name (taken from
their x509 certificate) to the session's allowed_users.
For ease of use by existing clients (i.e. tbot) it will accept many of the
same parameters as the auth server's GenerateUserCerts RPC, including the
RouteToApp, RouteToDatabase, etc. although this may not be necessary for
MCP access. It will also accept the Delegation Session ID.
If the session creator provided a PKCE-style "challenge", the caller of this
endpoint must provide the matching "verifier" otherwise the request will be
rejected with a PermissionDenied error.
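The caller check against the session's `allowed_users` could be sketched as follows. `AllowedUser` here is a hypothetical Go mirror of the profile's `authorized_users` entries, and the verifier comparison is left out for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// AllowedUser is a simplified stand-in for one entry in a session's
// allowed_users list.
type AllowedUser struct {
	Type    string // "bot" or "spiffe-workload"
	BotName string // set when Type is "bot"
}

// authorizeBot returns nil only if the caller's bot name (taken from its
// certificate) appears in the session's allowed users.
func authorizeBot(allowed []AllowedUser, callerBotName string) error {
	for _, u := range allowed {
		if u.Type == "bot" && u.BotName == callerBotName {
			return nil
		}
	}
	return errors.New("caller is not an allowed user of this session")
}

func main() {
	allowed := []AllowedUser{
		{Type: "bot", BotName: "onboarding-agent"},
		{Type: "bot", BotName: "account-creator"},
	}
	fmt.Println(authorizeBot(allowed, "onboarding-agent"))
	fmt.Println(authorizeBot(allowed, "some-other-bot"))
}
```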
We will load the end-user's static roles afresh from the backend, so that if any role has been removed, it will be reflected in the delegation certificate.
If the caller provided a RouteToApp we will create an application/web session
using the Server.CreateAppSessionFromReq method, which is a pre-requisite to
using Application Access.
The resulting x509 and SSH certificates will contain the end-user's username, static roles, the bot name and instance identifier, allowed resources, and the delegation session identifier.
It is not yet clear whether we should also set the "impersonator" field because it may have unexpected authorization semantics.
tbot changes
Many of tbot's services request a certificate as the first step of generating
an output or initializing a proxy. They do so by using the identity.Generator
type, which accepts a set of GenerateOption parameters, used to configure
various aspects of the certificate (e.g. roles, or cluster name).
We will add a new option that, when provided, causes the generator to use the
new GenerateDelegationCerts RPC.
// WithDelegation uses the given Delegation Session ID and verifier to access
// resources on an end-user's behalf, rather than the bot itself. This option is
// mutually-exclusive with `WithRoles` because it is not possible to customize a
// delegation session's roles.
func WithDelegation(sessionID string, challengeVerifier string) GenerateOption
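The option could follow the usual functional-options pattern. The sketch below is a guess at how the pieces might fit together, not tbot's actual code: `generateConfig`, `WithRoles`'s signature, and `rpcFor` are hypothetical, and `GenerateUserCerts` is assumed to be the default RPC.

```go
package main

import (
	"errors"
	"fmt"
)

// generateConfig is a hypothetical holder for certificate options.
type generateConfig struct {
	roles             []string
	delegationSession string
	challengeVerifier string
}

// GenerateOption configures one aspect of certificate generation.
type GenerateOption func(*generateConfig)

// WithRoles requests a certificate limited to the given roles.
func WithRoles(roles []string) GenerateOption {
	return func(c *generateConfig) { c.roles = roles }
}

// WithDelegation requests a certificate for the given delegation session.
func WithDelegation(sessionID string, challengeVerifier string) GenerateOption {
	return func(c *generateConfig) {
		c.delegationSession = sessionID
		c.challengeVerifier = challengeVerifier
	}
}

// rpcFor applies the options and reports which RPC the generator would call,
// rejecting the mutually exclusive combination.
func rpcFor(opts ...GenerateOption) (string, error) {
	cfg := &generateConfig{}
	for _, opt := range opts {
		opt(cfg)
	}
	switch {
	case cfg.delegationSession != "" && len(cfg.roles) > 0:
		return "", errors.New("WithDelegation and WithRoles are mutually exclusive")
	case cfg.delegationSession != "":
		return "GenerateDelegationCerts", nil
	default:
		return "GenerateUserCerts", nil
	}
}

func main() {
	fmt.Println(rpcFor(WithDelegation("01188ff4-f686-4d43-87ae-1cd873472c01", "verifier")))
	fmt.Println(rpcFor(WithRoles([]string{"access"})))
	fmt.Println(rpcFor(WithRoles([]string{"access"}), WithDelegation("01188ff4", "verifier")))
}
```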
We will extend relevant audit events to include a DelegationMetadata field
which is populated with data from the certificate.
message DelegationMetadata {
  string SessionID = 1 [(gogoproto.jsontag) = "delegation_session_id"];
}
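To show how the field would surface in an emitted event, the sketch below mirrors the message as a Go struct embedded in a made-up event type. `exampleEvent` and its `event` and `user` fields are hypothetical, and only the `delegation_session_id` JSON key comes from the definition above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DelegationMetadata mirrors the protobuf message above.
type DelegationMetadata struct {
	SessionID string `json:"delegation_session_id,omitempty"`
}

// exampleEvent is a hypothetical audit event that embeds the metadata, so the
// session ID appears as a top-level key in the encoded event.
type exampleEvent struct {
	Event string `json:"event"`
	User  string `json:"user"`
	DelegationMetadata
}

// encodeEvent renders the event as JSON.
func encodeEvent(e exampleEvent) (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encodeEvent(exampleEvent{
		Event:              "example.event",
		User:               "alice",
		DelegationMetadata: DelegationMetadata{SessionID: "01188ff4-f686-4d43-87ae-1cd873472c01"},
	})
	fmt.Println(out, err)
}
```

Events produced outside a delegation session would simply omit the key, so existing log consumers are unaffected.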
search_as_roles)
An earlier version of this RFD included a role option called delegation_allowed
which could be used to control which users are permitted to use the delegation
feature at all. We decided to remove this from scope because it introduces more
subtle complexity than value.
For example, an administrator might reasonably assume that delegation_allowed
controls whether the role itself can be delegated. If we supported this feature,
we'd need to decide how to safely "drop" roles that cannot be delegated, which
is hugely risky due to privilege-restricting role features such as deny rules.
Other potential solutions include requiring that all of a user's roles allow delegation (i.e. least permissive value wins), or allowing the administrator to mark a role as "safely droppable".
However, because delegation doesn't fundamentally give the user access they do not already have, and is more of a convenient way to share a limited subset of their privileges, we do not feel it warrants its own access control.
We may still want to enable administrators to enable/disable delegation as a
cluster-wide setting in ClusterAuthPreference, though.