docs/craft/features/external-apps/action-policies.md
Relationship to Approvals. This plan owns the policy layer for external apps: the per-action catalog, the admin-set
ALWAYS | ASK | DENY decisions, their storage, and the request→decision resolver. It is the external-apps-scoped realization of "Phase 4 — Policy Management" in the Craft Approvals proposal. Enforcement (intercepting the request, holding it, prompting the user) lives in the egress proxy; see Phase 1 — Egress Interception Proxy. This document defines the contract that proxy reads; it does not build the proxy.
A connected external app currently grants the agent its entire capability
surface — all of Slack, all of Linear, etc. This plan lets an admin govern that
at the level of individual actions: for each action a built-in app can take,
choose ALWAYS (auto-approve), ASK (require approval), or DENY (block).
Custom apps get a single blanket policy in v0. Policies are persisted on the
admin-level ExternalApp and exposed through a transport-agnostic read contract
the egress proxy consumes to decide each outbound request.
The design rests on one separation: recognition ("what action is this
request?") is decoupled from decision ("what do we do about it?"), bridged by
a stable action_id. Recognition for built-ins lives in code next to the
provider; decisions live in the DB. That split is what makes the system both
extensible (adding a provider is a code-only change) and safe-by-default
(unrecognized requests fall to a fail-closed default).

- POST /graphql). Endpoint-URL matching alone cannot tell a Linear issue read from an issue delete.
- call escape hatch, anything else), so classification can only rely on the request itself.
- ALWAYS | ASK | DENY per action.
- ALWAYS | ASK policy covering every request to that app.
- action_id string is the only interface between "what is this request" and "what do we do." Recognition logic never imports the policy enum; policy storage never imports request shapes.
- ASK approval UX: event shape, hold mechanism, "remember for session", timeout (the Approvals workstream).
- DENY-driven bundle filtering of SKILL.md (optional defense-in-depth, deferrable to enforcement).

Two questions, joined only by the action_id:

| | Question | Where it lives |
|---|---|---|
| Recognition | "What action is this request?" → action_id | Built-in: code (matchers on the provider). Custom: data (or nothing in v0). |
| Decision | "What do we do?" → ALWAYS \| ASK \| DENY | Always the DB, keyed by action_id. |

This is what lets built-in recognition be maintained Python and custom recognition be admin-authored data, with no change to how decisions are stored or resolved.
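
As a rough illustration of that split, the sketch below keeps recognition and decision in two functions that share nothing but the `action_id` string. The function names, the Slack path-to-id mapping, and the in-memory dictionary standing in for DB rows are all assumptions made for the example, not part of the plan.

```python
from typing import Mapping


def recognize_slack(request: Mapping[str, str]) -> str:
    """Recognition: request facts -> action_id. Knows nothing about policies."""
    # Hypothetical mapping from a Slack Web API path to a semantic action_id.
    if request["path"] == "/api/conversations.history":
        return "slack.channel.read"
    # Fallback keeps recognition total: a generic service-level id.
    return f"slack.http.{request['method'].lower()}"


def decide(action_id: str, stored: Mapping[str, str], default_policy: str) -> str:
    """Decision: action_id -> policy string. Knows nothing about request shapes."""
    return stored.get(action_id, default_policy)


# Stands in for policy rows keyed by action_id (assumption for the example).
stored_policies = {"slack.channel.read": "ALWAYS"}

read_id = recognize_slack({"method": "GET", "path": "/api/conversations.history"})
other_id = recognize_slack({"method": "POST", "path": "/api/chat.postMessage"})

read_decision = decide(read_id, stored_policies, default_policy="DENY")
other_decision = decide(other_id, stored_policies, default_policy="DENY")
```

Because `decide` only ever sees a string, swapping the recognizer for admin-authored data later would not touch it.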
Shared, provider-agnostic infrastructure (lives under backend/onyx/external_apps/;
a prototype informing these shapes was built in scratch as request_action_parser.py):

1. Normalize the request to secret-scrubbed facts: {method, host, path, query, body_type, headers}, with Authorization reduced to {present, scheme} and never raw tokens/cookies.
2. Parse the GraphQL body when present (operation type + root fields). The action is in the body, so /graphql alone is insufficient.
3. Extract action(s) via the provider's matcher, with a layered fallback so there is always at least one action (never a silent hole): semantic (slack.channel.read) → generic service (slack.http.post) → generic http (unknown.http.post).

action_ids are hierarchical service.resource.verb; each carries a risk (read | write | delete), inferred from method (REST) or operation-type + destructive keyword (GraphQL).
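
A minimal sketch of the normalize and extract steps for a REST request follows. It is not the prototype in request_action_parser.py: the function names, the semantic lookup table, the choice to keep only query parameter names, and the `body_type` labels are assumptions for illustration. GraphQL parsing is left out.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical semantic table: (service, method, path) -> action_id.
SEMANTIC_ACTIONS = {
    ("slack", "GET", "/api/conversations.history"): "slack.channel.read",
}
# REST risk inferred from the HTTP method; anything unlisted counts as a write.
RISK_BY_METHOD = {"GET": "read", "HEAD": "read", "DELETE": "delete"}


def normalize_request(method, url, headers, body=None):
    """Reduce a request to secret-scrubbed facts (no raw tokens or cookies)."""
    parts = urlsplit(url)
    auth = headers.get("Authorization", "")
    content_type = headers.get("Content-Type", "")
    if body is None:
        body_type = "none"
    elif "json" in content_type:
        body_type = "json"
    else:
        body_type = "other"
    return {
        "method": method.upper(),
        "host": parts.hostname or "",
        "path": parts.path,
        "query": sorted(parse_qs(parts.query)),  # parameter names only (assumption)
        "body_type": body_type,
        "headers": {
            "authorization": {
                "present": bool(auth),
                "scheme": auth.split(" ", 1)[0] if auth else None,
            }
        },
    }


def extract_action(service, facts):
    """Layered fallback: semantic -> generic service -> generic http."""
    method = facts["method"]
    risk = RISK_BY_METHOD.get(method, "write")
    semantic = SEMANTIC_ACTIONS.get((service, method, facts["path"]))
    if semantic:
        return {"action_id": semantic, "risk": risk}
    prefix = service or "unknown"
    return {"action_id": f"{prefix}.http.{method.lower()}", "risk": risk}


facts = normalize_request(
    "get",
    "https://slack.com/api/conversations.history?channel=C1",
    {"Authorization": "Bearer xoxb-secret"},
)
semantic = extract_action("slack", facts)

post_facts = normalize_request("POST", "https://slack.com/api/some.method", {}, body=b"{}")
generic_service = extract_action("slack", post_facts)
generic_http = extract_action(None, post_facts)
```

The fallback always returns an action, so an unrecognized request still carries an id the resolver can look up.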

- upstream_url_patterns (the existing seam). Multiple matches = misconfiguration; resolve deterministically (most-specific / lowest id) or fail closed.
- action_id(s); custom v0: nothing fires.
- policy = override_row ?? catalog.default_state ?? app.default_policy. Built-in off-catalog requests and all custom requests land on default_policy.
- DENY > ASK > ALWAYS (order-independent; admins never manage rule ordering).

The providers refactor
already established the contract: every provider declares a spec: ProviderSpec
(ExternalAppProvider), and the OAuth subset adds the flow
(OAuthExternalAppProvider / OAuthProviderSpec). The catalog + matcher attach
to the base ProviderSpec / ExternalAppProvider, because policy applies to
every provider regardless of how it authenticates. The admin UI is already
descriptor-driven via BuiltInExternalAppDescriptor (_descriptor_for in
providers/__init__.py), so catalog fields flow to the frontend with no
per-provider FE work.

external_app_policy (new table): the decision, sparse (overrides only)

| Column | Notes |
|---|---|
| id | surrogate PK |
| external_app_id | FK → external_app(id) ON DELETE CASCADE |
| action_id | text; hierarchical service.resource.verb (built-in: a catalog id; reserved for custom per-endpoint ids later) |
| policy | text NOT NULL CHECK (policy IN ('ALWAYS','ASK','DENY')). Onyx convention: string + CHECK, not a PG enum |
| name | text NULL; NULL for built-in (display comes from the code catalog at read time); set for custom |
| description | text NULL; same as name |
| match | jsonb NULL; reserved: NULL in v0 (built-in matchers live in code; custom is blanket-only). Populated when per-endpoint custom rules land. |
| created_at, updated_at | timestamptz |
| UNIQUE | (external_app_id, action_id) |

Sparse on purpose: only admin overrides are stored. Unset built-in actions
resolve to the catalog default_state at read time, so new catalog entries
auto-apply their default to existing apps with no backfill.
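
The read-time merge this implies can be sketched as below, assuming the catalog is available as a list of id/default pairs and the stored overrides as a plain mapping. The function name, the dictionary shapes, and the two action ids other than slack.channel.read are hypothetical.

```python
def merged_view(catalog, overrides):
    """Merge sparse overrides onto catalog defaults at read time.

    catalog:   list of {"id": ..., "default_state": ...} from code
    overrides: {action_id: policy} loaded from stored rows
    Returns (view, orphans); orphans are stored ids no longer in the catalog.
    """
    catalog_ids = {entry["id"] for entry in catalog}
    view = {
        entry["id"]: overrides.get(entry["id"], entry["default_state"])
        for entry in catalog
    }
    orphans = sorted(set(overrides) - catalog_ids)
    return view, orphans


catalog = [
    {"id": "slack.channel.read", "default_state": "ALWAYS"},
    {"id": "slack.message.post", "default_state": "ASK"},
]
overrides = {"slack.message.post": "ALWAYS", "slack.legacy.thing": "DENY"}

view, orphans = merged_view(catalog, overrides)

# A catalog entry added later picks up its default with no stored row.
catalog.append({"id": "slack.message.delete", "default_state": "DENY"})
view_after_release, _ = merged_view(catalog, overrides)
```

Nothing is written for the new entry, which is the "no backfill" property: the default lives in code and is applied on every read.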

external_app.default_policy (new column): fallback + custom blanket policy

- text NOT NULL CHECK (policy IN ('ALWAYS','ASK','DENY')).
- Built-in: DENY. Off-catalog requests fail closed (we own the catalog, so off-script is suspicious).
- Custom: ALWAYS or ASK. Applies to every request, because no matcher fires for a custom app in v0.

One column does double duty: the built-in off-catalog fallback and the entire custom-app policy. Both kinds share one resolution path.
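
To make the constraints concrete, here is a small sketch that exercises them against an in-memory SQLite database. SQLite is only a stand-in for the example: the real schema would be a Postgres migration with jsonb and timestamptz columns, which are simplified or omitted here, and the NOT NULL on action_id and the CHECK written against default_policy are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for ON DELETE CASCADE
conn.executescript(
    """
    CREATE TABLE external_app (
        id INTEGER PRIMARY KEY,
        default_policy TEXT NOT NULL
            CHECK (default_policy IN ('ALWAYS','ASK','DENY'))
    );
    CREATE TABLE external_app_policy (
        id INTEGER PRIMARY KEY,
        external_app_id INTEGER NOT NULL
            REFERENCES external_app(id) ON DELETE CASCADE,
        action_id TEXT NOT NULL,
        policy TEXT NOT NULL CHECK (policy IN ('ALWAYS','ASK','DENY')),
        name TEXT,
        description TEXT,
        "match" TEXT,  -- jsonb in Postgres; reserved, NULL in v0
        UNIQUE (external_app_id, action_id)
    );
    """
)

insert_policy = (
    "INSERT INTO external_app_policy (external_app_id, action_id, policy) "
    "VALUES (?, ?, ?)"
)
conn.execute("INSERT INTO external_app (id, default_policy) VALUES (1, 'DENY')")
conn.execute(insert_policy, (1, "slack.channel.read", "ALWAYS"))


def is_rejected(params):
    """Return True if the insert violates a constraint."""
    try:
        conn.execute(insert_policy, params)
    except sqlite3.IntegrityError:
        return True
    return False


bad_value_rejected = is_rejected((1, "slack.message.post", "MAYBE"))
duplicate_rejected = is_rejected((1, "slack.channel.read", "ASK"))

# Deleting the app removes its override rows via the cascade.
conn.execute("DELETE FROM external_app WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM external_app_policy").fetchone()[0]
```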
Each provider declares endpoint_catalog: list[EndpointSpec] on its
ProviderSpec. An EndpointSpec binds id ↔ display ↔ recognition ↔ default:
id (e.g. slack.channel.read), normalised_name, description,
risk (read | write | delete), default_state (ALWAYS | ASK | DENY),
aliases: list[str] (rename forward-compat), and matches: list[MatchRule].

MatchRule is a small closed union (same shape in code and, later, in the custom match jsonb):

- RestRoute: method, path_regex, optional resource type + capture.
- GraphQLOp: operation_type, root field, optional resource type.

Recommended defaults: reads → ALWAYS, writes → ASK, destructive (delete_event, chat.delete, issueArchive, …) → DENY. All admin-overridable.
Matcher strategy per current provider: Slack → /api/<method>; Google Calendar
→ HTTP method + path regex; Linear → GraphQL root field / operationName.
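
One possible shape for the catalog types is sketched below with frozen dataclasses. The field names follow the text, but the `matches` methods, the use of tuples instead of lists (so frozen instances stay hashable), the omission of the optional resource type and capture, the layout of the normalized facts, and the two action ids are assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class RestRoute:
    method: str
    path_regex: str

    def matches(self, facts: dict) -> bool:
        return (
            facts["method"] == self.method
            and re.fullmatch(self.path_regex, facts["path"]) is not None
        )


@dataclass(frozen=True)
class GraphQLOp:
    operation_type: str  # "query" or "mutation"
    root_field: str

    def matches(self, facts: dict) -> bool:
        gql = facts.get("graphql")  # parsed body, absent for REST requests
        return (
            bool(gql)
            and gql["operation_type"] == self.operation_type
            and self.root_field in gql["root_fields"]
        )


MatchRule = Union[RestRoute, GraphQLOp]


@dataclass(frozen=True)
class EndpointSpec:
    id: str
    normalised_name: str
    description: str
    risk: str           # read | write | delete
    default_state: str  # ALWAYS | ASK | DENY
    matches: tuple      # tuple of MatchRule
    aliases: tuple = ()


CATALOG = [
    EndpointSpec(
        id="google_calendar.event.delete",  # hypothetical id
        normalised_name="Delete event",
        description="Delete a calendar event",
        risk="delete",
        default_state="DENY",
        matches=(RestRoute("DELETE", r"/calendar/v3/calendars/[^/]+/events/[^/]+"),),
    ),
    EndpointSpec(
        id="linear.issue.archive",  # hypothetical id
        normalised_name="Archive issue",
        description="Archive a Linear issue",
        risk="delete",
        default_state="DENY",
        matches=(GraphQLOp("mutation", "issueArchive"),),
    ),
]


def match_actions(catalog, facts):
    """Return ids of every catalog entry with at least one matching rule."""
    return [s.id for s in catalog if any(r.matches(facts) for r in s.matches)]


rest_hit = match_actions(
    CATALOG,
    {"method": "DELETE", "path": "/calendar/v3/calendars/primary/events/abc123"},
)
graphql_hit = match_actions(
    CATALOG,
    {
        "method": "POST",
        "path": "/graphql",
        "graphql": {"operation_type": "mutation", "root_fields": ["issueArchive"]},
    },
)
no_hit = match_actions(CATALOG, {"method": "GET", "path": "/something/else"})
```

An empty result is where the generic fallback ids and default_policy would take over.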

- upstream_url_patterns already answer "is this request this app?"; default_policy answers "what to do." The blanket policy is default_policy: zero policy rows needed, and the admin UI is a single dropdown.
- external_app_policy rows with an inline match jsonb (the same MatchRule union). They produce action_ids that override default_policy: identical resolution path, no rework.

The proxy runtime (in-process Python vs separate service) and delivery model (resolve-per-request vs pull-and-cache) are undecided, so the contract is defined as pure functions, and wrapping either behind an authenticated internal endpoint is a deferred no-op:

- resolve_decision(app, normalized_request) -> Decision (resolution steps 3–5).
- get_egress_ruleset(db) -> [per enabled app: {app_id, app_type, upstream_url_patterns, default_policy, actions: [{action_id, policy, match_spec}]}]. Built-in matchers are serialized to the same shape as a custom match, so the proxy's view is uniform regardless of where a rule was authored.

The proxy needs both inputs joined by action_id: the matchers (→ produce the id) and the policy rows + default_policy (→ the decision). Loading the matchers alone is insufficient.
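
The decision half of that contract might look like the following pure function. It is a simplification, not the planned resolve_decision signature: it takes action ids that were already extracted, and the function name, argument shapes, and Linear action ids are assumptions for the example.

```python
# Higher number = more restrictive.
RESTRICTIVENESS = {"ALWAYS": 0, "ASK": 1, "DENY": 2}


def resolve_for_actions(action_ids, overrides, catalog_defaults, default_policy):
    """Resolve one decision for a request that produced several action ids.

    Per action: override row, else catalog default, else the app default.
    Across actions: the most restrictive policy wins (DENY > ASK > ALWAYS).
    """
    per_action = []
    for action_id in action_ids:
        policy = overrides.get(action_id)
        if policy is None:
            policy = catalog_defaults.get(action_id)
        if policy is None:
            policy = default_policy
        per_action.append(policy)
    if not per_action:
        # Custom app in v0: no matcher fires, so the blanket policy applies.
        return default_policy
    return max(per_action, key=RESTRICTIVENESS.__getitem__)


overrides = {"linear.issue.read": "ASK"}
catalog_defaults = {"linear.issue.read": "ALWAYS", "linear.issue.archive": "DENY"}

override_wins = resolve_for_actions(
    ["linear.issue.read"], overrides, catalog_defaults, "DENY"
)
batch = resolve_for_actions(
    ["linear.issue.read", "linear.issue.archive"], overrides, catalog_defaults, "DENY"
)
batch_reversed = resolve_for_actions(
    ["linear.issue.archive", "linear.issue.read"], overrides, catalog_defaults, "DENY"
)
off_catalog = resolve_for_actions(
    ["linear.http.post"], overrides, catalog_defaults, "DENY"
)
custom_blanket = resolve_for_actions([], {}, {}, "ASK")
```

Taking the maximum over a fixed ranking is what makes the result independent of the order in which actions were extracted.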

- call escape hatch means recognition keys off the normalized request, never a wrapper subcommand. The catalog must be exhaustive enough that off-script calls fall to default_policy.
- action_ids on the upsert write path; silently drop ids no longer in the catalog on read.
- name/description NULL for built-in rows; never copy catalog text into the DB.
- ALWAYS | ASK | DENY is the contract; a fourth state is a deliberate schema change.
- providers/base.py: add
EndpointPolicy and Risk enums, the frozen EndpointSpec, the MatchRule
union, endpoint_catalog on ProviderSpec (default []), and
extract_actions(normalized_request) -> list[Action] on ExternalAppProvider
(base default returns the generic fallback).
- backend/onyx/external_apps/ (port the prototype; prefer graphql-core's parser over the brace-walker for production).
- EndpointSpecs each in the three provider files, with MatchRules and recommended default_states.
- external_app_policy table + external_app.default_policy column; ExternalAppPolicy model + default_policy field in db/models.py (relationship cascade="all, delete-orphan"); seed default_policy in create_external_app (built-in DENY; custom from the request).
- db/external_app.py). get_policies / replace_policies (full-replace in one commit) / set_default_policy, and the resolver functions resolve_decision / get_egress_ruleset.
- BuiltInExternalAppDescriptor with actions: [{action_id, normalised_name, description, risk, default_state}] (CUSTOM/empty → []); extend UpsertExternalAppRequest with action_policies + default_policy; extend ExternalAppAdminResponse with the merged view + default_policy. On upsert: validate keys against the catalog (canonicalise aliases), reject unknowns, then replace_policies.
- ConfigureProviderModal.tsx). Built-in → descriptor.actions grouped by resource, a 3-state control per action initialised from the merged state, risk presets ("all reads → Allow"), and a "default for unrecognized requests" control bound to default_policy. Custom → a single default_policy dropdown. Opal components per web/AGENTS.md; admin-only.
- DENY (we own the catalog). The Approvals proposal leans ASK for genuinely unknown services; reconciled here by reserving the graded-ASK default for the custom tier.
- approvals.json too? Plan ships the single-dropdown blanket policy; a bundle-uploaded approvals.json for per-endpoint custom rules is deferred to the forward path.

Primary: one focused external-dependency-unit file (DB + API), plus unit tests for the pure normalizer/matchers/resolver.

- action_policies + default_policy persist and return in the merged view.
- action_id rejected; alias canonicalised; invalid policy value rejected.
- default_state for unset actions; orphan id (in DB, gone from catalog) silently dropped.
- /admin/apps/built-in/options includes a stable catalog per built-in; CUSTOM → actions: [].
- call-style off-catalog request and a GraphQL batch) extract the expected action_ids and resolve to the expected decision, honouring override > catalog default > default_policy and DENY > ASK > ALWAYS; a custom app with only default_policy returns the blanket decision for any request.
- body_type detection; GraphQL operation/field extraction incl. batch and unparseable-fails-loud.

No integration / Playwright until the proxy enforcement workstream lands, because there is no end-to-end egress behaviour to assert yet. The schema + resolver is a data contract; per the repo's "don't overtest" guidance, the focused DB-test file plus pure unit tests cover it.
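
The upsert-path checks named in the test plan above (unknown action_id rejected, alias canonicalised, invalid policy value rejected) could be sketched as a pure function that is easy to unit test. The function name, the catalog shape, the alias, and the use of ValueError are assumptions for the example, not the planned API.

```python
VALID_POLICIES = {"ALWAYS", "ASK", "DENY"}


def validate_action_policies(submitted, catalog):
    """Validate and canonicalise submitted {action_id: policy} pairs.

    catalog maps each canonical action id to its list of aliases.
    Raises ValueError on an unknown id or an invalid policy value.
    """
    alias_to_id = {
        alias: canonical
        for canonical, aliases in catalog.items()
        for alias in aliases
    }
    cleaned = {}
    for key, policy in submitted.items():
        canonical = key if key in catalog else alias_to_id.get(key)
        if canonical is None:
            raise ValueError(f"unknown action_id: {key}")
        if policy not in VALID_POLICIES:
            raise ValueError(f"invalid policy for {key}: {policy}")
        cleaned[canonical] = policy
    return cleaned


# Hypothetical catalog: "slack.conversations.read" is an old name kept as an alias.
CATALOG = {
    "slack.channel.read": ["slack.conversations.read"],
    "slack.message.post": [],
}

cleaned = validate_action_policies(
    {"slack.conversations.read": "ASK", "slack.message.post": "DENY"},
    CATALOG,
)
```

The cleaned mapping is what a full-replace write such as replace_policies would then persist.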

- match jsonb column.
- DENY-driven bundle filtering so denied actions are omitted from the delivered SKILL.md/wrapper (defense in depth atop the authoritative proxy).