docs/features/output-sanitisation.md
mcpproxy contains untrusted tool output before it reaches your agent. It builds
on the existing content-trust classification (Spec 035, derived from each tool's
openWorldHint) and the secret detector (Spec 026), enforcing them at the single
response chokepoint instead of only logging.
| Behaviour | Default | Trigger |
|---|---|---|
| Spotlight untrusted text in source-identifying delimiters (lossless) | off | spotlight_untrusted: true |
Redact detected secrets → [REDACTED:<category>] | off | response_action: redact |
| Strip control sequences (ANSI / C0-C1 / zero-width / bidi) on untrusted text | off | strip_control_chars: true |
| Block the whole response on a critical detection | off | response_action: block |
The feature is fully opt-in: with no output_sanitisation block (or all
options at their defaults) mcpproxy forwards every tool response byte-for-byte —
identical to pre-feature behaviour. Each behaviour above is enabled
independently. Non-text blocks (image / audio / embedded resource) are never
modified.
Spotlighting wraps untrusted text as:
«untrusted:server/tool»
<tool output, with « and » escaped so it cannot forge the delimiter>
«/untrusted:server/tool»
{
"output_sanitisation": {
"spotlight_untrusted": true, // opt into spotlighting untrusted output
"response_action": "spotlight", // "spotlight" | "redact" | "block"
"strip_control_chars": false,
"strip_classes": ["ansi", "c0c1", "bidi", "zero_width"],
"max_redactions": 100
}
}
Omit the block entirely (or leave spotlight_untrusted: false) and mcpproxy
does nothing — fully backward-compatible. Set spotlight_untrusted: true for the
lossless wrapper, response_action: redact to mask secrets, or
response_action: block to drop critical responses.
Redact / strip / block actions emit a policy_decision activity record with a
descriptive reason (visible via mcpproxy activity list and the Web UI Activity
Log). Pure spotlighting is logged at debug level only. The activity log stores
the sanitised response, so redacted secrets are not persisted in the audit
trail.
Redaction, stripping, and blocking run before truncation and the read_cache
write, so a large response paginated through read_cache carries the same
sanitisation as the agent-facing copy (a blocked response is never cached at
all). The lossless spotlight wrapper is applied after truncation and is not
cached (the cache holds raw, paginable records).
See also: sensitive-data-detection.md (the detector reused for redaction/block) and Spec 056 (output-schema validation, Track A) which shares the same chokepoint.