docs/configuration/sensitive-data-detection.md
MCPProxy includes automatic sensitive data detection that scans MCP tool call arguments and responses for secrets, credentials, API keys, and other potentially exposed data. This feature helps identify accidental data exposure in your AI agent workflows.
{
  "sensitive_data_detection": {
    "enabled": true,
    "scan_requests": true,
    "scan_responses": true,
    "max_payload_size_kb": 1024,
    "entropy_threshold": 4.5,
    "categories": {
      "cloud_credentials": true,
      "private_key": true,
      "api_token": true,
      "auth_token": true,
      "sensitive_file": true,
      "database_credential": true,
      "high_entropy": true,
      "credit_card": true
    },
    "custom_patterns": [
      {
        "name": "acme_api_key",
        "regex": "ACME-[A-Z0-9]{32}",
        "severity": "high",
        "category": "api_token"
      }
    ],
    "sensitive_keywords": ["SECRET_PROJECT", "INTERNAL_KEY"]
  }
}
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | Enable or disable sensitive data detection entirely |
| scan_requests | boolean | true | Scan tool call arguments for sensitive data |
| scan_responses | boolean | true | Scan tool responses for sensitive data |
| max_payload_size_kb | integer | 1024 | Maximum payload size to scan in kilobytes |
| entropy_threshold | float | 4.5 | Shannon entropy threshold for high-entropy string detection |

MCPProxy detects sensitive data across multiple categories. Each category can be individually enabled or disabled.
| Category | Description | Severity | Examples |
|---|---|---|---|
| cloud_credentials | Cloud provider credentials | Critical/High | AWS access keys, GCP API keys, Azure connection strings |
| private_key | Cryptographic private keys | Critical | RSA, EC, DSA, OpenSSH, PGP private keys |
| api_token | Service API tokens | Critical/High | GitHub PATs, Stripe keys, OpenAI keys, Anthropic keys |
| auth_token | Authentication tokens | High/Medium | JWT tokens, Bearer tokens |
| sensitive_file | Sensitive file paths | High | SSH keys, credentials files, private key files |
| database_credential | Database connection strings | Critical/High | MySQL, PostgreSQL, MongoDB, Redis connection strings |
| high_entropy | High-entropy strings | Medium | Random strings that may be secrets |
| credit_card | Payment card numbers | Critical | Credit card numbers (Luhn-validated) |
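
The credit_card category is described as Luhn-validated. As an illustration of what that check involves, here is a minimal Python sketch of the standard Luhn checksum. The function name `luhn_valid` and the 13-digit minimum length are assumptions made for this example; MCPProxy's own implementation may differ.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:  # assumed minimum length for a card number
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True (widely used test number)
print(luhn_valid("4111 1111 1111 1112"))  # False (checksum broken)
```

A checksum step like this is what lets a scanner ignore arbitrary 16-digit numbers that are not plausible card numbers.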
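
Representative built-in patterns, by category:

Cloud credentials:

- AWS access keys: `AKIA...`, `ASIA...` (20 characters)
- GCP API keys: `AIza...` (39 characters)
- GCP service account keys: `"type": "service_account"`
- Azure connection strings: `AccountKey=...`

Private keys:

- `-----BEGIN RSA PRIVATE KEY-----`
- `-----BEGIN EC PRIVATE KEY-----`
- `-----BEGIN DSA PRIVATE KEY-----`
- `-----BEGIN OPENSSH PRIVATE KEY-----`
- `-----BEGIN PGP PRIVATE KEY BLOCK-----`
- `-----BEGIN PRIVATE KEY-----`

API tokens:

- GitHub: `ghp_...`, `gho_...`, `ghs_...`, `ghr_...`, `github_pat_...`
- GitLab: `glpat-...`
- Stripe: `sk_live_...`, `pk_live_...`, `sk_test_...`
- Slack: `xoxb-...`, `xoxp-...`, `xapp-...`
- SendGrid: `SG....`
- OpenAI: `sk-...`, `sk-proj-...`
- Anthropic: `sk-ant-api...`

Authentication tokens:

- JWT tokens: strings beginning with `eyJ`
- Bearer tokens: `Bearer ...` authorization headers

Database credentials:

- MySQL: `mysql://user:pass@host`
- PostgreSQL: `postgresql://user:pass@host`
- MongoDB: `mongodb://user:pass@host`
- Redis: `redis://:pass@host`
- Password variables: `DB_PASSWORD=...`

To disable specific categories: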
{
  "sensitive_data_detection": {
    "categories": {
      "cloud_credentials": true,
      "private_key": true,
      "api_token": true,
      "auth_token": true,
      "sensitive_file": true,
      "database_credential": true,
      "high_entropy": false,
      "credit_card": true
    }
  }
}
Categories not specified in the configuration are enabled by default.
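
The enabled-by-default rule can be pictured with a short Python sketch that merges a partial configuration with defaults. `CATEGORY_NAMES` and `resolve_categories` are hypothetical names for this example only; they are not MCPProxy internals.

```python
import json

# The eight category names listed in this document.
CATEGORY_NAMES = (
    "cloud_credentials", "private_key", "api_token", "auth_token",
    "sensitive_file", "database_credential", "high_entropy", "credit_card",
)


def resolve_categories(config: dict) -> dict:
    """Return the effective on/off state of every category.

    Any category missing from the config is treated as enabled.
    """
    overrides = config.get("sensitive_data_detection", {}).get("categories", {})
    return {name: bool(overrides.get(name, True)) for name in CATEGORY_NAMES}


partial = json.loads('{"sensitive_data_detection": {"categories": {"high_entropy": false}}}')
effective = resolve_categories(partial)
print(effective["high_entropy"])  # False (explicitly disabled)
print(effective["private_key"])   # True (not specified, so enabled)
```

In practice this means a config only needs to list the categories it turns off.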
You can define custom detection patterns for organization-specific secrets or internal credentials.
Use regular expressions to match specific formats:
{
  "sensitive_data_detection": {
    "custom_patterns": [
      {
        "name": "acme_api_key",
        "regex": "ACME-[A-Z0-9]{32}",
        "severity": "high",
        "category": "api_token"
      },
      {
        "name": "internal_service_token",
        "regex": "SVC_[a-zA-Z0-9]{24}_[0-9]{10}",
        "severity": "critical",
        "category": "auth_token"
      },
      {
        "name": "internal_db_password",
        "regex": "(?i)INTERNAL_DB_PASS=[^\\s]+",
        "severity": "critical",
        "category": "database_credential"
      }
    ]
  }
}
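
Before deploying a custom regex, it can help to try it against sample strings. The Python sketch below does this for the three patterns above. It uses Python's `re` module, which is an assumption of convenience: the proxy's own regex engine may behave differently in edge cases. Note that `\\s` in JSON corresponds to `\s` in the regex itself. The helper `matching_patterns` is a hypothetical name.

```python
import re

# The three example patterns from the configuration above, as raw regex strings.
CUSTOM_PATTERNS = {
    "acme_api_key": r"ACME-[A-Z0-9]{32}",
    "internal_service_token": r"SVC_[a-zA-Z0-9]{24}_[0-9]{10}",
    "internal_db_password": r"(?i)INTERNAL_DB_PASS=[^\s]+",
}


def matching_patterns(text: str) -> list:
    """Return the names of all patterns that match anywhere in `text`."""
    return [name for name, rx in CUSTOM_PATTERNS.items() if re.search(rx, text)]


acme_sample = "token=ACME-" + "A1B2C3D4" * 4  # 32 uppercase alphanumerics
print(matching_patterns(acme_sample))                 # ['acme_api_key']
print(matching_patterns("internal_db_pass=hunter2"))  # ['internal_db_password']
print(matching_patterns("nothing secret here"))       # []
```

The second sample matches despite being lowercase because the pattern starts with the `(?i)` case-insensitive flag.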
Use simple keyword matching for straightforward detection:
{
  "sensitive_data_detection": {
    "custom_patterns": [
      {
        "name": "internal_project_id",
        "keywords": ["PROJ-SECRET", "INTERNAL-KEY", "CONFIDENTIAL-TOKEN"],
        "severity": "medium"
      },
      {
        "name": "legacy_api_marker",
        "keywords": ["X-Legacy-Auth", "OldApiKey"],
        "severity": "low",
        "category": "api_token"
      }
    ]
  }
}
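
As a rough model of keyword patterns, the Python sketch below performs case-insensitive substring matching. Substring matching (rather than whole-word matching) is an assumption, and `find_keywords` is a hypothetical helper name.

```python
def find_keywords(text: str, keywords: list) -> list:
    """Return the keywords that occur in `text`, ignoring case."""
    haystack = text.casefold()
    return [kw for kw in keywords if kw.casefold() in haystack]


legacy_keywords = ["X-Legacy-Auth", "OldApiKey"]
print(find_keywords("curl -H 'x-legacy-auth: abc'", legacy_keywords))  # ['X-Legacy-Auth']
print(find_keywords("curl -H 'Accept: */*'", legacy_keywords))         # []
```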
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier for the pattern |
| regex | string | No* | Regular expression pattern |
| keywords | array | No* | List of keywords to match (case-insensitive) |
| severity | string | Yes | Risk level: critical, high, medium, or low |
| category | string | No | Category for grouping (defaults to custom) |

*Either regex or keywords must be specified, but not both.
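
The field rules above can be checked before a config is deployed. The Python sketch below encodes them: name and severity are required, exactly one of regex or keywords must be present, and category defaults to custom. `validate_pattern` is a hypothetical helper for illustration, and the error messages are invented for this example.

```python
VALID_SEVERITIES = {"critical", "high", "medium", "low"}


def validate_pattern(pattern: dict) -> dict:
    """Validate one custom pattern and return it with defaults applied."""
    name = pattern.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("name is required")
    if pattern.get("severity") not in VALID_SEVERITIES:
        raise ValueError("severity must be critical, high, medium, or low")
    # Exactly one of regex / keywords may be present.
    if ("regex" in pattern) == ("keywords" in pattern):
        raise ValueError("specify exactly one of regex or keywords")
    return {**pattern, "category": pattern.get("category", "custom")}


checked = validate_pattern(
    {"name": "internal_project_id", "keywords": ["PROJ-SECRET"], "severity": "medium"}
)
print(checked["category"])  # custom
```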
| Severity | Description | Use Cases |
|---|---|---|
| critical | Immediate security risk | Private keys, cloud credentials, production API keys |
| high | Significant security concern | API tokens, database passwords, OAuth tokens |
| medium | Potential security issue | High-entropy strings, internal tokens |
| low | Informational | Keywords, debug markers |

For simple keyword matching without creating full pattern definitions, use the sensitive_keywords array:
{
  "sensitive_data_detection": {
    "sensitive_keywords": [
      "SUPER_SECRET",
      "INTERNAL_API_KEY",
      "CONFIDENTIAL_TOKEN",
      "PRIVATE_DATA",
      "DO_NOT_SHARE"
    ]
  }
}
Keywords are matched case-insensitively. Each match is reported with:

- Type: `sensitive_keyword`
- Category: `custom`
- Severity: `low`

Shannon entropy measures the randomness of a string. Higher entropy indicates more randomness, which often suggests a secret or credential.
Entropy Ranges:
| Range | Description | Examples |
|---|---|---|
| < 3.0 | Low entropy | Natural language, repeated characters |
| 3.0 - 4.0 | Medium entropy | Encoded data, UUIDs |
| 4.0 - 4.5 | High entropy | Possibly a secret |
| > 4.5 | Very high entropy | Likely a random secret |
The default threshold of 4.5 balances detection sensitivity against false positives:
{
  "sensitive_data_detection": {
    "entropy_threshold": 4.5
  }
}
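- Lower threshold (e.g., 4.0): more strings are flagged, so more secrets are caught, but false positives increase.
- Higher threshold (e.g., 5.0): fewer strings are flagged, so false positives drop, but some secrets may be missed.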
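
To make the threshold concrete, here is a minimal Python sketch of per-character Shannon entropy (in bits) and a threshold comparison. How MCPProxy tokenizes payloads and which strings it scores are not covered here; `shannon_entropy` and `is_high_entropy` are hypothetical names, and scoring a whole string at once is an assumption.

```python
import math
import string
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Per-character Shannon entropy of `text`, in bits."""
    if not text:
        return 0.0
    length = len(text)
    counts = Counter(text)
    # Sum of p * log2(1/p) over each distinct character's frequency p.
    return sum((c / length) * math.log2(length / c) for c in counts.values())


def is_high_entropy(text: str, threshold: float = 4.5) -> bool:
    """True if the entropy of `text` exceeds the configured threshold."""
    return shannon_entropy(text) > threshold


print(shannon_entropy("aaaa"))  # 0.0 (a single repeated character)
print(shannon_entropy("abcd"))  # 2.0 (four equally likely characters)
print(is_high_entropy(string.ascii_lowercase))  # True (log2(26) is about 4.70)
```

Under this definition a string with k distinct characters has entropy of at most log2(k), so a string needs at least 23 distinct characters to exceed 4.5.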
The max_payload_size_kb setting controls the maximum size of content scanned:
{
  "sensitive_data_detection": {
    "max_payload_size_kb": 1024
  }
}
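
The size limit can be pictured as a truncation step that runs before scanning and sets the truncated flag reported in results. The Python sketch below is an illustration only: `prepare_payload` is a hypothetical name, and treating one kilobyte as 1024 bytes is an assumption.

```python
def prepare_payload(payload: bytes, max_payload_size_kb: int = 1024):
    """Return (bytes_to_scan, truncated) for a payload under a size limit."""
    limit = max_payload_size_kb * 1024  # assumption: 1 KB = 1024 bytes
    if len(payload) <= limit:
        return payload, False
    # Only the first `limit` bytes are scanned; the rest is skipped.
    return payload[:limit], True


to_scan, truncated = prepare_payload(b"x" * 2048, max_payload_size_kb=1)
print(len(to_scan), truncated)  # 1024 True
```

Anything past the limit is never inspected in this model, so a secret near the end of a very large payload would go unreported.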
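Impact:

- Payloads larger than this limit are truncated before scanning
- Truncated scans are reported with `truncated: true` in results

High-Security Environments: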
{
  "sensitive_data_detection": {
    "enabled": true,
    "scan_requests": true,
    "scan_responses": true,
    "max_payload_size_kb": 2048,
    "entropy_threshold": 4.0
  }
}
Performance-Sensitive Environments:
{
  "sensitive_data_detection": {
    "enabled": true,
    "scan_requests": true,
    "scan_responses": false,
    "max_payload_size_kb": 512,
    "entropy_threshold": 4.8,
    "categories": {
      "high_entropy": false
    }
  }
}
Minimal Detection (Critical Only):
{
  "sensitive_data_detection": {
    "enabled": true,
    "scan_requests": true,
    "scan_responses": true,
    "categories": {
      "cloud_credentials": true,
      "private_key": true,
      "api_token": true,
      "auth_token": false,
      "sensitive_file": false,
      "database_credential": true,
      "high_entropy": false,
      "credit_card": true
    }
  }
}
When sensitive data is detected, the result includes:
{
  "detected": true,
  "detections": [
    {
      "type": "aws_access_key",
      "category": "cloud_credentials",
      "severity": "critical",
      "location": "arguments",
      "is_likely_example": false
    }
  ],
  "scan_duration_ms": 12,
  "truncated": false
}
| Field | Description |
|---|---|
| detected | true if any sensitive data was found |
| detections | Array of detection details |
| scan_duration_ms | Time taken to scan in milliseconds |
| truncated | true if payload exceeded max size and was truncated |

Each entry in detections has these fields:

| Field | Description |
|---|---|
| type | Pattern name that matched (e.g., aws_access_key) |
| category | Detection category (e.g., cloud_credentials) |
| severity | Risk level (critical, high, medium, low) |
| location | Where the match was found (arguments or response) |
| is_likely_example | true if the match appears to be a known test/example value |
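
A consumer of these results will often want to act only on serious findings. The Python sketch below filters a result by minimum severity and skips matches flagged as likely examples. `SEVERITY_RANK` and `actionable_detections` are hypothetical names; how a result reaches your code (log, API, or UI) is outside the scope of this sketch.

```python
import json

# Ordering of the four severity levels, lowest to highest.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def actionable_detections(result: dict, min_severity: str = "high") -> list:
    """Return detections at or above `min_severity` that are not likely examples."""
    threshold = SEVERITY_RANK[min_severity]
    return [
        d for d in result.get("detections", [])
        if SEVERITY_RANK[d["severity"]] >= threshold
        and not d.get("is_likely_example", False)
    ]


result = json.loads("""
{
  "detected": true,
  "detections": [
    {"type": "aws_access_key", "category": "cloud_credentials",
     "severity": "critical", "location": "arguments", "is_likely_example": false}
  ],
  "scan_duration_ms": 12,
  "truncated": false
}
""")
for detection in actionable_detections(result):
    print(detection["type"], detection["location"])  # aws_access_key arguments
```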
To completely disable sensitive data detection:
{
  "sensitive_data_detection": {
    "enabled": false
  }
}
Or to scan only requests (not responses):
{
  "sensitive_data_detection": {
    "scan_requests": true,
    "scan_responses": false
  }
}