Dangerous Action Guards

Generated by swarm planning session on 2026-02-14

Summary

Add automatic safety guards that detect and warn users before executing dangerous actions -- destructive SQL queries, malicious npm packages, and suspicious code patterns -- even when auto-approve is enabled. Includes a "dangerous approval override" toggle for power users who want to bypass all safety checks.

Problem Statement

Users building apps with Dyad can inadvertently (or through prompt injection) execute destructive actions. Today, Dyad's only defense is the consent banner ("Allow once / Always allow / Decline"), which users frequently bypass with auto-approve or "Always allow" settings. Once bypassed, there is zero validation:

SQL queries run as-is -- a single DROP TABLE can destroy hours of work
Package names are passed directly to shell commands with no validation (and there is an existing command injection vulnerability in executeAddDependency.ts)
File writes from the LLM are completely unscanned

The LLM is an untrusted actor. Prompt injection, hallucination, and model errors can generate destructive operations the user never intended. Auto-approve removes the last line of defense. Users trust Dyad to help them build safely.

Scope

In Scope (MVP)

Dangerous SQL detection -- Heuristic pattern matching for destructive SQL operations (DROP, TRUNCATE, DELETE without WHERE, etc.). Force an enhanced consent prompt even if auto-approve is enabled.
Malicious npm package detection -- Input sanitization (fix command injection vulnerability), registry existence check pre-install, npm audit post-install for known CVEs.
Narrow code injection scanning -- High-confidence pattern detection for reverse shells, crypto miners, credential exfiltration, and obfuscated eval payloads. Near-zero false positive tolerance.
Enhanced consent banner -- Danger variant with red/destructive styling, human-readable explanations, and two-button design (no "Always allow" for dangerous actions).
Dangerous approval override -- Settings toggle to skip all danger checks, with confirmation dialog requiring typed acknowledgment and persistent UI indicator when active.
package.json write detection -- When write_file or search_replace targets package.json, run the same package validation on newly-added dependencies.
Telemetry -- Track danger detections, categories, and user decisions (allow/decline) to tune false positive rates.

Out of Scope (Follow-up)

LLM-based SQL semantic analysis (expensive, latency, provider dependency)
Comprehensive code security scanning beyond the narrow pattern set
MCP tool danger detection (MCP tools are opaque -- we don't control their behavior)
Typosquatting detection (requires maintaining/fetching popular package lists)
Sandboxed SQL execution / dry-run mode
Build-mode proposal security risk interception (separate code path from tool consent)
Per-category danger guard enable/disable in settings

User Stories

As a user with auto-approve enabled, I want Dyad to still warn me before executing destructive SQL so that I don't accidentally lose data.
As a user building with Supabase, I want to see exactly why a SQL query was flagged as dangerous so that I can make an informed decision to proceed or decline.
As a user adding dependencies, I want Dyad to warn me if a package is known-malicious or has known vulnerabilities so that I don't introduce security issues into my app.
As a user, I want to see a clear explanation of why an action was flagged so that I can dismiss false positives confidently.
As a power user, I want to disable danger checks entirely so that I can work without interruption when I know what I'm doing.
As a user reviewing agent actions (auto-approve OFF), I want danger context in the consent banner so that I can make better-informed decisions about which actions to allow.

UX Design

User Flow

Flow 1: Dangerous SQL detected (auto-approve ON)

User has auto-approve enabled and is iterating on their app
Agent generates a SQL query (e.g., DROP TABLE users)
dangerCheck on the SQL tool detects destructive pattern
Instead of auto-executing, the system intercepts and shows a danger consent banner
Banner shows: "Auto-approve paused: this query will permanently delete the users table and all its data"
User clicks "Allow anyway" (destructive style) or "Decline" (default focus)
If approved, execution continues; if declined, the agent gets feedback that the action was blocked

Flow 2: Dangerous SQL detected (auto-approve OFF)

Agent generates a destructive SQL query
dangerCheck detects the pattern
The normal consent banner is shown but with enhanced danger styling (red border, ShieldAlert icon, explanation text)
User reviews and decides with better context than the standard consent banner provides

Flow 3: Malicious npm package detected

Agent attempts to install a package
Package name is validated (sanitization regex) -- invalid names are rejected immediately
Registry existence check confirms the package exists
If the consent banner fires (ask mode or danger-escalated), it includes package metadata
After installation, npm audit --json runs and parses results
If vulnerabilities found: critical/high severity shows red danger banner; moderate/low shows amber warning banner
User reviews advisory details and decides

Flow 4: Suspicious code detected

Agent writes code via write_file, edit_file, or search_replace
Content is scanned against the high-confidence pattern set
If a pattern matches, a danger banner appears showing the filename, flagged snippet, and a specific explanation (e.g., "This code appears to open a reverse shell connection to an external server")
User reviews and decides

Flow 5: Enabling dangerous approval override

User navigates to Settings > Safety section
Finds "Skip all danger checks" toggle (default: OFF)
Toggling ON opens a confirmation dialog: "This will skip all safety warnings for dangerous SQL, suspicious packages, and potentially malicious code. Actions will proceed without review."
Dialog requires typing "I understand" to confirm
Once enabled, a persistent shield-off icon appears in the chat header/status bar
Icon is clickable to jump back to the setting
All danger checks are bypassed; normal consent flow still applies per tool settings

Key States

Default (no danger): Invisible. Zero friction. Actions proceed normally per consent settings.
Danger detected (auto-approve ON): Red/destructive banner with ShieldAlert icon, explanation, two buttons. Auto-approve paused.
Danger detected (auto-approve OFF): Enhanced consent banner with red styling and danger explanation. Same two buttons.
Warning detected (lower severity): Amber banner with AlertTriangle icon. Moderate/low npm advisories, warning-level SQL patterns (ALTER TABLE ... DROP COLUMN, GRANT/REVOKE), etc.
Checking safety (async): Brief inline indicator ("Checking packages...") only for async checks like npm registry lookup. Not shown for instant checks (SQL regex).
Override active: Persistent shield-off indicator in chat header. All danger checks bypassed.
Check failed/unavailable: Fail-open with subtle notification: "Safety check unavailable -- proceeding." User knows the guard wasn't active.

Interaction Details

Danger consent banner:

Visually distinct from standard consent banner: red/destructive color scheme, ShieldAlert icon (not Bot icon)
Includes: category label ("Dangerous SQL" / "Vulnerable Package" / "Suspicious Code"), human-readable explanation, expandable content preview
Two buttons only: "Allow anyway" (destructive variant) and "Decline" (default style)
No "Always allow" option -- you cannot permanently approve dangerous actions by category
Not dismissible via X button -- only explicit button clicks
Takes priority in consent queue (dangerous items shown first)
When auto-approve is ON, banner copy reads "Auto-approve paused: [explanation]"

Keyboard navigation:

"Decline" is default focused (Enter = safe action)
"Allow anyway" requires Tab + Enter (deliberate action)

Queue behavior:

If the agent fires 5 actions with auto-approve on and 1 of them is dangerous, the 4 safe ones auto-execute and the dangerous one pauses for consent
Multiple dangerous actions in parallel: show sequentially with queue count
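
The queue behavior above could be sketched as a stable sort that moves flagged requests to the front. The PendingConsent shape and the function name are hypothetical and chosen for illustration; only the dangerInfo field comes from this plan.

```typescript
// Hypothetical minimal shape of a pending consent request.
interface PendingConsent {
  requestId: string;
  toolName: string;
  dangerInfo: { level: "warning" | "danger"; message: string } | null;
}

// Returns a new array with flagged requests first.
// Arrival order is preserved within each group via the index tie-breaker.
function orderConsentQueue(queue: readonly PendingConsent[]): PendingConsent[] {
  const rank = (req: PendingConsent): number => (req.dangerInfo !== null ? 0 : 1);
  return queue
    .map((req, index) => ({ req, index }))
    .sort((a, b) => rank(a.req) - rank(b.req) || a.index - b.index)
    .map((entry) => entry.req);
}
```

The banner can then show the first element and derive the queue count from the number of remaining flagged entries.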

Danger explanation quality (required templates):

Pattern	Explanation Template
`DROP TABLE x`	"This query will permanently delete the `{table}` table and all its data"
`DROP DATABASE x`	"This query will permanently delete the entire `{database}` database"
`TRUNCATE x`	"This query will delete all rows from the `{table}` table"
`DELETE FROM x` (no WHERE)	"This query will delete all rows from the `{table}` table"
`ALTER TABLE x DROP COLUMN y`	"This query will permanently remove the `{column}` column from the `{table}` table"
`GRANT` / `REVOKE`	"This query modifies database permissions"
npm critical/high advisory	"Package `{name}` has a known vulnerability: {advisory_title} (severity: {severity})"
npm moderate/low advisory	"Package `{name}` has a known advisory: {advisory_title} (severity: {severity})"
Reverse shell pattern	"This code appears to open a reverse shell connection to an external server"
Crypto miner pattern	"This code contains patterns associated with cryptocurrency mining"
Credential exfiltration	"This code appears to send environment variables to an external URL"
Obfuscated eval	"This code contains an obfuscated execution pattern (base64-decoded eval)"
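
One way to fill the {placeholder} slots in the templates above is a small substitution helper. This is a sketch; renderExplanation is a hypothetical name, and unknown placeholders are left visible on purpose so a missing value is easy to spot in review.

```typescript
// Replaces each {key} in the template with values[key].
// Placeholders without a matching value are left as-is.
function renderExplanation(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (placeholder: string, key: string) => {
    const value = values[key];
    return value !== undefined ? value : placeholder;
  });
}
```

For example, rendering the TRUNCATE template with { table: "users" } produces the sentence shown to the user in the banner.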

Accessibility

Not color-alone: danger banner differs via icon (ShieldAlert vs Bot), text label ("Potentially dangerous" vs standard), AND color
aria-live="polite" on danger banner (not "assertive" -- the agent is paused, no urgency to interrupt)
Focus moves to danger banner when it appears; returns to chat input on resolution
"Skip all danger checks" toggle associated with aria-describedby pointing to warning text
Confirmation dialog is keyboard-navigable and screen-reader announced

Technical Design

Architecture

Add a dangerCheck method to the existing ToolDefinition interface. This runs before consent and can escalate the consent level from "always" to forced-ask with danger context. The detection logic is per-tool (each tool knows its domain), while the consent escalation is centralized in buildAgentToolSet.

Tool invocation → dangerCheck() → if dangerous, force consent with dangerInfo
                                → if safe, proceed with normal consent flow

New module: src/pro/main/ipc/handlers/local_agent/danger_detection/ containing:

sql_heuristics.ts -- SQL pattern matching
npm_validation.ts -- Package name sanitization + registry/audit checks
code_scanning.ts -- High-confidence malicious code patterns
types.ts -- Shared types (DangerCheckResult)

Components Affected

Component	File(s)	Change Type
Tool definition types	`tools/types.ts`	Add `dangerCheck` to `ToolDefinition` interface
Tool set builder	`tool_definitions.ts`	Wire `dangerCheck` into execute wrapper, pass `dangerInfo` to consent request
SQL tool	`tools/execute_sql.ts`	Add SQL danger heuristics via `dangerCheck`
Dependency tool	`tools/add_dependency.ts`	Add package validation via `dangerCheck`
Dependency processor	`executeAddDependency.ts`	Fix command injection: use `execFile` with array args; add post-install `npm audit`
File write tools	`tools/write_file.ts`, `tools/edit_file.ts`, `tools/search_replace.ts`	Add code scanning via `dangerCheck`; add package.json filename detection
Settings schema	`src/lib/schemas.ts`	Add `dangerousApprovalOverride` field
Settings UI	New "Safety" section in settings	Toggle with confirmation dialog
Consent banner	`AgentConsentBanner.tsx`	Danger variant (red styling, two buttons, explanation, priority queue)
Consent types	IPC payload types	Add `dangerInfo` to consent request
Chat UI	Chat header/status area	Persistent shield-off indicator when override is active
Telemetry	Agent handler	Emit danger detection events

Data Model Changes

UserSettings additions (in schemas.ts):

```typescript
dangerousApprovalOverride: z.boolean().optional(), // default: false
```

New types:

```typescript
interface DangerCheckResult {
  level: "warning" | "danger";
  category: "destructive_sql" | "malicious_package" | "suspicious_code";
  message: string; // Human-readable explanation (required, specific)
  details?: string; // Extended details (full query, advisory URL, etc.)
}
```

Extended consent request payload:

```typescript
// In agent-tool:consent-request IPC event
{
  requestId: string;
  chatId: number;
  toolName: string;
  toolDescription: string;
  inputPreview: string;
  dangerInfo: DangerCheckResult | null; // NEW
}
```

Extended ToolDefinition interface:

```typescript
interface ToolDefinition<T> {
  // ... existing fields ...
  dangerCheck?: (
    args: T,
    ctx: AgentContext,
  ) => Promise<DangerCheckResult | null>;
}
```

API Changes

Modified buildAgentToolSet execute wrapper: Before calling requireConsent, run dangerCheck. If result is non-null and dangerousApprovalOverride is not enabled, force consent to "ask" and include dangerInfo in the consent request payload.
Modified consent request IPC: Add dangerInfo field to agent-tool:consent-request event.
Modified consent response: When dangerInfo is present, only accept "accept-once" or "decline" (no "accept-always").
New telemetry events: danger_check:detected and danger_check:user_decision (with category, tool name, and the user's allow/decline decision), plus danger_check:override for override enable/disable.
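
The escalation rule described above can be sketched as a pure decision function, kept separate from the async dangerCheck call so it is easy to unit test. The names planConsent, ConsentPlan, and isDecisionAllowed are hypothetical, and the ConsentSetting union is an assumed subset of the real setting values.

```typescript
type ConsentSetting = "ask" | "always"; // assumed subset of the real values
type ConsentDecision = "accept-once" | "accept-always" | "decline";

interface DangerCheckResult {
  level: "warning" | "danger";
  category: "destructive_sql" | "malicious_package" | "suspicious_code";
  message: string;
  details?: string;
}

interface ConsentPlan {
  consent: ConsentSetting;
  dangerInfo: DangerCheckResult | null;
  allowedDecisions: ConsentDecision[];
}

// Decides how the consent request should be issued for one tool call.
function planConsent(
  configured: ConsentSetting,
  danger: DangerCheckResult | null,
  overrideEnabled: boolean,
): ConsentPlan {
  if (danger !== null && !overrideEnabled) {
    // Dangerous: always ask, and never offer "accept-always".
    return { consent: "ask", dangerInfo: danger, allowedDecisions: ["accept-once", "decline"] };
  }
  // Safe, or override active: fall back to the tool's configured consent.
  return {
    consent: configured,
    dangerInfo: null,
    allowedDecisions: ["accept-once", "accept-always", "decline"],
  };
}

// Guards the consent response handler against a disallowed decision.
function isDecisionAllowed(plan: ConsentPlan, decision: ConsentDecision): boolean {
  return plan.allowedDecisions.indexOf(decision) !== -1;
}
```

Checking the response against allowedDecisions in the main process, not only hiding the button in the renderer, keeps a stale or malformed "accept-always" response from being honored.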

SQL Danger Heuristics

Patterns to detect (case-insensitive, ignoring SQL comments):

Pattern	Level	Template
`DROP TABLE`	danger	"permanently delete the `{table}` table"
`DROP DATABASE`	danger	"permanently delete the entire `{database}` database"
`TRUNCATE [TABLE]`	danger	"delete all rows from the `{table}` table"
`DELETE FROM` without `WHERE`	danger	"delete all rows from the `{table}` table"
`ALTER TABLE ... DROP COLUMN`	warning	"permanently remove the `{column}` column"
`GRANT` / `REVOKE`	warning	"modifies database permissions"
`DROP SCHEMA` / `DROP INDEX`	warning	"permanently delete database object"

Implementation notes:

Strip SQL comments (--, /* */) before pattern matching to prevent bypass
Handle multi-statement queries (split on ; and check each)
Sub-millisecond execution (regex only, no parsing)
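
A minimal sketch of these heuristics, assuming a regex-only approach as described. The rule list covers a subset of the table above, the names (checkSql, SQL_RULES) are hypothetical, and the comment stripping and splitting are deliberately naive: they do not account for comment markers or semicolons inside string literals.

```typescript
interface DangerCheckResult {
  level: "warning" | "danger";
  category: "destructive_sql" | "malicious_package" | "suspicious_code";
  message: string;
  details?: string;
}

interface SqlRule {
  pattern: RegExp;
  unless?: RegExp; // rule is skipped when this also matches the statement
  level: "warning" | "danger";
  explain: (match: RegExpExecArray) => string;
}

// Captured identifier with surrounding double quotes removed.
function ident(match: RegExpExecArray, group: number): string {
  return (match[group] ?? "").replace(/"/g, "");
}

const SQL_RULES: SqlRule[] = [
  {
    pattern: /\bDROP\s+TABLE\s+(?:IF\s+EXISTS\s+)?([\w."]+)/i,
    level: "danger",
    explain: (m) => `This query will permanently delete the \`${ident(m, 1)}\` table and all its data`,
  },
  {
    pattern: /\bTRUNCATE\s+(?:TABLE\s+)?([\w."]+)/i,
    level: "danger",
    explain: (m) => `This query will delete all rows from the \`${ident(m, 1)}\` table`,
  },
  {
    pattern: /\bDELETE\s+FROM\s+([\w."]+)/i,
    unless: /\bWHERE\b/i,
    level: "danger",
    explain: (m) => `This query will delete all rows from the \`${ident(m, 1)}\` table`,
  },
  {
    pattern: /\b(GRANT|REVOKE)\b/i,
    level: "warning",
    explain: () => "This query modifies database permissions",
  },
];

// Removes /* */ block comments, then -- line comments.
function stripSqlComments(sql: string): string {
  return sql.replace(/\/\*[\s\S]*?\*\//g, " ").replace(/--[^\n]*/g, " ");
}

// Returns one result per flagged statement (first matching rule wins).
function checkSql(sql: string): DangerCheckResult[] {
  const results: DangerCheckResult[] = [];
  for (const raw of stripSqlComments(sql).split(";")) {
    const stmt = raw.trim();
    if (stmt === "") continue;
    for (const rule of SQL_RULES) {
      const match = rule.pattern.exec(stmt);
      if (match !== null && !(rule.unless !== undefined && rule.unless.test(stmt))) {
        results.push({
          level: rule.level,
          category: "destructive_sql",
          message: rule.explain(match),
          details: stmt,
        });
        break;
      }
    }
  }
  return results;
}
```

Returning a list keeps multi-statement queries explicit; the tool's dangerCheck could report the first danger-level entry, or the first entry of any level when none is danger-level.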

npm Package Validation

Pre-install (in dangerCheck):

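Validate the package spec against npm naming rules, restricting the optional version suffix to version and tag characters: ^(@[a-z0-9-~][a-z0-9-._~]*/)?[a-z0-9-~][a-z0-9-._~]*(@[A-Za-z0-9._~^+-]+)?$
Reject any spec that doesn't match (prevents command injection AND invalid packages). An unrestricted suffix such as (@.*)? would let shell metacharacters through after the version separator.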
Fetch https://registry.npmjs.org/{package} to confirm existence and check deprecated flag
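
The name validation step could look like the sketch below. It splits the spec into name and version before matching, which keeps each regex small. parsePackageSpec is a hypothetical helper, and the version character set is an assumption that accepts plain versions, tags, and simple ^ or ~ ranges only.

```typescript
// Same character classes as the naming regex above, with hyphens moved last.
const NAME_RE = /^(?:@[a-z0-9~-][a-z0-9._~-]*\/)?[a-z0-9~-][a-z0-9._~-]*$/;
// Assumed allow-list for the part after the version separator.
const VERSION_RE = /^[A-Za-z0-9._~^+-]+$/;

interface ParsedSpec {
  name: string;
  version: string | null;
}

// Returns the parsed spec, or null when the spec must be rejected.
function parsePackageSpec(spec: string): ParsedSpec | null {
  // An "@" at index 0 is the scope marker, not a version separator.
  const at = spec.lastIndexOf("@");
  const name = at > 0 ? spec.slice(0, at) : spec;
  const version = at > 0 ? spec.slice(at + 1) : null;
  if (!NAME_RE.test(name)) return null;
  if (version !== null && !VERSION_RE.test(version)) return null;
  return { name, version };
}
```

Only the validated name (without the version suffix) would then be used to build the registry URL for the existence check.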

Post-install (in executeAddDependency):

Run npm audit --json or pnpm audit --json in the app directory
Parse output for new vulnerabilities
If critical/high: show danger banner with advisory details
If moderate/low: show warning banner
Cache advisory data locally with 24-hour TTL for repeated installs
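
The severity mapping could be sketched as follows. The sketch assumes the audit JSON carries per-severity counts under metadata.vulnerabilities; that shape should be verified against the npm and pnpm versions Dyad actually invokes. summarizeAudit and levelForSeverity are hypothetical names, and treating "info" as not worth a banner is an assumption.

```typescript
type Severity = "info" | "low" | "moderate" | "high" | "critical";
type DangerLevel = "warning" | "danger";

// Assumed minimal shape of the audit report; all other fields are ignored.
interface AuditReport {
  metadata?: { vulnerabilities?: Partial<Record<Severity, number>> };
}

function levelForSeverity(severity: Severity): DangerLevel | null {
  if (severity === "critical" || severity === "high") return "danger";
  if (severity === "moderate" || severity === "low") return "warning";
  return null; // "info" does not trigger a banner in this sketch
}

// Returns the banner level for the worst severity present, or null if none.
// Unparseable output also yields null here; a real caller would want to
// distinguish that case so it can show the fail-open notification.
function summarizeAudit(auditJson: string): DangerLevel | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(auditJson);
  } catch {
    return null;
  }
  const counts = (parsed as AuditReport | null)?.metadata?.vulnerabilities;
  if (!counts) return null;
  const worstFirst: Severity[] = ["critical", "high", "moderate", "low"];
  for (const severity of worstFirst) {
    if ((counts[severity] ?? 0) > 0) return levelForSeverity(severity);
  }
  return null;
}
```

Detecting only new vulnerabilities, as called for above, would additionally require comparing against an audit taken before the install; that comparison is not shown.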

Command injection fix (immediate, independent):

Replace exec(`pnpm add ${packageStr}`) with execFile("pnpm", ["add", ...packages]) or equivalent
Validate all package name strings before any shell interaction
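
A sketch of the fix, written as a pure function that validates every spec and returns the argument vector to hand to execFile. buildInstallCommand and SAFE_SPEC are hypothetical names; the actual call site in executeAddDependency.ts is not shown in this plan.

```typescript
interface InstallCommand {
  file: string;
  args: string[];
}

// Name plus an optional, restricted version suffix (assumed allow-list).
const SAFE_SPEC =
  /^(?:@[a-z0-9~-][a-z0-9._~-]*\/)?[a-z0-9~-][a-z0-9._~-]*(?:@[A-Za-z0-9._~^+-]+)?$/;

// Throws on any spec that fails validation; never builds a shell string.
function buildInstallCommand(packages: readonly string[]): InstallCommand {
  if (packages.length === 0) {
    throw new Error("No packages to install");
  }
  for (const spec of packages) {
    // A leading "-" would be read as a CLI flag even without a shell.
    if (spec.startsWith("-") || !SAFE_SPEC.test(spec)) {
      throw new Error(`Invalid package spec: ${JSON.stringify(spec)}`);
    }
  }
  return { file: "pnpm", args: ["add", ...packages] };
}

// Intended use (not executed here):
//   const { file, args } = buildInstallCommand(packages);
//   execFile(file, args, { cwd: appPath }, callback); // from node:child_process
```

Passing arguments as an array means no shell ever parses the package names, so validation becomes defense in depth. The leading-hyphen check matters because the naming regex alone would accept a value such as --global.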

Code Injection Patterns

High-confidence, near-zero false positive patterns:

```typescript
const DANGER_PATTERNS = [
  // Reverse shells
  {
    pattern: /\b(nc|ncat|netcat)\s+-[a-z]*e\s/i,
    message: "reverse shell connection",
  },
  { pattern: /\/dev\/tcp\//, message: "reverse shell connection" },
  {
    // Word boundaries keep "sh" from matching inside words such as "push"
    pattern: /child_process.*?(exec|spawn).*?\b(bash|sh|cmd|powershell)\b/s,
    message: "shell execution",
  },

  // Crypto miners
  {
    pattern: /\b(coinhive|cryptonight|stratum\+tcp|xmrig)\b/i,
    message: "cryptocurrency mining",
  },

  // Credential exfiltration
  {
    pattern: /process\.env\b.*?\bfetch\s*\(/s,
    message: "environment variable exfiltration",
  },
  {
    // Broad: matches any process.env read followed later by an http reference
    pattern: /process\.env\b.*?\bhttp/s,
    message: "environment variable exfiltration",
  },

  // Obfuscated payloads, in either order: eval(atob(...)) or decode first, eval later
  {
    pattern: /\beval\s*\(\s*atob\s*\(|\batob\s*\(.*?\beval\b/s,
    message: "obfuscated code execution",
  },
  {
    pattern:
      /\beval\s*\(\s*Buffer\.from\s*\([^)]+,\s*['"]base64['"]\)|Buffer\.from\s*\([^)]+,\s*['"]base64['"]\).*?\beval\b/s,
    message: "obfuscated code execution",
  },
];
```

Applied to content in write_file, edit_file (edit sketch content), and search_replace (replacement content). Not applied to the full file to avoid false positives from existing code.
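
A shared scanner over such a pattern list might look like the sketch below. It uses a small subset adapted from the list above, writes [\s\S] in place of the s flag so it compiles on older targets, and maps each hit to a DangerCheckResult. The function name follows the implementation plan; the snippet length limit is an arbitrary assumption.

```typescript
interface DangerCheckResult {
  level: "warning" | "danger";
  category: "destructive_sql" | "malicious_package" | "suspicious_code";
  message: string;
  details?: string;
}

interface CodePattern {
  pattern: RegExp;
  message: string;
}

const REVERSE_SHELL = "This code appears to open a reverse shell connection to an external server";

// Subset of the pattern list, for illustration only.
const CODE_PATTERNS: CodePattern[] = [
  { pattern: /\b(nc|ncat|netcat)\s+-[a-z]*e\s/i, message: REVERSE_SHELL },
  { pattern: /\/dev\/tcp\//, message: REVERSE_SHELL },
  {
    pattern: /\b(coinhive|cryptonight|stratum\+tcp|xmrig)\b/i,
    message: "This code contains patterns associated with cryptocurrency mining",
  },
  {
    // Covers eval(atob(...)) as well as atob(...) followed later by eval
    pattern: /\beval\s*\(\s*atob\s*\(|\batob\s*\([\s\S]*?\beval\b/,
    message: "This code contains an obfuscated execution pattern (base64-decoded eval)",
  },
];

// Returns the first match as a danger result, or null when nothing matches.
function scanContentForDangers(content: string): DangerCheckResult | null {
  for (const { pattern, message } of CODE_PATTERNS) {
    const match = pattern.exec(content);
    if (match !== null) {
      return {
        level: "danger",
        category: "suspicious_code",
        message,
        details: match[0].slice(0, 200), // flagged snippet for the banner
      };
    }
  }
  return null;
}
```

Each file-write tool would pass only the content it is about to write (the new file body, the edit sketch, or the replacement text), in line with the note above about not scanning existing code.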

Implementation Plan

Phase 0: Security Fix (Independent, Ship Immediately)

Fix command injection in executeAddDependency.ts -- replace string interpolation with execFile array args, and validate package names with regex before any shell interaction
Add unit tests for package name validation

Phase 1: Foundation

Add dangerCheck field to ToolDefinition interface in tools/types.ts
Add DangerCheckResult type to danger_detection/types.ts
Wire dangerCheck into buildAgentToolSet execute wrapper -- run before consent, force "ask" if dangerous
Extend consent request IPC payload with dangerInfo: DangerCheckResult | null
Update AgentConsentBanner.tsx with danger variant: red styling, ShieldAlert icon, explanation text, two-button layout (no "Always allow"), priority queue ordering, not X-dismissible
Add aria-live="polite", focus management, keyboard defaults (Decline focused)
Add danger detection telemetry: danger_check:detected, danger_check:user_decision

Phase 2: SQL Danger Heuristics

Implement sql_heuristics.ts with pattern matching for destructive operations
Add dangerCheck to executeSqlTool that calls SQL heuristics
Handle SQL comment stripping, multi-statement queries
Add human-readable explanation templates with table/column name extraction
Unit tests: corpus of dangerous and safe SQL, edge cases (DROP in comments, DELETE with complex WHERE, multi-statement)

Phase 3: npm Package Validation

Implement npm_validation.ts with package name sanitization regex
Add pre-install registry existence check (https://registry.npmjs.org/{package})
Add dangerCheck to addDependencyTool for pre-install validation
Add post-install npm audit --json / pnpm audit --json parsing in executeAddDependency.ts
Map npm advisory severity to danger levels (critical/high = danger, moderate/low = warning)
Add local caching for advisory data (24-hour TTL)
Handle @version suffix in package names
Unit tests: valid/invalid names, known vulnerable packages (mocked registry), severity mapping
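
The 24-hour advisory cache from this phase could start as a small in-memory TTL map, sketched below. TtlCache is a hypothetical class; whether the cache should persist across app restarts is not specified in this plan, and this sketch does not persist. The clock is injectable so expiry can be unit tested without waiting.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined when missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

const ADVISORY_TTL_MS = 24 * 60 * 60 * 1000;
```

A cache keyed by the package name and version would let repeated installs of the same dependency skip the lookup until the entry expires.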

Phase 4: Code Injection Scanning

Implement code_scanning.ts with high-confidence pattern set
Add shared scanContentForDangers(content: string) function
Add dangerCheck to writeFileTool, editFileTool, searchReplaceTool
For edit_file: scan the edit sketch content, not the final merged file
Add package.json detection: if target file is package.json, parse diff and run npm validation on new dependencies
Per-pattern explanation templates
Unit tests: known malicious patterns, legitimate code that looks suspicious (build tools, base64 in tests)
Performance benchmark: verify sub-millisecond execution for regex patterns
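
For the package.json detection step, one possible sketch compares dependency names before and after the write and returns only the additions. It assumes the full old and new file contents are available to the check, which may not hold for every tool, and findAddedDependencies is a hypothetical name. The set of sections inspected is also an assumption.

```typescript
const DEPENDENCY_SECTIONS = [
  "dependencies",
  "devDependencies",
  "optionalDependencies",
  "peerDependencies",
];

// Collects dependency names from all sections; invalid JSON yields an empty set.
function collectDependencyNames(json: string): Set<string> {
  const names = new Set<string>();
  let parsed: unknown;
  try {
    parsed = JSON.parse(json);
  } catch {
    return names;
  }
  if (typeof parsed !== "object" || parsed === null) return names;
  const record = parsed as Record<string, unknown>;
  for (const section of DEPENDENCY_SECTIONS) {
    const deps = record[section];
    if (typeof deps === "object" && deps !== null) {
      for (const name of Object.keys(deps)) names.add(name);
    }
  }
  return names;
}

// Names present after the write but not before, sorted for stable output.
// Pass null for `before` when the file is being created.
function findAddedDependencies(before: string | null, after: string): string[] {
  const previous = before === null ? new Set<string>() : collectDependencyNames(before);
  const added: string[] = [];
  collectDependencyNames(after).forEach((name) => {
    if (!previous.has(name)) added.push(name);
  });
  return added.sort();
}
```

Each returned name would then go through the same validation as a direct add_dependency call. Version-only changes to an existing dependency are ignored here.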

Phase 5: Dangerous Approval Override

Add dangerousApprovalOverride: boolean to BaseUserSettingsFields in schemas.ts (default: false)
Wire override check into buildAgentToolSet -- skip dangerCheck when enabled
Add "Safety" section in settings UI, visually separated from auto-approve
Implement confirmation dialog with "I understand" text input requirement
Add persistent shield-off indicator in chat header when override is active (clickable to jump to setting)
Add telemetry for override enable/disable events
Consider auto-expiry on app update (re-prompt user to re-enable)

Testing Strategy

Unit tests for SQL heuristics: Corpus of 50+ dangerous and safe SQL queries. Edge cases: DROP inside comments, DELETE with complex WHERE clauses, multi-statement queries, case variations, GRANT/REVOKE.
Unit tests for npm validation: Valid package names, invalid/malicious names, @scope/package format, package@version format, names with special characters (command injection attempts).
Unit tests for code scanning: Known malicious patterns, legitimate code that resembles patterns (build tools using eval, base64 in unit tests, process.env in config files).
Integration tests: Verify that dangerCheck results flow through the consent system correctly -- forced consent shows danger banner even when consent is "always", danger info appears in banner, "accept-always" is not an option.
E2E tests: Simulate agent attempting dangerous SQL with auto-approve ON; verify danger banner appears with correct explanation. Test override toggle flow.
Regression tests: Ensure existing auto-approve workflows are not broken for non-dangerous operations. Verify zero-friction happy path.
Performance tests: Benchmark SQL heuristics and code scanning to verify sub-millisecond execution on typical inputs.

Risks & Mitigations

Risk	Likelihood	Impact	Mitigation
False positives erode user trust	HIGH	HIGH	Start with very high-confidence patterns only. Track override rates via telemetry. Remove patterns that produce false positives.
Command injection via package names (EXISTING)	HIGH	HIGH	Fix immediately in Phase 0, independent of feature work. Use `execFile` with array args.
Override + auto-approve = zero guardrails	MEDIUM	HIGH	Track this state in telemetry. Consider auto-expiry on app update. Persistent UI indicator.
Narrow code scanning creates false sense of security	MEDIUM	MEDIUM	Honest messaging: "checks for common malicious patterns" not "security scanning." Document known limitations.
npm audit coverage gaps (no typosquats, zero-days)	MEDIUM	MEDIUM	Accept as known limitation. Document. Consider Socket.dev integration in v2.
Performance impact on file writes from code scanning	LOW	MEDIUM	Regex-only patterns (sub-millisecond). Benchmark before shipping.
Bypass via indirect paths (write benign script that downloads malware)	MEDIUM	LOW	Fundamental limitation of static analysis. Accept and document.
npm registry/audit API unavailable (offline/outage)	LOW	LOW	Fail-open with notification: "Safety check unavailable -- proceeding."
Pattern list goes stale as threats evolve	LOW	MEDIUM	Keep pattern set small and high-signal. Easy to update (single file).
MCP tools bypass all danger checks	LOW	LOW	Document as known limitation. Out of scope for v1.

Open Questions

Build mode coverage: The autoApproveChanges setting in build mode bypasses the proposal flow, including existing SecurityRisk warnings. This feature only covers local-agent mode. Should build mode be covered in v2?
npm: protocol aliases: package.json edits could use an alias entry such as "my-pkg": "npm:{package}@{version}" to bypass name validation. Should we parse these in the package.json detection?
Per-category danger guard settings: Should users be able to disable SQL checks but keep npm checks? The category field on DangerCheckResult enables this in the future, but it's not in the MVP.
MCP tool danger detection: MCP tools are opaque but could execute SQL or install packages. Future option: let MCP server authors declare danger levels in tool metadata.

Decision Log

Decision	Reasoning
Heuristic SQL detection over LLM-based	LLM adds latency, cost, and provider dependency (violates Backend-Flexible principle). Heuristics are expected to catch the large majority of destructive patterns (estimated 95%+, to be validated against the test corpus) with no false positives on the obvious cases.
npm audit advisories over Socket.dev	Free, official, no API key needed. Socket.dev is more comprehensive but adds external dependency. Can upgrade later.
Include narrow code injection scanning in MVP	User decided. Scoped to near-zero false positive patterns (reverse shells, crypto miners, credential exfiltration). Performance impact is minimal (regex-only).
Include dangerous approval override in MVP	User decided. Mitigated with confirmation dialog (typed "I understand"), persistent UI indicator, and telemetry tracking.
Always show danger context (even with auto-approve OFF)	Enhances decision quality for all users. Same consent banner component, just with upgraded styling when danger is detected.
Advisory (forced consent) over blocking	Users can still proceed past warnings. This respects user autonomy while ensuring informed consent. The override toggle is the escape hatch from even this.
Two buttons only on danger banner (no "Always allow")	Permanently auto-approving dangerous actions defeats the purpose. Users approve per-instance or use the global override.
`dangerCheck` per-tool over centralized detection	Each tool knows its domain best. SQL heuristics are completely different from npm validation. Co-locating detection with the tool is cleaner and more extensible.
Fix command injection independently	This is a security bug that exists today, not a feature. Ship the fix immediately without waiting for the full danger guards feature.
Fail-open when checks are unavailable	Fail-closed would mean a third-party API outage blocks the user's work. Fail-open with notification is the right balance for a local-first tool.
`aria-live="polite"` over "assertive"	The agent is paused waiting for consent -- there's no urgency. "Assertive" would disruptively interrupt screen reader users.

Generated by dyad:swarm-to-plan