plans/dangerous-action-guards.md
Generated by swarm planning session on 2026-02-14
Add automatic safety guards that detect and warn users before executing dangerous actions -- destructive SQL queries, malicious npm packages, and suspicious code patterns -- even when auto-approve is enabled. Includes a "dangerous approval override" toggle for power users who want to bypass all safety checks.
Users building apps with Dyad can inadvertently (or through prompt injection) execute destructive actions. Today, Dyad's only defense is the consent banner ("Allow once / Always allow / Decline"), which users frequently bypass with auto-approve or "Always allow" settings. Once bypassed, there is zero validation:

- `DROP TABLE` can destroy hours of work
- Package installation is open to command injection (`executeAddDependency.ts`)

The LLM is an untrusted actor. Prompt injection, hallucination, and model errors can generate destructive operations the user never intended. Auto-approve removes the last line of defense. Users trust Dyad to help them build safely.

- Run `npm audit` post-install for known CVEs.
- When `write_file` or `search_replace` targets `package.json`, run the same package validation on newly-added dependencies.

Flow 1: Dangerous SQL detected (auto-approve ON)

- The agent issues a destructive query (e.g. `DROP TABLE users`)
- `dangerCheck` on the SQL tool detects destructive pattern
- The danger banner explains: "This query will permanently delete the `users` table and all its data"

Flow 2: Dangerous SQL detected (auto-approve OFF)

- `dangerCheck` detects the pattern

Flow 3: Malicious npm package detected

- `npm audit --json` runs and parses results
- Critical/high shows a red `danger` banner; moderate/low shows amber `warning` banner

Flow 4: Suspicious code detected

- The agent writes code via `write_file`, `edit_file`, or `search_replace`

Flow 5: Enabling dangerous approval override
Danger consent banner:
Keyboard navigation:
Queue behavior:
Danger explanation quality (required templates):
| Pattern | Explanation Template |
|---|---|
| `DROP TABLE x` | "This query will permanently delete the {table} table and all its data" |
| `DROP DATABASE x` | "This query will permanently delete the entire {database} database" |
| `TRUNCATE x` | "This query will delete all rows from the {table} table" |
| `DELETE FROM x` (no `WHERE`) | "This query will delete all rows from the {table} table" |
| `ALTER TABLE x DROP COLUMN y` | "This query will permanently remove the {column} column from the {table} table" |
| `GRANT` / `REVOKE` | "This query modifies database permissions" |
| npm critical/high advisory | "Package {name} has a known vulnerability: {advisory_title} (severity: {severity})" |
| npm moderate/low advisory | "Package {name} has a known advisory: {advisory_title} (severity: {severity})" |
| Reverse shell pattern | "This code appears to open a reverse shell connection to an external server" |
| Crypto miner pattern | "This code contains patterns associated with cryptocurrency mining" |
| Credential exfiltration | "This code appears to send environment variables to an external URL" |
| Obfuscated eval | "This code contains an obfuscated execution pattern (base64-decoded eval)" |

- `aria-live="polite"` on danger banner (not "assertive" -- the agent is paused, no urgency to interrupt)
- `aria-describedby` pointing to warning text

Add a `dangerCheck` method to the existing `ToolDefinition` interface. This runs before consent and can escalate the consent level from "always" to forced-ask with danger context. The detection logic is per-tool (each tool knows its domain), while the consent escalation is centralized in `buildAgentToolSet`.

Tool invocation → dangerCheck() → if dangerous, force consent with dangerInfo
→ if safe, proceed with normal consent flow
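
The flow above can be sketched as a pure decision function plus a thin wrapper. This is a minimal sketch under stated assumptions, not Dyad's actual code: `planConsent`, `runWithDangerCheck`, `ConsentSetting`, `ConsentResponse`, `ConsentPlan`, and the `requestConsent` callback are hypothetical names, and the stored consent level is assumed to be either "ask" or "always". Only `DangerCheckResult`, `dangerInfo`, and the response names come from this plan.

```typescript
// Sketch only: everything except DangerCheckResult and dangerInfo is hypothetical.
type DangerCheckResult = {
  level: "warning" | "danger";
  category: "destructive_sql" | "malicious_package" | "suspicious_code";
  message: string;
  details?: string;
};

type ConsentSetting = "ask" | "always"; // assumed shape of the stored consent level
type ConsentResponse = "accept-once" | "accept-always" | "decline";

interface ConsentPlan {
  consent: ConsentSetting;
  dangerInfo: DangerCheckResult | null;
  allowedResponses: ConsentResponse[];
}

// Pure decision step: combine the stored consent, the dangerCheck result and
// the override flag into the consent request that should be issued.
function planConsent(
  stored: ConsentSetting,
  danger: DangerCheckResult | null,
  dangerousApprovalOverride: boolean,
): ConsentPlan {
  if (danger !== null && !dangerousApprovalOverride) {
    // Dangerous: force "ask", attach context, and drop "accept-always".
    return {
      consent: "ask",
      dangerInfo: danger,
      allowedResponses: ["accept-once", "decline"],
    };
  }
  // Safe, or override enabled: normal consent flow.
  return {
    consent: stored,
    dangerInfo: null,
    allowedResponses: ["accept-once", "accept-always", "decline"],
  };
}

// Wrapper around a tool's execute step. Returns null when the user declines
// or answers with a response that the plan does not allow.
async function runWithDangerCheck<T, R>(
  args: T,
  tool: {
    dangerCheck?: (args: T) => Promise<DangerCheckResult | null>;
    execute: (args: T) => Promise<R>;
  },
  stored: ConsentSetting,
  override: boolean,
  requestConsent: (plan: ConsentPlan) => Promise<ConsentResponse>,
): Promise<R | null> {
  let danger: DangerCheckResult | null = null;
  if (!override && tool.dangerCheck) {
    danger = await tool.dangerCheck(args); // skipped entirely when override is on
  }
  const plan = planConsent(stored, danger, override);
  if (plan.consent === "ask") {
    const answer = await requestConsent(plan);
    const allowed = plan.allowedResponses.indexOf(answer) !== -1;
    if (!allowed || answer === "decline") return null;
  }
  return tool.execute(args);
}
```

Keeping the decision in a pure function makes the "always" to forced-ask escalation easy to unit test without a running agent or IPC layer.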
New module: `src/pro/main/ipc/handlers/local_agent/danger_detection/` containing:

- `sql_heuristics.ts` -- SQL pattern matching
- `npm_validation.ts` -- Package name sanitization + registry/audit checks
- `code_scanning.ts` -- High-confidence malicious code patterns
- `types.ts` -- Shared types (`DangerCheckResult`)

| Component | File(s) | Change Type |
|---|---|---|
| Tool definition types | tools/types.ts | Add dangerCheck to ToolDefinition interface |
| Tool set builder | tool_definitions.ts | Wire dangerCheck into execute wrapper, pass dangerInfo to consent request |
| SQL tool | tools/execute_sql.ts | Add SQL danger heuristics via dangerCheck |
| Dependency tool | tools/add_dependency.ts | Add package validation via dangerCheck |
| Dependency processor | executeAddDependency.ts | Fix command injection: use execFile with array args; add post-install npm audit |
| File write tools | tools/write_file.ts, tools/edit_file.ts, tools/search_replace.ts | Add code scanning via dangerCheck; add package.json filename detection |
| Settings schema | src/lib/schemas.ts | Add dangerousApprovalOverride field |
| Settings UI | New "Safety" section in settings | Toggle with confirmation dialog |
| Consent banner | AgentConsentBanner.tsx | Danger variant (red styling, two buttons, explanation, priority queue) |
| Consent types | IPC payload types | Add dangerInfo to consent request |
| Chat UI | Chat header/status area | Persistent shield-off indicator when override is active |
| Telemetry | Agent handler | Emit danger detection events |
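
The `executeAddDependency.ts` row above calls for replacing shell string interpolation with `execFile` and array arguments. A minimal sketch of that idea follows. `Runner`, `SAFE_SPEC`, `buildAddArgs`, and `addDependencies` are hypothetical names; the name portion of the pattern uses the same character set as the package-name regex given later in this plan (with the hyphen moved to the end of each class), while the restricted version suffix and the leading-hyphen rejection are assumptions added here to keep a "package name" from being read as a CLI flag.

```typescript
// Hypothetical runner signature. In Node this would wrap child_process.execFile,
// which passes each argument to the process directly instead of through a shell.
type Runner = (file: string, args: string[]) => Promise<void>;

// Optional @scope/, a package name, and an optional @version-or-tag suffix.
// The suffix is deliberately narrower than "anything" (assumption of this sketch).
const SAFE_SPEC =
  /^(@[a-z0-9~-][a-z0-9._~-]*\/)?[a-z0-9~-][a-z0-9._~-]*(@[\w.^~<>=*-]+)?$/;

// Validate every spec and build the argument array for `pnpm add`.
function buildAddArgs(packages: string[]): string[] {
  for (const spec of packages) {
    // A leading "-" would be parsed as an option even without a shell.
    if (spec.startsWith("-") || !SAFE_SPEC.test(spec)) {
      throw new Error(`Invalid package spec: ${JSON.stringify(spec)}`);
    }
  }
  return ["add", ...packages];
}

// Each package is its own argv entry, so shell metacharacters are never interpreted.
async function addDependencies(packages: string[], run: Runner): Promise<void> {
  await run("pnpm", buildAddArgs(packages));
}
```

Injecting the runner keeps the validation testable without installing anything; production code would pass a small wrapper around `execFile`.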
UserSettings additions (in schemas.ts):
dangerousApprovalOverride: z.boolean().optional(), // default: false
New types:
interface DangerCheckResult {
level: "warning" | "danger";
category: "destructive_sql" | "malicious_package" | "suspicious_code";
message: string; // Human-readable explanation (required, specific)
details?: string; // Extended details (full query, advisory URL, etc.)
}
Extended consent request payload:
// In agent-tool:consent-request IPC event
{
requestId: string;
chatId: number;
toolName: string;
toolDescription: string;
inputPreview: string;
dangerInfo: DangerCheckResult | null; // NEW
}
Extended ToolDefinition interface:
interface ToolDefinition<T> {
// ... existing fields ...
dangerCheck?: (
args: T,
ctx: AgentContext,
) => Promise<DangerCheckResult | null>;
}

- `buildAgentToolSet` execute wrapper: Before calling `requireConsent`, run `dangerCheck`. If result is non-null and `dangerousApprovalOverride` is not enabled, force consent to "ask" and include `dangerInfo` in the consent request payload.
- Add a `dangerInfo` field to the `agent-tool:consent-request` event.
- When `dangerInfo` is present, only accept "accept-once" or "decline" (no "accept-always").
- Emit `danger_check:detected` and `danger_check:override` with category, tool name, and user decision.

Patterns to detect (case-insensitive, ignoring SQL comments):

| Pattern | Level | Template |
|---|---|---|
| `DROP TABLE` | danger | "permanently delete the {table} table" |
| `DROP DATABASE` | danger | "permanently delete the entire {database} database" |
| `TRUNCATE TABLE` | danger | "delete all rows from the {table} table" |
| `DELETE FROM` without `WHERE` | danger | "delete all rows from the {table} table" |
| `ALTER TABLE ... DROP COLUMN` | warning | "permanently remove the {column} column" |
| `GRANT` / `REVOKE` | warning | "modifies database permissions" |
| `DROP SCHEMA` / `DROP INDEX` | warning | "permanently delete database object" |
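
The pattern table above could be implemented roughly as follows. This is a hedged sketch rather than the planned `sql_heuristics.ts`: `SqlFinding`, `stripSqlComments`, `checkStatement`, and `checkSql` are hypothetical names, and both the comment stripping and the split on `;` are deliberately naive (neither understands string literals).

```typescript
// Sketch only: names are hypothetical; levels and messages mirror this plan's tables.
type SqlFinding = { level: "warning" | "danger"; message: string };

// Replace block (/* */) and line (--) comments with a space so they cannot be
// used to hide or split a keyword. Markers inside string literals are not handled.
function stripSqlComments(sql: string): string {
  return sql.replace(/\/\*[\s\S]*?\*\//g, " ").replace(/--[^\n]*/g, " ");
}

// Check a single statement against the destructive patterns, case-insensitively.
function checkStatement(statement: string): SqlFinding | null {
  const s = statement.trim();
  let m = s.match(/^drop\s+table\s+(?:if\s+exists\s+)?([\w."]+)/i);
  if (m) return { level: "danger", message: `This query will permanently delete the ${m[1]} table and all its data` };
  m = s.match(/^drop\s+database\s+(?:if\s+exists\s+)?([\w."]+)/i);
  if (m) return { level: "danger", message: `This query will permanently delete the entire ${m[1]} database` };
  m = s.match(/^truncate\s+(?:table\s+)?([\w."]+)/i);
  if (m) return { level: "danger", message: `This query will delete all rows from the ${m[1]} table` };
  m = s.match(/^delete\s+from\s+([\w."]+)/i);
  if (m && !/\bwhere\b/i.test(s)) {
    return { level: "danger", message: `This query will delete all rows from the ${m[1]} table` };
  }
  m = s.match(/^alter\s+table\s+([\w."]+)[\s\S]*?\bdrop\s+column\s+(?:if\s+exists\s+)?([\w."]+)/i);
  if (m) return { level: "warning", message: `This query will permanently remove the ${m[2]} column from the ${m[1]} table` };
  if (/^(grant|revoke)\b/i.test(s)) return { level: "warning", message: "This query modifies database permissions" };
  if (/^drop\s+(schema|index)\b/i.test(s)) return { level: "warning", message: "This query will permanently delete a database object" };
  return null;
}

// Strip comments, split on ";", and collect one finding per dangerous statement.
function checkSql(sql: string): SqlFinding[] {
  const findings: SqlFinding[] = [];
  for (const statement of stripSqlComments(sql).split(";")) {
    const finding = checkStatement(statement);
    if (finding) findings.push(finding);
  }
  return findings;
}
```

For example, `checkSql("DELETE FROM orders -- WHERE id = 1")` reports a finding because the `WHERE` clause only exists inside a comment, while `checkSql("DELETE FROM orders WHERE id = 1")` reports none.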
Implementation notes:

- Strip SQL comments (`--`, `/* */`) before pattern matching to prevent bypass
- Handle multi-statement input (split on `;` and check each)

Pre-install (in `dangerCheck`):

- Validate the package name against `^(@[a-z0-9-~][a-z0-9-._~]*/)?[a-z0-9-~][a-z0-9-._~]*(@.*)?$`
- Query `https://registry.npmjs.org/{package}` to confirm existence and check deprecated flag

Post-install (in `executeAddDependency`):

- Run `npm audit --json` or `pnpm audit --json` in the app directory
- Critical/high: show `danger` banner with advisory details
- Moderate/low: show `warning` banner

Command injection fix (immediate, independent):

- Replace `` exec(`pnpm add ${packageStr}`) `` with `execFile("pnpm", ["add", ...packages])` or equivalent

High-confidence, near-zero false positive patterns:

const DANGER_PATTERNS = [
// Reverse shells
{
pattern: /\b(nc|ncat|netcat)\s+-[a-z]*e\s/i,
message: "reverse shell connection",
},
{ pattern: /\/dev\/tcp\//, message: "reverse shell connection" },
{
pattern: /child_process.*?(exec|spawn).*?\b(bash|sh|cmd|powershell)\b/s,
message: "shell execution",
},
// Crypto miners
{
pattern: /\b(coinhive|cryptonight|stratum\+tcp|xmrig)\b/i,
message: "cryptocurrency mining",
},
// Credential exfiltration
{
pattern: /process\.env\b.*?\bfetch\s*\(/s,
message: "environment variable exfiltration",
},
{
pattern: /process\.env\b.*?\bhttp/s,
message: "environment variable exfiltration",
},
// Obfuscated payloads
{ pattern: /\batob\s*\(.*?\beval\b/s, message: "obfuscated code execution" },
{
pattern: /Buffer\.from\s*\([^)]+,\s*['"]base64['"]\).*?\beval\b/s,
message: "obfuscated code execution",
},
];
Applied to content in write_file, edit_file (edit sketch content), and search_replace (replacement content). Not applied to the full file to avoid false positives from existing code.
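
A scanner over such a pattern list might look like the sketch below. It assumes a `scanContentForDangers(content)` function that returns the first match as a `DangerCheckResult`; the return shape, the `SCAN_PATTERNS` name, and the choice of four representative patterns are assumptions of this sketch. The dot-all patterns are written with `[\s\S]` instead of the `s` flag, which is equivalent here.

```typescript
// Sketch only: a reduced pattern set with the user-facing messages from this plan.
type DangerCheckResult = {
  level: "warning" | "danger";
  category: "destructive_sql" | "malicious_package" | "suspicious_code";
  message: string;
  details?: string;
};

const SCAN_PATTERNS: { pattern: RegExp; message: string }[] = [
  {
    pattern: /\/dev\/tcp\//,
    message: "This code appears to open a reverse shell connection to an external server",
  },
  {
    pattern: /\b(coinhive|cryptonight|stratum\+tcp|xmrig)\b/i,
    message: "This code contains patterns associated with cryptocurrency mining",
  },
  {
    pattern: /process\.env\b[\s\S]*?\bfetch\s*\(/,
    message: "This code appears to send environment variables to an external URL",
  },
  {
    pattern: /\batob\s*\([\s\S]*?\beval\b/,
    message: "This code contains an obfuscated execution pattern (base64-decoded eval)",
  },
];

// Return the first matching danger, or null when nothing matches.
// Callers pass only the newly written content, never the full existing file.
function scanContentForDangers(content: string): DangerCheckResult | null {
  for (const { pattern, message } of SCAN_PATTERNS) {
    // No "g" flag is used, so test() carries no state between calls.
    if (pattern.test(content)) {
      return { level: "danger", category: "suspicious_code", message };
    }
  }
  return null;
}
```

Note that the exfiltration pattern is order-dependent: it only fires when `process.env` appears before the `fetch(` call, which is one of the known limitations of a regex-only approach.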

- Fix command injection in `executeAddDependency.ts` -- replace string interpolation with `execFile` array args or validate package names with regex before shell execution
- Add `dangerCheck` field to `ToolDefinition` interface in `tools/types.ts`
- Add `DangerCheckResult` type to `danger_detection/types.ts`
- Wire `dangerCheck` into `buildAgentToolSet` execute wrapper -- run before consent, force "ask" if dangerous
- Extend the consent request payload with `dangerInfo: DangerCheckResult | null`
- Update `AgentConsentBanner.tsx` with danger variant: red styling, ShieldAlert icon, explanation text, two-button layout (no "Always allow"), priority queue ordering, not X-dismissible
- Accessibility: `aria-live="polite"`, focus management, keyboard defaults (Decline focused)
- Telemetry events: `danger_check:detected`, `danger_check:user_decision`
- Create `sql_heuristics.ts` with pattern matching for destructive operations
- Add `dangerCheck` to `executeSqlTool` that calls SQL heuristics
- Create `npm_validation.ts` with package name sanitization regex
- Add registry existence check (`https://registry.npmjs.org/{package}`)
- Add `dangerCheck` to `addDependencyTool` for pre-install validation
- Add `npm audit --json` / `pnpm audit --json` parsing in `executeAddDependency.ts`
- Handle `@version` suffix in package names
- Create `code_scanning.ts` with high-confidence pattern set
- Implement `scanContentForDangers(content: string)` function
- Add `dangerCheck` to `writeFileTool`, `editFileTool`, `searchReplaceTool`
- For `edit_file`: scan the edit sketch content, not the final merged file
- When the target is `package.json`, parse diff and run npm validation on new dependencies
- Add `dangerousApprovalOverride: boolean` to `BaseUserSettingsFields` in `schemas.ts` (default: false)
- Wire the override into `buildAgentToolSet` -- skip `dangerCheck` when enabled
- npm validation tests: `@scope/package` format, `package@version` format, names with special characters (command injection attempts).
- Integration tests: `dangerCheck` results flow through the consent system correctly -- forced consent shows danger banner even when consent is "always", danger info appears in banner, "accept-always" is not an option.

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| False positives erode user trust | HIGH | HIGH | Start with very high-confidence patterns only. Track override rates via telemetry. Remove patterns that produce false positives. |
| Command injection via package names (EXISTING) | HIGH | HIGH | Fix immediately in Phase 0, independent of feature work. Use execFile with array args. |
| Override + auto-approve = zero guardrails | MEDIUM | HIGH | Track this state in telemetry. Consider auto-expiry on app update. Persistent UI indicator. |
| Narrow code scanning creates false sense of security | MEDIUM | MEDIUM | Honest messaging: "checks for common malicious patterns" not "security scanning." Document known limitations. |
| npm audit coverage gaps (no typosquats, zero-days) | MEDIUM | MEDIUM | Accept as known limitation. Document. Consider Socket.dev integration in v2. |
| Performance impact on file writes from code scanning | LOW | MEDIUM | Regex-only patterns (sub-millisecond). Benchmark before shipping. |
| Bypass via indirect paths (write benign script that downloads malware) | MEDIUM | LOW | Fundamental limitation of static analysis. Accept and document. |
| npm registry/audit API unavailable (offline/outage) | LOW | LOW | Fail-open with notification: "Safety check unavailable -- proceeding." |
| Pattern list goes stale as threats evolve | LOW | MEDIUM | Keep pattern set small and high-signal. Easy to update (single file). |
| MCP tools bypass all danger checks | LOW | LOW | Document as known limitation. Out of scope for v1. |

- The `autoApproveChanges` setting in build mode bypasses the proposal flow, including existing `SecurityRisk` warnings. This feature only covers local-agent mode. Should build mode be covered in v2?
- `npm:` protocol aliases: `package.json` edits could use `"my-pkg": "npm:<package>@<version>"` to bypass name validation. Should we parse these in the `package.json` detection?
- The `category` field on `DangerCheckResult` enables this in the future, but it's not in the MVP.

| Decision | Reasoning |
|---|---|
| Heuristic SQL detection over LLM-based | LLM adds latency, cost, and provider dependency (violates Backend-Flexible principle). Heuristics catch 95%+ of destructive patterns with zero false positives on the obvious cases. |
| npm audit advisories over Socket.dev | Free, official, no API key needed. Socket.dev is more comprehensive but adds external dependency. Can upgrade later. |
| Include narrow code injection scanning in MVP | User decided. Scoped to near-zero false positive patterns (reverse shells, crypto miners, credential exfiltration). Performance impact is minimal (regex-only). |
| Include dangerous approval override in MVP | User decided. Mitigated with confirmation dialog (typed "I understand"), persistent UI indicator, and telemetry tracking. |
| Always show danger context (even with auto-approve OFF) | Enhances decision quality for all users. Same consent banner component, just with upgraded styling when danger is detected. |
| Advisory (forced consent) over blocking | Users can still proceed past warnings. This respects user autonomy while ensuring informed consent. The override toggle is the escape hatch from even this. |
| Two buttons only on danger banner (no "Always allow") | Permanently auto-approving dangerous actions defeats the purpose. Users approve per-instance or use the global override. |
| `dangerCheck` per-tool over centralized detection | Each tool knows its domain best. SQL heuristics are completely different from npm validation. Co-locating detection with the tool is cleaner and more extensible. |
| Fix command injection independently | This is a security bug that exists today, not a feature. Ship the fix immediately without waiting for the full danger guards feature. |
| Fail-open when checks are unavailable | Fail-closed would mean a third-party API outage blocks the user's work. Fail-open with notification is the right balance for a local-first tool. |
| `aria-live="polite"` over "assertive" | The agent is paused waiting for consent -- there's no urgency. "Assertive" would disruptively interrupt screen reader users. |
Generated by dyad:swarm-to-plan