
Web Fetch Tool for Local Agent Mode

plans/web-fetch-local-agent.md



Generated by swarm planning session on 2026-02-25

Summary

Add a new web_fetch tool to the local agent that fetches and reads website content when users share URLs for reference. Unlike the existing Pro-only web_crawl tool (which uses Firecrawl for visual cloning with screenshots), web_fetch performs a direct local HTTP fetch from the user's machine, making it available to all users (free + Pro) at zero infrastructure cost.

Problem Statement

When users paste a URL into the Dyad chat (e.g., "Help me integrate this API: https://docs.stripe.com/api"), the agent cannot access the content behind that URL. Users must manually copy-paste page content, breaking their flow. This is especially painful for developers building with APIs, following tutorials, or referencing documentation — the most common use cases for Dyad's target audience. The existing web_crawl tool only activates for "clone/copy/replicate" intent and requires Dyad Pro, leaving a gap for the broader "read this page for context" use case.

Scope

In Scope (MVP)

  • New web_fetch tool that fetches a URL and returns content as markdown
  • Available to all users (free + Pro) — no isDyadPro gate
  • LLM-triggered via standard tool call mechanism (not auto-detected)
  • HTML-to-markdown conversion using turndown + @mozilla/readability for content extraction
  • Content-Type detection: HTML → markdown, JSON → code block, text → as-is, PDF/images → "not supported" message
  • URL scheme validation (http: and https: only; block file:, ftp:, data:, javascript:, blob: schemes)
  • Private/localhost IPs allowed (consent dialog is sufficient protection)
  • Consent-gated with "ask" default
  • Content truncation at 16,000 characters (matching existing MAX_TEXT_SNIPPET_LENGTH)
  • Timeout at 10-15 seconds via AbortController
  • XML streaming preview via <dyad-web-fetch> tag
  • Clear error messages for timeout, 403/blocked, empty content, unsupported content types
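The URL scheme validation bullet above can be sketched as a small allow-list helper; the name isAllowedUrl is illustrative, not an existing Dyad API:

```typescript
// Illustrative sketch of the scheme allow-list described above; the
// function name isAllowedUrl is hypothetical, not an existing Dyad API.
const ALLOWED_PROTOCOLS = new Set(["http:", "https:"]);

export function isAllowedUrl(raw: string): boolean {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return false; // malformed URLs are rejected outright
  }
  // Blocks file:, ftp:, data:, javascript:, blob:, and any other scheme.
  return ALLOWED_PROTOCOLS.has(parsed.protocol);
}
```

An allow-list is preferable to a block-list here: any scheme not explicitly permitted is rejected, so exotic schemes need no special handling.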

Out of Scope (Follow-up)

  • Auto-detection of URLs in user input (pre-fetching before LLM runs)
  • JavaScript rendering / headless browser for SPAs
  • Screenshot capture
  • PDF content extraction
  • Caching of fetched pages within a session
  • Batch consent UI for multiple URLs in one message
  • Re-fetch / refresh button on completed cards
  • Link preview in chat input area

User Stories

  • As a developer building an app, I want to paste an API documentation URL and have the agent understand its contents, so that I can say "integrate this API" without manually copying docs.
  • As a user following a tutorial, I want to share a blog post or tutorial URL with the agent, so that it can follow the instructions and implement what the tutorial describes.
  • As a user referencing a design, I want to share a website URL for style reference (without cloning), so that the agent understands the content and direction I'm going for.
  • As a free-tier user, I want basic web fetching to work without a Pro subscription, so that I can reference external content in my workflow.

UX Design

User Flow

  1. User types a message that includes a URL (e.g., "Use the Stripe API docs at https://docs.stripe.com/api to add payments")
  2. The LLM recognizes the URL and determines it needs the page content to fulfill the request
  3. A consent dialog appears: Fetch page content: "https://docs.stripe.com/api"
  4. User approves (accept-once / accept-always / decline)
  5. A <dyad-web-fetch> card appears in the chat showing the URL being fetched with a loading state
  6. Content is fetched, processed through Readability + Turndown, truncated if needed, and returned as the tool result
  7. The card transitions to a completed state showing the page title (extracted by Readability) and URL
  8. The AI continues its response using the fetched content as context

Key States

  • Loading: Card with URL, spinner, "Fetching..." label (use existing DyadStateIndicator pattern)
  • Completed (HTML): Card with page title (extracted by Readability) + URL in muted text, expandable to show markdown preview
  • Completed (JSON): Card with application/json badge + URL, expandable content as code block
  • Completed (text): Card with text/plain badge + URL, content displayed as-is
  • Error — Timeout: "This page couldn't be reached. Check the URL and try again."
  • Error — Blocked (403): "This page blocked the request. You may need to copy-paste its content manually."
  • Error — Empty/JS-only: "This page returned no readable content. It may require JavaScript to render."
  • Warning — Unsupported type: Amber/warning state (not red error): "PDF files cannot be fetched as text. Try copying the relevant content and pasting it into the chat." (Use <dyad-output type="warning">)
  • Truncated: Show note on card: "Content truncated (showing first 16,000 characters)"
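The error states above map naturally onto a small message table. This sketch centralizes the copy in one place; the ErrorKind union and errorMessage function are hypothetical names, not existing Dyad code:

```typescript
// Sketch centralizing the error copy above; ErrorKind and errorMessage
// are hypothetical names, not existing Dyad code.
type ErrorKind = "timeout" | "blocked" | "empty" | "unsupported";

export function errorMessage(kind: ErrorKind): string {
  switch (kind) {
    case "timeout":
      return "This page couldn't be reached. Check the URL and try again.";
    case "blocked":
      return "This page blocked the request. You may need to copy-paste its content manually.";
    case "empty":
      return "This page returned no readable content. It may require JavaScript to render.";
    case "unsupported":
      // Rendered as an amber warning (<dyad-output type="warning">), not a red error.
      return "PDF files cannot be fetched as text. Try copying the relevant content and pasting it into the chat.";
  }
}
```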

Interaction Details

  • Consent preview text: Fetch page content: "https://..." (action-focused, not implementation-detail-focused)
  • Card icon: Use Link from lucide-react (differentiated from Globe for web_search and ScanQrCode for web_crawl)
  • Badge color: Use purple to differentiate from the blue used by web_search and web_crawl
  • Completed card is collapsed by default with page title visible; expandable to show markdown preview
  • When truncation occurs, surface it in the card UI so users understand the AI only saw partial content

Accessibility

  • Consent dialog: keyboard-navigable via standard button focus (existing pattern)
  • Expandable cards: Enter/Space to toggle (existing DyadCard pattern)
  • Screen reader: announce "Web Fetch completed: [page title]" or "Web Fetch failed: [error]"

Technical Design

Architecture

New tool following the established ToolDefinition<T> pattern. Performs a direct HTTP fetch from the Electron main process using Node.js fetch(), processes HTML through @mozilla/readability for content extraction, then converts to markdown via turndown. Returns the markdown string as the tool result. No changes to existing tools or the agent handler.

Dependency pipeline: fetch(url) → linkedom.parseHTML(html) → new Readability(doc).parse() → new TurndownService().turndown(article.content) → truncateText(markdown)

linkedom is required because both @mozilla/readability and turndown need a DOM document, and Electron's main process doesn't have one. linkedom is lightweight (~50KB) and much faster than JSDOM.

Components Affected

  • New file: src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts — Tool implementation
  • Modified: src/pro/main/ipc/handlers/local_agent/tool_definitions.ts — Import and register webFetchTool in TOOL_DEFINITIONS array
  • Modified: package.json — Add turndown, @types/turndown, linkedom, @mozilla/readability (or defuddle)
  • New file (renderer): DyadWebFetch component for rendering the <dyad-web-fetch> XML tag in chat
  • No changes to: web_crawl.ts, engine_fetch.ts, local_agent_handler.ts, types.ts

Data Model Changes

None. The tool returns a string result via the existing ToolResult type. No schema or storage changes.

API Changes

No external API changes. Internally:

  • New tool web_fetch added to TOOL_DEFINITIONS array
  • New XML tag <dyad-web-fetch> for renderer

Tool Description (Critical)

The tool description guides LLM behavior and is the single biggest factor in feature success:

Fetch and read content from a URL. Works with web pages (returns cleaned markdown) and API endpoints (returns JSON).

### When to Use
Use this tool when the user shares a URL and wants you to reference, understand, or use information from that page. Examples:
- User shares API documentation and asks you to integrate it
- User shares a tutorial or blog post and wants you to follow it
- User shares a web page and asks about its content
- User shares an API endpoint URL and wants you to understand the response

### When NOT to Use
- User wants to CLONE / COPY / REPLICATE / RECREATE a website's visual design — use web_crawl instead
- User mentions a URL in passing without wanting you to read it
- You need to search the web for information (no specific URL) — use web_search instead

### Limitations
- Cannot render JavaScript — some dynamic/SPA pages may return limited content
- Content is truncated to ~16,000 characters for very long pages
- PDF and image files are not supported

Key Implementation Details

```typescript
// web_fetch.ts - Core structure

const webFetchSchema = z.object({
  url: z.string().describe("URL to fetch"),
});

// URL validation: only http: and https: schemes
// No private IP blocking (user decision: allow with consent)
// Timeout: 10-15 seconds via AbortController
// User-Agent: set a reasonable browser-like string

// Content-Type handling:
// text/html → Readability extraction → Turndown markdown → truncate
// application/json → return as ```json code block → truncate
// text/plain, text/markdown → return as-is → truncate
// application/pdf, image/* → return "not supported" message
// other → attempt text extraction, fall back to "not supported"

// Truncation: reuse MAX_TEXT_SNIPPET_LENGTH (16,000 chars) pattern

export const webFetchTool: ToolDefinition<z.infer<typeof webFetchSchema>> = {
  name: "web_fetch",
  description: DESCRIPTION,
  inputSchema: webFetchSchema,
  defaultConsent: "ask",
  // No isEnabled gate — available to all users

  getConsentPreview: (args) => `Fetch page content: "${args.url}"`,

  buildXml: (args, isComplete) => {
    if (!args.url) return undefined;
    let xml = `<dyad-web-fetch url="${escapeXmlContent(args.url)}">`;
    if (isComplete) xml += "</dyad-web-fetch>";
    return xml;
  },

  execute: async (args, ctx) => {
    // 1. Validate URL scheme (http/https only)
    // 2. Fetch with timeout (AbortController, 15s)
    // 3. Check Content-Type header
    // 4. For HTML: parse with Readability, convert with Turndown
    // 5. For JSON: wrap in code block
    // 6. For text: return as-is
    // 7. For unsupported: return clear message
    // 8. Truncate to MAX_TEXT_SNIPPET_LENGTH
    // 9. Return markdown string as tool result
  },
};
```
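The execute() steps commented above can be fleshed out roughly as follows. This is a sketch assuming Node 18+'s global fetch; truncateText, fetchAndRoute, and the User-Agent string are illustrative, and the HTML branch would hand off to the Readability + Turndown pipeline:

```typescript
// Sketch of the execute() steps above, assuming Node 18+'s global fetch.
// truncateText, fetchAndRoute, and the User-Agent string are illustrative;
// the HTML branch would hand off to the Readability + Turndown pipeline.
export const MAX_TEXT_SNIPPET_LENGTH = 16_000;

export function truncateText(text: string): string {
  if (text.length <= MAX_TEXT_SNIPPET_LENGTH) return text;
  return (
    text.slice(0, MAX_TEXT_SNIPPET_LENGTH) +
    "\n\n[Content truncated (showing first 16,000 characters)]"
  );
}

export async function fetchAndRoute(url: string): Promise<string> {
  // AbortSignal.timeout enforces the plan's 15-second limit.
  const response = await fetch(url, {
    signal: AbortSignal.timeout(15_000),
    headers: { "User-Agent": "Mozilla/5.0 (compatible; DyadWebFetch)" },
  });
  if (!response.ok) {
    return `Error: request failed with status ${response.status}`;
  }
  const contentType = response.headers.get("content-type") ?? "";
  if (contentType.includes("application/json")) {
    const fence = "`".repeat(3); // avoids a literal backtick run in this sketch
    return truncateText(`${fence}json\n${await response.text()}\n${fence}`);
  }
  if (contentType.includes("text/plain") || contentType.includes("text/markdown")) {
    return truncateText(await response.text());
  }
  if (contentType.includes("application/pdf") || contentType.startsWith("image/")) {
    return "This content type is not supported. Try copying the relevant content into the chat.";
  }
  // text/html (and unknown types) go through HTML extraction here.
  return truncateText(await response.text());
}
```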

Implementation Plan

Phase 1: Core Tool

  • Add dependencies: turndown, @types/turndown, linkedom, @mozilla/readability (evaluate defuddle as alternative)
  • Create src/pro/main/ipc/handlers/local_agent/tools/web_fetch.ts with:
    • URL scheme validation
    • Fetch with AbortController timeout (15 seconds)
    • Content-Type detection and routing
    • Readability extraction for HTML
    • Turndown markdown conversion
    • JSON/text/unsupported content handling
    • Truncation using existing pattern
    • Proper error messages for common failure modes
  • Register webFetchTool in tool_definitions.ts TOOL_DEFINITIONS array
  • Write tool description with clear when-to-use / when-not-to-use guidance

Phase 2: Renderer Component

  • Create DyadWebFetch component to render <dyad-web-fetch> XML tags
  • Implement loading state (URL + spinner)
  • Implement completed state (page title + URL, expandable markdown preview)
  • Implement error states
  • Show truncation indicator when content was truncated
  • Register in the markdown parser's XML tag handler

Phase 3: Testing

  • Unit tests for URL validation (scheme checking, malformed URLs)
  • Unit tests for Content-Type handling (HTML, JSON, text, PDF, images)
  • Unit tests for HTML-to-markdown conversion (simple pages, complex pages, empty bodies)
  • Unit tests for truncation behavior
  • Unit tests for timeout/error handling (mock fetch failures, non-200 responses)
  • Integration test: verify tool appears in buildAgentToolSet output (no isEnabled gate)
  • Manual E2E testing with real URLs in local agent chat

Testing Strategy

  • Unit test URL scheme validation: verify file://, ftp://, data: are rejected; http:// and https:// are accepted
  • Unit test Content-Type routing: verify HTML → readability+turndown, JSON → code block, text → as-is, PDF → error message
  • Unit test HTML conversion with various inputs: simple pages, pages with scripts/styles, empty bodies, non-UTF-8 encoding
  • Unit test truncation: verify content over 16K chars is truncated with indicator
  • Unit test error handling: mock network failures, timeouts, 403/404 responses, non-200 status codes
  • Integration test: verify webFetchTool is included in tool set for both Pro and non-Pro contexts
  • Manual test: verify consent dialog, loading card, completed card, error states in the actual UI
  • Manual test: verify tool is NOT triggered for clone/replicate intent (web_crawl should be used instead)

Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| JS-rendered SPAs return minimal content | Medium | Medium | Clear tool description noting limitation; LLM can explain to user; Pro users can use web_crawl |
| LLM confuses web_fetch with web_crawl or web_search | Low | Medium | Precise, mutually-exclusive tool descriptions with explicit when/when-not guidance |
| Large HTML pages block Electron main process during conversion | Low | Medium | Truncate raw HTML before processing; move to worker thread in follow-up if needed |
| Content quality varies across sites (paywalls, anti-bot) | Medium | Low | Return clear error messages; user can fall back to manual copy-paste |
| New dependencies (turndown, readability) introduce maintenance burden | Low | Low | Both are mature, stable libraries with large install bases |
| "Accept always" consent enables unbounded fetch loops | Low | Medium | Monitor; consider per-turn fetch limit in follow-up if abuse is observed |

Open Questions

  • Readability vs. Defuddle: Evaluate defuddle as a potential alternative to @mozilla/readability. Defuddle may offer better extraction for modern web pages. Decision can be made during implementation based on testing.
  • DOM library: linkedom is included as the DOM implementation since both @mozilla/readability and turndown require a DOM document and Electron's main process doesn't provide one. linkedom is lightweight (~50KB) and much faster than JSDOM.
  • Multiple URLs per message: When a user pastes 2-5 URLs, the LLM may call web_fetch multiple times. Each triggers a separate consent dialog. If this proves disruptive, consider batch consent UI in a follow-up.
  • Stale content: Fetched content is point-in-time. For long conversations, consider adding timestamps to fetch cards and a re-fetch capability in a follow-up.

Decision Log

| Decision | Reasoning |
| --- | --- |
| New tool (web_fetch) rather than extending web_crawl | Use cases are fundamentally different (read vs. clone). Separate tools = cleaner code, clearer LLM descriptions, independent consent settings. All 3 roles agreed independently. |
| Available to all users (free + Pro) | Local fetch has zero infrastructure cost. Differentiates free tier. Natural upsell to Pro for enhanced crawl+screenshot. |
| LLM-triggered, not auto-detected | Consistent with existing tool architecture. Auto-detection would require new handler-layer logic and might fetch URLs users didn't intend. |
| Allow private/localhost IPs | Dyad runs locally; SSRF is a server-side threat model. Fetching localhost:3000 or internal docs is a legitimate use case. Consent dialog provides sufficient protection. |
| Include @mozilla/readability in v1 | Dramatically better content extraction (strips nav, footer, ads). Small marginal cost (one extra dependency). All roles agreed. |
| Handle Content-Type gracefully | ~15 lines of code prevents confusing failures for JSON, text, PDF URLs. Better UX for minimal effort. |
| Consent default: "ask" | Consistent with web_crawl and web_search. Network requests to arbitrary external URLs warrant explicit approval. |
| Truncation at 16K characters | Matches existing MAX_TEXT_SNIPPET_LENGTH. Prevents context window overflow while providing substantial content. |
| Tool name: web_fetch | Consistent with web_search, web_crawl naming convention. Clear, concise, action-oriented. |

Generated by dyad:swarm-to-plan