docs/tools/web-fetch.md
The web_fetch tool does a plain HTTP GET and extracts readable content
(HTML to markdown or text). It does not execute JavaScript.
For JS-heavy sites or login-protected pages, use the Web Browser instead.
web_fetch is enabled by default -- no configuration needed. The agent can
call it immediately:
```ts
await web_fetch({ url: "https://example.com/article" });
```
```json5
{
  tools: {
    web: {
      fetch: {
        enabled: true, // default: true
        provider: "firecrawl", // optional; omit for auto-detect
        maxChars: 50000, // max output chars
        maxCharsCap: 50000, // hard cap for the maxChars param
        maxResponseBytes: 2000000, // max download size before truncation
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
        maxRedirects: 3,
        useTrustedEnvProxy: false, // let a trusted HTTP(S) env proxy resolve DNS
        readability: true, // use Readability extraction
        userAgent: "Mozilla/5.0 ...", // override User-Agent
        ssrfPolicy: {
          allowRfc2544BenchmarkRange: true, // opt-in for trusted fake-IP proxies using 198.18.0.0/15
          allowIpv6UniqueLocalRange: true, // opt-in for trusted fake-IP proxies using fc00::/7
        },
      },
    },
  },
}
```
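As an illustrative sketch of how the output limits above interact (hypothetical helper names, not OpenClaw's actual internals): a per-call `maxChars` is clamped to `maxCharsCap`, and responses larger than `maxResponseBytes` are truncated before parsing.

```typescript
// Illustrative sketch only: hypothetical helpers showing how the limits
// above interact. Not OpenClaw's real implementation.
interface FetchLimits {
  maxCharsCap: number; // hard cap for any per-call maxChars
  maxResponseBytes: number; // max download size before truncation
}

// A per-call maxChars request can never exceed the configured hard cap.
function clampMaxChars(requested: number, limits: FetchLimits): number {
  return Math.min(requested, limits.maxCharsCap);
}

// Oversized responses are cut at the byte limit and flagged so a warning
// can be attached to the tool result.
function truncateResponse(
  body: string,
  limits: FetchLimits,
): { text: string; truncated: boolean } {
  const bytes = new TextEncoder().encode(body);
  if (bytes.length <= limits.maxResponseBytes) {
    return { text: body, truncated: false };
  }
  const cut = bytes.slice(0, limits.maxResponseBytes);
  return { text: new TextDecoder().decode(cut), truncated: true };
}
```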
If Readability extraction fails, web_fetch can fall back to
Firecrawl for bot circumvention and better extraction:
```json5
{
  tools: {
    web: {
      fetch: {
        provider: "firecrawl", // optional; omit for auto-detect from available credentials
      },
    },
  },
  plugins: {
    entries: {
      firecrawl: {
        enabled: true,
        config: {
          webFetch: {
            apiKey: "fc-...", // optional if FIRECRAWL_API_KEY is set
            baseUrl: "https://api.firecrawl.dev",
            onlyMainContent: true,
            maxAgeMs: 86400000, // cache duration (1 day)
            timeoutSeconds: 60,
          },
        },
      },
    },
  },
}
```
`plugins.entries.firecrawl.config.webFetch.apiKey` supports SecretRef objects.
Legacy `tools.web.fetch.firecrawl.*` config is auto-migrated by `openclaw doctor --fix`.
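For example, assuming the legacy block mirrored the plugin's `webFetch` keys (the exact legacy key names are not shown here, so this is a hypothetical shape), the migration performed by `openclaw doctor --fix` would look roughly like:

```json5
// Before (hypothetical legacy shape, auto-migrated):
{
  tools: {
    web: {
      fetch: {
        firecrawl: { apiKey: "fc-..." },
      },
    },
  },
}
// After:
{
  plugins: {
    entries: {
      firecrawl: {
        enabled: true,
        config: { webFetch: { apiKey: "fc-..." } },
      },
    },
  },
}
```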
Current runtime behavior:
- `tools.web.fetch.provider` selects the fetch fallback provider explicitly. If
  `provider` is omitted, OpenClaw auto-detects the first ready web-fetch
  provider from available credentials.
- Non-sandboxed `web_fetch` can use installed plugins that declare
  `contracts.webFetchProviders` and register a matching provider at runtime.
  Today the bundled provider is Firecrawl. Sandboxed `web_fetch` calls stay
  limited to bundled providers.
- If direct extraction is unavailable or fails, `web_fetch` skips straight to
  the selected provider fallback. If no provider is available, it fails closed.
- If your deployment requires `web_fetch` to go through a trusted outbound
  HTTP(S) proxy, set `tools.web.fetch.useTrustedEnvProxy: true`. In this mode,
  OpenClaw still applies hostname-based SSRF checks before sending the request,
  but it lets the proxy resolve DNS instead of doing local DNS pinning. Enable
  this only when the proxy is operator-controlled and enforces outbound policy
  after DNS resolution.

  <Note>
  If no HTTP(S) proxy env var is configured, or the target host is excluded by
  `NO_PROXY`, `web_fetch` falls back to the normal strict path with local DNS
  pinning.
  </Note>

- `maxChars` is clamped to `tools.web.fetch.maxCharsCap`. Downloads are capped
  at `maxResponseBytes` before parsing; oversized responses are truncated with
  a warning.
- `tools.web.fetch.ssrfPolicy.allowRfc2544BenchmarkRange` and
  `tools.web.fetch.ssrfPolicy.allowIpv6UniqueLocalRange` are narrow opt-ins
  for trusted fake-IP proxy stacks; leave them unset unless your proxy owns
  those synthetic ranges and enforces its own destination policy.
- `maxRedirects` caps how many redirects are followed (default: 3).
- `useTrustedEnvProxy` is an explicit opt-in and should only be enabled for
  operator-controlled proxies that still enforce outbound policy after DNS
  resolution.
- `web_fetch` is best-effort -- some sites need the Web Browser.

If you use tool profiles or allowlists, add `web_fetch` or `group:web`:
```json5
{
  tools: {
    allow: ["web_fetch"],
    // or: allow: ["group:web"] (includes web_fetch, web_search, and x_search)
  },
}
```
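The trusted-proxy fallback described in the note above can be sketched as a small decision function. This is illustrative only: `shouldUseEnvProxy`, `hostExcludedByNoProxy`, and the simplified `NO_PROXY` matching are hypothetical names, not OpenClaw's actual implementation.

```typescript
// Illustrative decision logic for useTrustedEnvProxy (not OpenClaw's real
// code): the env-proxy path is taken only when the option is enabled, a
// proxy env var is set, and the target host is not excluded by NO_PROXY.
// Otherwise the request uses the strict path with local DNS pinning.
function hostExcludedByNoProxy(host: string, noProxy?: string): boolean {
  if (!noProxy) return false;
  return noProxy
    .split(",")
    .map((entry) => entry.trim().toLowerCase())
    .filter((entry) => entry.length > 0)
    .some(
      (entry) =>
        entry === "*" ||
        host === entry ||
        // ".corp" or "corp" both match any subdomain of corp
        host.endsWith("." + entry.replace(/^\./, "")),
    );
}

function shouldUseEnvProxy(opts: {
  useTrustedEnvProxy: boolean;
  host: string;
  httpsProxy?: string;
  noProxy?: string;
}): boolean {
  // No opt-in or no proxy env var: strict path with local DNS pinning.
  if (!opts.useTrustedEnvProxy || !opts.httpsProxy) return false;
  return !hostExcludedByNoProxy(opts.host.toLowerCase(), opts.noProxy);
}
```

In this sketch, SSRF hostname checks would still run before the request is sent on either path; only the DNS-resolution step differs.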