docs/gateway/troubleshooting.md
This page is the deep runbook. Start at /help/troubleshooting if you want the fast triage flow first.
Run these first, in this order:
openclaw status
openclaw gateway status
openclaw logs --follow
openclaw doctor
openclaw channels status --probe
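
The sequence above can be wrapped in a small helper so that one failing command does not stop the rest. This is a minimal sketch, not something OpenClaw ships: the function name `openclaw_triage` is hypothetical, and `openclaw logs --follow` is left out because it streams until interrupted.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with OpenClaw): run the first-pass
# commands in the documented order and report which ones failed.
# `openclaw logs --follow` is skipped because it never exits on its own.
openclaw_triage() {
  local failed=0 args
  for args in "status" "gateway status" "doctor" "channels status --probe"; do
    printf '== openclaw %s ==\n' "$args"
    # Word-split $args on purpose so each word becomes one argument.
    # shellcheck disable=SC2086
    if ! openclaw $args; then
      printf '!! openclaw %s exited non-zero\n' "$args"
      failed=$((failed + 1))
    fi
  done
  # The return code is the number of commands that failed (0 means all passed).
  return "$failed"
}
```

Call `openclaw_triage` in one terminal and keep `openclaw logs --follow` running in a second one while you reproduce the problem.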
Expected healthy signals:

- `openclaw gateway status` shows `Runtime: running`, `Connectivity probe: ok`, and a `Capability: ...` line.
- `openclaw doctor` reports no blocking config/service issues.
- `openclaw channels status --probe` shows live per-account transport status and, where supported, probe/audit results such as `works` or `audit ok`.

Use this when an update finishes but the Gateway is down, channels are empty, or model calls start failing with 401s.
openclaw status --all
openclaw update status --json
openclaw gateway status --deep
openclaw doctor --fix
openclaw gateway restart
Look for:

- `Update restart` in `openclaw status` / `openclaw status --all`. Pending or failed handoffs include the next command to run.
- `plugin load failed: dependency tree corrupted; run openclaw doctor --fix` under Channels. That means the channel config still exists, but plugin registration failed before the channel could load.
- `openclaw doctor --fix` checks for stale per-agent OAuth auth shadows and removes the old copies so all agents resolve the current shared profile.

Use this when a gateway service unexpectedly stops after an update, or logs show that one `openclaw` binary is older than the version that last wrote `openclaw.json`.
OpenClaw stamps config writes with meta.lastTouchedVersion. Read-only commands can still inspect a config written by a newer OpenClaw, but process and service mutations refuse to continue from an older binary. Blocked actions include gateway service start, stop, restart, uninstall, forced service reinstall, service-mode gateway startup, and gateway --force port cleanup.
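
The "older binary" condition described above comes down to a version comparison. The sketch below is an illustration only, not OpenClaw's own check: `version_older_than` is a hypothetical helper, and it assumes both versions are purely numeric dotted strings (no `v` prefix, no pre-release suffix).

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when version $1 is strictly older
# than version $2. Assumes dotted numeric versions such as 1.2.3.
version_older_than() {
  local IFS=.
  local -a a=($1) b=($2)   # split both versions on dots
  local i x y
  for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
    x=${a[i]:-0}           # missing components count as 0
    y=${b[i]:-0}
    # 10# forces base 10 so components like "08" are not read as octal.
    ((10#$x < 10#$y)) && return 0
    ((10#$x > 10#$y)) && return 1
  done
  return 1                 # equal versions are not "older"
}
```

Pass the number printed by `openclaw --version` as the first argument and the value of `meta.lastTouchedVersion` as the second. If the function succeeds, the binary on your `PATH` is the older one and mutating commands are expected to be refused.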
which openclaw
openclaw --version
openclaw gateway status --deep
openclaw config get meta.lastTouchedVersion
```bash
openclaw gateway install --force
openclaw gateway restart
```
Use this when logs keep printing protocol mismatch after you downgrade or roll back OpenClaw. This means an older Gateway is running, but a newer local client process is still trying to reconnect with a protocol range that the older Gateway cannot speak.
openclaw --version
which -a openclaw
openclaw gateway status --deep
openclaw doctor --deep
openclaw logs --follow
Look for:

- `protocol mismatch ... client=... v<version> min=<n> max=<n> expected=<n>` in Gateway logs.
- `Established clients:` in `openclaw gateway status --deep` or `Gateway clients` in `openclaw doctor --deep`. This lists active TCP clients connected to the Gateway port, including PIDs and command lines when the OS allows it.

Fix:

1. Stop the stale client process identified in `gateway status --deep`.
2. Close any leftover `openclaw logs --follow` shells.
3. Rerun `openclaw gateway status --deep` or `openclaw doctor --deep` and confirm the stale client PID is gone.

Do not make an older Gateway accept a newer incompatible protocol. Protocol bumps protect the wire contract; rollback recovery is a process/version cleanup problem.
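
To read a mismatch line quickly, you can pull out the three numbers and compare them. This is a rough sketch under stated assumptions: `classify_protocol_mismatch` is a hypothetical helper, and it assumes `min`/`max` describe the range the client speaks while `expected` is the protocol version the Gateway speaks.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify one "protocol mismatch" log line.
# Assumption: min/max = client's supported range, expected = Gateway's version.
# Exits 2 when the line does not carry all three fields.
classify_protocol_mismatch() {
  local line=$1 min max expected
  [[ $line =~ min=([0-9]+) ]] || return 2
  min=${BASH_REMATCH[1]}
  [[ $line =~ max=([0-9]+) ]] || return 2
  max=${BASH_REMATCH[1]}
  [[ $line =~ expected=([0-9]+) ]] || return 2
  expected=${BASH_REMATCH[1]}

  if ((expected < min)); then
    echo "gateway-older-than-client"   # typical after a rollback
  elif ((expected > max)); then
    echo "gateway-newer-than-client"
  else
    echo "within-range"
  fi
}
```

A result of `gateway-older-than-client` matches the rollback case described here: the fix is to stop the newer client process, not to change the Gateway.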
Use this when logs include:
Skipping escaped skill path outside its configured root: ... reason=symlink-escape
OpenClaw treats every skill root as a containment boundary. A symlink under
~/.agents/skills, <workspace>/.agents/skills, <workspace>/skills, or
~/.openclaw/skills is skipped when its real target resolves outside that root
unless the target is explicitly trusted.
Inspect the link:
ls -l ~/.agents/skills/<name>
realpath ~/.agents/skills/<name>
openclaw config get skills.load
If the target is intentional, configure both the direct skill root and the allowed symlink target:
{
skills: {
load: {
extraDirs: ["~/Projects/manager/skills"],
allowSymlinkTargets: ["~/Projects/manager/skills"],
},
},
}
Then start a new session or wait for the skills watcher to refresh. Restart the gateway if the running process predates the config change.
Do not use broad targets such as ~, /, or a whole synced project folder.
Keep allowSymlinkTargets scoped to the real skill root that contains trusted
SKILL.md directories.
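
The containment rule can be approximated from the shell before you change any config. The helper below is a sketch, not OpenClaw's implementation: `escapes_root` is a hypothetical name, it relies on the same `realpath` command used above, and it does not model `allowSymlinkTargets`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the real location of $2 lies
# outside the real location of root $1, i.e. the path "escapes" the root.
# Exits 1 when the path stays inside the root, 2 when realpath fails.
escapes_root() {
  local root target
  root=$(realpath "$1") || return 2
  target=$(realpath "$2") || return 2
  case "$target/" in
    "$root"/*) return 1 ;;  # resolved path is the root or below it
    *) return 0 ;;          # resolved path is somewhere else
  esac
}
```

For example, `escapes_root ~/.agents/skills ~/.agents/skills/<name> && echo "symlink escape"` reports whether a given skill entry would resolve outside its root.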
Use this when logs/errors include: HTTP 429: rate_limit_error: Extra usage is required for long context requests.
openclaw logs --follow
openclaw models status
openclaw config get agents.defaults.models
Look for:

- `params.context1m: true`.

Fix options:

<Steps>
  <Step title="Disable context1m">
    Disable `context1m` for that model to fall back to the normal context window.
  </Step>
  <Step title="Use an eligible credential">
    Use an Anthropic credential that is eligible for long-context requests, or switch to an Anthropic API key.
  </Step>
  <Step title="Configure fallback models">
    Configure fallback models so runs continue when Anthropic long-context requests are rejected.
  </Step>
</Steps>
Use this when:

- `curl ... /v1/models` works
- `/v1/chat/completions` calls work

curl http://127.0.0.1:1234/v1/models
curl http://127.0.0.1:1234/v1/chat/completions \
-H 'content-type: application/json' \
-d '{"model":"<id>","messages":[{"role":"user","content":"hi"}],"stream":false}'
openclaw infer model run --model <provider/model> --prompt "hi" --json
openclaw logs --follow
Look for:

- `model_not_found` or 404 errors even though direct `/v1/chat/completions` works with the same bare model id
- `messages[].content` expecting a string
- `incomplete turn detected ... stopReason=stop payloads=0` warnings with an OpenAI-compatible local backend

If channels are up but nothing answers, check routing and policy before reconnecting anything.
openclaw status
openclaw channels status --probe
openclaw pairing list --channel <channel> [--account <id>]
openclaw config get channels
openclaw logs --follow
Look for:

- `requireMention`, `mentionPatterns`

Common signatures:

- `drop guild message (mention required` → group message ignored until mention.
- `pairing request` → sender needs approval.
- `blocked` / `allowlist` → sender/channel was filtered by policy.

When dashboard/control UI will not connect, validate URL, auth mode, and secure context assumptions.
openclaw gateway status
openclaw status
openclaw logs --follow
openclaw doctor
openclaw gateway status --json
Look for:
Use error.details.code from the failed connect response to pick the next action:
| Detail code | Meaning | Recommended action |
|---|---|---|
| `AUTH_TOKEN_MISSING` | Client did not send a required shared token. | Paste/set token in the client and retry. For dashboard paths: `openclaw config get gateway.auth.token` then paste into Control UI settings. |
| `AUTH_TOKEN_MISMATCH` | Shared token did not match gateway auth token. | If `canRetryWithDeviceToken=true`, allow one trusted retry. Cached-token retries reuse stored approved scopes; explicit `deviceToken` / `scopes` callers keep requested scopes. If still failing, run the token drift recovery checklist. |
| `AUTH_DEVICE_TOKEN_MISMATCH` | Cached per-device token is stale or revoked. | Rotate/re-approve device token using devices CLI, then reconnect. |
| `AUTH_SCOPE_MISMATCH` | Device token is valid, but its approved role/scopes do not cover this connect request. | Re-pair the device or approve the requested scope contract; do not treat this as shared-token drift. |
| `PAIRING_REQUIRED` | Device identity needs approval. Check `error.details.reason` for `not-paired`, `scope-upgrade`, `role-upgrade`, or `metadata-upgrade`, and use `requestId` / `remediationHint` when present. | Approve pending request: `openclaw devices list` then `openclaw devices approve <requestId>`. Scope/role upgrades use the same flow after you review the requested access. |
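
If you handle these codes in a wrapper script, the table can be condensed into a lookup. This is a sketch for illustration: `connect_next_step` is a hypothetical helper, and the messages are shortened paraphrases of the table, not output produced by OpenClaw.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map error.details.code to the next action from the
# table above. Prints a hint; exits 1 for codes the table does not list.
connect_next_step() {
  case "$1" in
    AUTH_TOKEN_MISSING)
      echo "Set the shared token in the client: openclaw config get gateway.auth.token" ;;
    AUTH_TOKEN_MISMATCH)
      echo "Allow one trusted retry if canRetryWithDeviceToken=true, else run the token drift recovery checklist" ;;
    AUTH_DEVICE_TOKEN_MISMATCH)
      echo "Rotate or re-approve the device token with the devices CLI, then reconnect" ;;
    AUTH_SCOPE_MISMATCH)
      echo "Re-pair the device or approve the requested scope contract" ;;
    PAIRING_REQUIRED)
      echo "Approve the pending request: openclaw devices list, then openclaw devices approve <requestId>" ;;
    *)
      echo "Unknown detail code: $1" >&2
      return 1 ;;
  esac
}
```

Keeping `AUTH_SCOPE_MISMATCH` as its own branch mirrors the table's warning not to treat it as shared-token drift.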
Device auth v2 migration check:
openclaw --version
openclaw doctor
openclaw gateway status
If logs show nonce/signature errors, update the connecting client and verify it:

<Steps>
  <Step title="Wait for connect.challenge">
    Client waits for the gateway-issued `connect.challenge`.
  </Step>
  <Step title="Sign the payload">
    Client signs the challenge-bound payload.
  </Step>
  <Step title="Send the device nonce">
    Client sends `connect.params.device.nonce` with the same challenge nonce.
  </Step>
</Steps>

If `openclaw devices rotate` / `revoke` / `remove` is denied unexpectedly:

- `operator.admin`
- `openclaw devices rotate --scope ...` can only request operator scopes that the caller session already holds

Use this when the service is installed but the process does not stay up.
openclaw gateway status
openclaw status
openclaw logs --follow
openclaw doctor
openclaw gateway status --deep # also scan system-level services
Look for:

- `Runtime: stopped` with exit hints.
- `Config (cli)` vs `Config (service)`.
- `--deep` is used.
- `Other gateway-like services detected (best effort)` cleanup hints.

Use this when the Gateway disappears under load, the supervisor reports an OOM-style restart, or logs mention critical memory pressure bundle written.
openclaw gateway status --deep
openclaw logs --follow
openclaw gateway stability --bundle latest
openclaw gateway diagnostics export
Look for:

- `Reason: diagnostic.memory.pressure.critical` in the latest stability bundle.
- `Memory pressure:` with `critical/rss_threshold`, `critical/heap_threshold`, or `critical/rss_growth`.
- `V8 heap:` values near the heap limit.
- `Largest session files:` entries such as `agents/<agent>/sessions/<session>.jsonl` or `sessions/<session>.jsonl`.

Common signatures:

- `critical memory pressure bundle written` appears shortly before restart → OpenClaw captured a pre-OOM stability bundle. Inspect it with `openclaw gateway stability --bundle latest`.
- `memory pressure: level=critical ... memoryPressureSnapshot=disabled` appears in gateway logs → OpenClaw detected critical memory pressure, but the pre-OOM stability snapshot is off.
- `Largest session files:` points at a very large redacted transcript path → reduce retained session history, inspect session growth, or move old transcripts out of the active store before restarting.
- `V8 heap:` used bytes are close to the heap limit → lower prompt/session pressure, reduce concurrent work, or raise the Node heap limit only after confirming the workload is expected.
- `Memory pressure: critical/rss_growth` → memory grew quickly inside one sampling window. Check the latest logs for a large import, runaway tool output, repeated retries, or a batch of queued agent work.

Set `diagnostics.memoryPressureSnapshot: true` to capture the pre-OOM stability bundle on future critical memory pressure events.

The stability bundle is payload-free. It includes operational memory evidence and redacted relative file paths, not message text, webhook bodies, credentials, tokens, cookies, or raw session ids. Attach the diagnostics export to bug reports instead of copying raw logs.
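
When the bundle points at large transcripts, it can help to rank the `.jsonl` files on disk yourself. The helper below is a sketch: `largest_jsonl` is a hypothetical name, and the directory you pass is an assumption, since this page only shows relative paths such as `agents/<agent>/sessions/<session>.jsonl`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the N largest *.jsonl files under a directory,
# biggest first, as "<bytes><TAB><path>". N defaults to 5.
largest_jsonl() {
  local dir=$1 n=${2:-5} f
  find "$dir" -type f -name '*.jsonl' -print0 |
    while IFS= read -r -d '' f; do
      # wc -c reports the size in bytes; tr strips padding some wc builds add.
      printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done |
    sort -rn |
    head -n "$n"
}
```

Run it as `largest_jsonl <state-dir> 10`, replacing `<state-dir>` with the directory that holds your session store, then compare the result with the `Largest session files:` entries in the bundle.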
Use this when Gateway startup fails with Invalid config or hot reload logs say
it skipped an invalid edit.
openclaw logs --follow
openclaw config file
openclaw config validate
openclaw doctor
Look for:

- `Invalid config at ...`
- `config reload skipped (invalid config): ...`
- `Config write rejected: ...`
- `openclaw.json.rejected.*` file beside the active config
- `openclaw.json.clobbered.*` file if `doctor --fix` repaired a broken direct edit
- `.clobbered.*` files are kept for each config path, and older ones are rotated

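
To see which rejected or clobbered copies exist next to a config file, a small glob is enough. This is a minimal sketch: `config_leftovers` is a hypothetical helper, and it only assumes the `.rejected.*` / `.clobbered.*` naming shown above, not any particular suffix format.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every "<config>.rejected.*" and
# "<config>.clobbered.*" file that sits beside the given config path.
config_leftovers() {
  local config=$1 f
  for f in "$config".rejected.* "$config".clobbered.*; do
    # An unmatched glob stays literal, so only print paths that exist.
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0
}
```

Pass the path reported by `openclaw config file`, for example `config_leftovers /path/to/openclaw.json`. Empty output means no rejected or clobbered copies are present.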
Use this when openclaw gateway probe reaches something, but still prints a warning block.
openclaw gateway probe
openclaw gateway probe --json
openclaw gateway probe --ssh user@gateway-host
Look for:

- `warnings[].code` and `primaryTargetId` in JSON output.

Common signatures:

- `SSH tunnel failed to start; falling back to direct probes.` → SSH setup failed, but the command still tried direct configured/loopback targets.
- `multiple reachable gateways detected` → more than one target answered. Usually this means an intentional multi-gateway setup or stale/duplicate listeners.
- `Read-probe diagnostics are limited by gateway scopes (missing operator.read)` → connect worked, but detail RPC is scope-limited; pair device identity or use credentials with `operator.read`.
- `Gateway accepted the WebSocket connection, but follow-up read diagnostics failed` → connect worked, but the full diagnostic RPC set timed out or failed. Treat this as a reachable Gateway with degraded diagnostics; compare `connect.ok` and `connect.rpcOk` in `--json` output.
- `Capability: pairing-pending` or `gateway closed (1008): pairing required` → the gateway answered, but this client still needs pairing/approval before normal operator access.
- `gateway.auth.*` / `gateway.remote.*` SecretRef warning text → auth material was unavailable in this command path for the failed target.

If channel state is connected but message flow is dead, focus on policy, permissions, and channel-specific delivery rules.
openclaw channels status --probe
openclaw pairing list --channel <channel> [--account <id>]
openclaw status --deep
openclaw logs --follow
openclaw config get channels
Look for:

- `pairing`, `allowlist`, `open`, `disabled`

Common signatures:

- `mention required` → message ignored by group mention policy.
- `pairing` / pending approval traces → sender is not approved.
- `missing_scope`, `not_in_channel`, `Forbidden`, `401/403` → channel auth/permissions issue.

If cron or heartbeat did not run or did not deliver, verify scheduler state first, then delivery target.
openclaw cron status
openclaw cron list
openclaw cron runs --id <jobId> --limit 20
openclaw system heartbeat last
openclaw logs --follow
Look for:

- `ok`, `skipped`, `error`
- `quiet-hours`, `requests-in-flight`, `cron-in-progress`, `lanes-busy`, `alerts-disabled`, `empty-heartbeat-file`, `no-tasks-due`

If a node is paired but tools fail, isolate foreground, permission, and approval state.
openclaw nodes status
openclaw nodes describe --node <idOrNameOrIp>
openclaw approvals get --node <idOrNameOrIp>
openclaw logs --follow
openclaw status
Look for:
Common signatures:

- `NODE_BACKGROUND_UNAVAILABLE` → node app must be in foreground.
- `*_PERMISSION_REQUIRED` / `LOCATION_PERMISSION_REQUIRED` → missing OS permission.
- `SYSTEM_RUN_DENIED: approval required` → exec approval pending.
- `SYSTEM_RUN_DENIED: allowlist miss` → command blocked by allowlist.

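
The signatures above can be matched against an error string in a script. This is an illustrative sketch: `node_error_hint` is a hypothetical helper, and it only knows the patterns listed on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a node tool error string into the matching hint.
# Exits 1 when none of the documented signatures is present.
node_error_hint() {
  case "$1" in
    *NODE_BACKGROUND_UNAVAILABLE*)
      echo "node app must be in foreground" ;;
    *_PERMISSION_REQUIRED*)
      echo "missing OS permission" ;;
    *"SYSTEM_RUN_DENIED: approval required"*)
      echo "exec approval pending" ;;
    *"SYSTEM_RUN_DENIED: allowlist miss"*)
      echo "command blocked by allowlist" ;;
    *)
      return 1 ;;
  esac
}
```

Feed it an error line copied from `openclaw logs --follow`, then use `openclaw approvals get --node <idOrNameOrIp>` when the hint points at approvals or the allowlist.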
Use this when browser tool actions fail even though the gateway itself is healthy.
openclaw browser status
openclaw browser start --browser-profile openclaw
openclaw browser profiles
openclaw logs --follow
openclaw doctor
Look for:

- `plugins.allow` is set and includes `browser`.
- `existing-session` / `user` profiles.

Most post-upgrade breakage is config drift or stricter defaults now being enforced.

<AccordionGroup>
<Accordion title="1. Auth and URL override behavior changed">

```bash
openclaw gateway status
openclaw config get gateway.mode
openclaw config get gateway.remote.url
openclaw config get gateway.auth.mode
```

What to check:
- If `gateway.mode=remote`, CLI calls may be targeting remote while your local service is fine.
- Explicit `--url` calls do not fall back to stored credentials.
Common signatures:
- `gateway connect failed:` → wrong URL target.
- `unauthorized` → endpoint reachable but wrong auth.
What to check:
- Non-loopback binds (`lan`, `tailnet`, `custom`) need a valid gateway auth path: shared token/password auth, or a correctly configured non-loopback `trusted-proxy` deployment.
- Old keys like `gateway.token` do not replace `gateway.auth.token`.
Common signatures:
- `refusing to bind gateway ... without auth` → non-loopback bind without a valid gateway auth path.
- `Connectivity probe: failed` while runtime is running → gateway alive but inaccessible with current auth/url.
What to check:
- Pending device approvals for dashboard/nodes.
- Pending DM pairing approvals after policy or identity changes.
Common signatures:
- `device identity required` → device auth not satisfied.
- `pairing required` → sender/device must be approved.
If the service config and runtime still disagree after checks, reinstall service metadata from the same profile/state directory:
openclaw gateway install --force
openclaw gateway restart