.agents/skills/crabbox/SKILL.md
Use the Crabbox wrapper when OpenClaw needs remote Linux proof for broad tests, CI-parity checks, secrets, hosted services, Docker/E2E/package lanes, warmed reusable boxes, sync timing, logs/results, cache inspection, or lease cleanup.
Crabbox is the transport/orchestration surface. The actual backend can be:
- provider=aws, lease ids like cbx_..., syncDelegated=false
- provider=blacksmith-testbox, ids like tbx_..., syncDelegated=true

For OpenClaw maintainer broad pnpm gates, Blacksmith Testbox through the
Crabbox wrapper is acceptable and often preferred when the standing Testbox
rules apply. Do not describe those runs as "AWS Crabbox"; report them as
Testbox-through-Crabbox with the tbx_... id and Actions run.
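The prefix convention above is simple enough to apply mechanically when writing up a run. The sketch below is illustrative only: describe_lease is a hypothetical helper, not a Crabbox command, and it assumes ids keep the cbx_ and tbx_ prefixes described here.

```shell
# describe_lease is a hypothetical helper (not part of Crabbox).
# It maps a lease id prefix to the wording this document asks for in reports.
describe_lease() {
  case "$1" in
    cbx_*) echo "AWS Crabbox" ;;
    tbx_*) echo "Testbox-through-Crabbox" ;;
    *)     echo "unknown backend" ;;
  esac
}

describe_lease "cbx_example"
describe_lease "tbx_example"
```

The Hetzner examples in this document also use cbx_ ids, so treat the label as a default and confirm it against the provider field in the run summary.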
Use the repo .crabbox.yaml brokered AWS path when the task specifically needs
direct AWS Crabbox behavior, persistent direct-provider leases, --fresh-pr,
--full-resync, environment forwarding, capture/download support, or provider
comparison. Use --provider blacksmith-testbox when the task needs OpenClaw
maintainer Testbox proof, prepared CI environment, broad/heavy pnpm gates, or
the user asks for Testbox/Blacksmith.
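As a rough illustration of the routing rule above, the sketch below turns a stated need into the extra flags for the repo wrapper. provider_args and its need labels are hypothetical names invented for this example; the only facts taken from this document are that omitting --provider means brokered AWS and that Testbox proof uses --provider blacksmith-testbox.

```shell
# provider_args is a hypothetical helper; the need labels are illustrative.
# It prints the extra provider flags, or nothing when brokered AWS applies.
provider_args() {
  case "$1" in
    testbox|prepared-ci|broad-pnpm-gate)
      echo "--provider blacksmith-testbox" ;;
    fresh-pr|full-resync|env-forwarding|capture|persistent-lease|provider-comparison)
      # Brokered AWS is the repo default, so no --provider flag is printed.
      : ;;
    *)
      echo "unknown need: $1" >&2
      return 1 ;;
  esac
}

provider_args testbox
provider_args fresh-pr
```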
command -v crabbox
../crabbox/bin/crabbox --version
pnpm crabbox:run -- --help | sed -n '1,120p'
../crabbox/bin/crabbox desktop launch --help
../crabbox/bin/crabbox webvnc --help
- Prefer ../crabbox/bin/crabbox when present. The user PATH shim can be stale.
- Check .crabbox.yaml for direct-provider defaults. Omitting --provider
  means brokered AWS today.
- For OpenClaw maintainer broad pnpm gates, prefer the repo wrapper with
  --provider blacksmith-testbox or the repo Testbox helpers when the standing
  Testbox policy applies.
- cbx_... means AWS Crabbox; tbx_... means Blacksmith Testbox through Crabbox.
  If the output only says blacksmith testbox list, use
  blacksmith testbox list --all before concluding no box exists.
- When a direct-provider workdir looks stale, try --full-resync
  (alias --fresh-sync) before replacing the lease. This resets the remote
  workdir, skips the fingerprint fast path, reseeds Git when possible, and
  uploads the checkout from scratch.
- OPENCLAW_LOCAL_CHECK_MODE=throttled from the local shell is not permission
  to move broad pnpm check:changed, pnpm test:changed, full pnpm test, or
  lint/typecheck fan-out onto the laptop.
- Use OPENCLAW_LOCAL_CHECK_MODE=throttled|full only when the user explicitly
  asks for local proof in the current task. If Testbox is queued or capacity is
  constrained, report the blocker and keep only targeted local edit-loop checks
  running.

Use macOS and Windows targets only when the task needs an existing non-Linux host. OpenClaw broad Linux validation uses the repo Crabbox config unless a provider is explicitly requested.
When the user explicitly asks for brokered macOS runners, use Crabbox AWS
macOS only after confirming the deployed coordinator supports EC2 Mac host
lifecycle/image routes and the operator has AWS EC2 Mac Dedicated Host quota
and IAM. Prefer CRABBOX_HOST_ID for a known Crabbox-managed Dedicated Host,
or run the no-spend preflight first:
crabbox admin hosts quota --provider aws --target macos --region eu-west-1 --type mac2.metal --json
crabbox admin hosts allocate --provider aws --target macos --region eu-west-1 --type mac2.metal --dry-run --json
CRABBOX_MACOS_TYPES=all scripts/macos-host-region-preflight.sh
Do not silently substitute AWS macOS for normal OpenClaw Linux proof. Report paid-host blockers as quota, IAM, coordinator deployment, or host availability instead of falling back to local macOS.
Crabbox supports static SSH targets:
../crabbox/bin/crabbox run --provider ssh --target macos --static-host mac-studio.local -- xcodebuild test
../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode normal --static-host win-dev.local -- pwsh -NoProfile -Command "dotnet test"
../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode wsl2 --static-host win-dev.local -- pnpm test
- target=macos and target=windows --windows-mode wsl2 use the POSIX SSH,
  bash, Git, rsync, and tar contract.
- static.workRoot. Direct native Windows runs support
  --script*, --env-from-profile, --preflight, and PowerShell --shell.
- crabbox actions hydrate/register are Linux-only today; use plain
  crabbox run loops for static macOS and Windows hosts.
- ../crabbox/bin/crabbox run --help, config/flag tests, and the Crabbox
  Go test suite.

Use the brokered AWS path when the task needs direct AWS Crabbox semantics rather than the prepared Blacksmith Testbox CI environment.
Changed gate:
pnpm crabbox:run -- \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
--shell -- \
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
Full suite:
pnpm crabbox:run -- \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
--shell -- \
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test"
Focused rerun:
pnpm crabbox:run -- \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
--shell -- \
"env CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test <path-or-filter>"
Read the JSON summary. Useful fields:
- provider: aws
- leaseId: cbx_...
- syncDelegated: false
- commandPhases: populated when the command prints CRABBOX_PHASE:<name>
- commandMs / totalMs
- exitCode

Crabbox should stop one-shot AWS leases automatically after the run. Verify cleanup when a run fails, is interrupted, or the command output is unclear:
../crabbox/bin/crabbox list --provider aws
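To pull individual fields out of the timing summary without extra tooling, something like the following can work. It is a sketch under assumptions: the sample values are made up, the summary is assumed to be a flat single-line JSON object, and json_field is a hypothetical helper. A real JSON parser such as jq is the more robust choice when it is installed.

```shell
# Sample summary with made-up values; the real output shape may differ.
summary='{"provider":"aws","leaseId":"cbx_example","syncDelegated":false,"commandMs":1200,"totalMs":4500,"exitCode":0}'

# json_field is a hypothetical helper: it reads JSON on stdin and prints the
# scalar value of one top-level field. Nested objects and escaped quotes are
# not handled.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\{0,1\}\([^",}]*\)"\{0,1\}.*/\1/p'
}

printf '%s\n' "$summary" | json_field leaseId
printf '%s\n' "$summary" | json_field exitCode
```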
Use Blacksmith Testbox through Crabbox for OpenClaw maintainer broad/heavy
pnpm gates when the prepared CI environment is the right proof surface:
node scripts/crabbox-wrapper.mjs run \
--provider blacksmith-testbox \
--blacksmith-org openclaw \
--blacksmith-workflow .github/workflows/ci-check-testbox.yml \
--blacksmith-job check \
--blacksmith-ref main \
--idle-timeout 90m \
--ttl 240m \
--timing-json \
-- \
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 OPENCLAW_TESTBOX=1 OPENCLAW_TESTBOX_REMOTE_RUN=1 pnpm check:changed
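Long commands like the gates above can print CRABBOX_PHASE:<name> markers so the timing JSON reports commandPhases. A minimal sketch, assuming a marker on its own stdout line is enough; phase is a hypothetical helper, not something Crabbox ships, and the echo commands stand in for real install and test steps.

```shell
# phase is a hypothetical wrapper: print the phase marker, then run the
# command given as the remaining arguments and return its exit status.
phase() {
  name=$1
  shift
  echo "CRABBOX_PHASE:$name"
  "$@"
}

phase install echo "installing"
phase test echo "testing"
```

On direct providers, a script built this way can be uploaded with --script instead of being squeezed into a quoted --shell string.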
Read the JSON summary and the Testbox line. Useful fields:
- provider: blacksmith-testbox
- leaseId: tbx_...
- syncDelegated: true
- syncPhases: delegated/skipped because Blacksmith owns checkout/sync
- exitCode

blacksmith testbox list may hide hydrating or ready boxes. Use:
blacksmith testbox list --all
blacksmith testbox status <tbx_id>
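A quick sanity check on a summary is that provider, lease id prefix, and syncDelegated agree with each other. The sketch below covers only the two backends described in this document; summary_consistent is a hypothetical helper and takes the three values as plain strings.

```shell
# summary_consistent is a hypothetical check, not a Crabbox feature.
# Args: provider, leaseId, syncDelegated (strings taken from the JSON summary).
summary_consistent() {
  case "$1:$2:$3" in
    aws:cbx_*:false)               return 0 ;;
    blacksmith-testbox:tbx_*:true) return 0 ;;
    *)
      echo "unexpected combination: provider=$1 leaseId=$2 syncDelegated=$3" >&2
      return 1 ;;
  esac
}

summary_consistent blacksmith-testbox tbx_example true && echo "summary looks consistent"
```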
Use these on debugging runs before inventing ad hoc logging:
- --preflight: prints run context, workspace mode, SSH target, remote user/cwd,
  and target-specific tool probes. Defaults cover git, tar, node, npm,
  corepack, pnpm, yarn, bun, docker, plus POSIX
  sudo/apt/bubblewrap and native Windows
  powershell/execution_policy/longpaths/temp/pwsh. Add
  --preflight-tools node,bun,docker, CRABBOX_PREFLIGHT_TOOLS, or repo
  run.preflightTools to replace the list. default expands built-ins; none
  prints only the workspace summary. Preflight is diagnostic only; install
  toolchains through Actions hydration, images, devcontainer/Nix/mise/asdf, or
  the run script. On blacksmith-testbox, this prints a delegated-unsupported
  note because the workflow owns setup.
- CRABBOX_ENV_ALLOW=NAME,...: forwards only listed local env vars for direct
  providers and prints set len=N secret=true style summaries. On
  blacksmith-testbox, env forwarding is unsupported; put secrets in the
  Testbox workflow instead.
- --env-from-profile <file> plus --allow-env NAME: loads simple
  export NAME=value / NAME=value lines from a local profile without
  executing it, then forwards only allowlisted names. --allow-env is
  repeatable and comma-separated. Profile values override ambient allowlisted
  env values for that run. Direct POSIX, WSL2, and native Windows runs are
  supported; delegated providers are not. Crabbox probes the uploaded profile
  remotely and prints redacted presence/length metadata before the command.
- --env-helper <name>: with --env-from-profile on POSIX SSH targets,
  persists .crabbox/env/<name> and .crabbox/env/<name>.env so follow-up
  commands on the same lease can run through ./.crabbox/env/<name> <command>.
  Use only on leases you control; the profile stays until cleanup, lease reset,
  or --full-resync.
- --script <file> / --script-stdin: upload a local script into
  .crabbox/scripts/ and execute it on the remote box. Shebang scripts execute
  directly on POSIX; scripts without a shebang run through bash. Native
  Windows uploads run through Windows PowerShell, and Crabbox appends .ps1
  when needed. Arguments after -- become script args.
- --fresh-pr owner/repo#123|URL|number: skip dirty local sync and create a
  fresh remote checkout of the GitHub PR. Bare numbers use the current repo's
  GitHub origin. Add --apply-local-patch only when the current local
  git diff --binary HEAD should be applied on top of that PR checkout.
- --full-resync / --fresh-sync: reset a stale direct-provider workdir
  before syncing. Use after sync fingerprints look wrong, SSH times out before
  sync, or rsync watchdog output suggests it. It is redundant with
  --fresh-pr, incompatible with --no-sync, and unsupported by delegated
  providers.
- --capture-stdout <path> / --capture-stderr <path>: write remote streams to
  local files and keep binary/noisy output out of retained logs. Parent
  directories must already exist. These are direct-provider only.
- --capture-on-fail: on non-zero direct-provider exits, downloads
  .crabbox/captures/*.tar.gz with test-results, playwright-report,
  coverage, JUnit XML, and nearby logs. Treat as secret-bearing until reviewed.
- --keep-on-failure: leave a failed one-shot lease alive for live debugging
  until idle/TTL expiry. Useful on direct providers and delegated one-shots.
- --timing-json: final machine-readable timing. Add
  echo CRABBOX_PHASE:install, CRABBOX_PHASE:test, etc. in long shell
  commands; direct providers and Blacksmith Testbox both report them as
  commandPhases.

Live-provider debug template for direct AWS/Hetzner leases:
mkdir -p .crabbox/logs
pnpm crabbox:run -- --provider aws \
--preflight \
--allow-env OPENAI_API_KEY,OPENAI_BASE_URL \
--timing-json \
--capture-stdout .crabbox/logs/live-provider.stdout.log \
--capture-stderr .crabbox/logs/live-provider.stderr.log \
--capture-on-fail \
--shell -- \
"echo CRABBOX_PHASE:install; pnpm install --frozen-lockfile; echo CRABBOX_PHASE:test; pnpm test:live"
Do not pass --capture-*, --download, --checksum, --force-sync-large, or
--sync-only to delegated providers. Also do not pass --script*,
--fresh-pr, --full-resync, or --env-helper there. Crabbox rejects these
because the provider owns sync or command transport. --keep-on-failure is OK
for delegated one-shots when you need to inspect a failed lease.
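The rejection list above can be checked locally before dispatching a delegated run. A minimal sketch, assuming flags are passed as separate words and that everything after -- is the remote command; delegated_flags_ok is a hypothetical helper, and Crabbox remains the authority on what it actually rejects.

```shell
# delegated_flags_ok is a hypothetical pre-check, not a Crabbox command.
# It returns non-zero if any argument before "--" is a flag this document
# lists as unsupported on delegated providers.
delegated_flags_ok() {
  for arg in "$@"; do
    case "$arg" in
      --)
        # Everything after "--" is the remote command; stop scanning.
        break ;;
      --capture-*|--download|--checksum|--force-sync-large|--sync-only)
        echo "rejected on delegated providers: $arg" >&2
        return 1 ;;
      --script|--script-*|--fresh-pr|--full-resync|--fresh-sync|--env-helper)
        echo "rejected on delegated providers: $arg" >&2
        return 1 ;;
    esac
  done
  return 0
}

delegated_flags_ok --provider blacksmith-testbox --timing-json --keep-on-failure -- pnpm check:changed \
  && echo "flags look fine for a delegated provider"
```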
Use the smallest Crabbox lane that proves the reported user path, not just the touched code. Aim for one after-fix E2E proof before commenting, closing, or opening a PR for a user-visible bug.
When the user says "test in Crabbox", do not simply copy tests to the remote box and run them there. Crabbox is for remote real-scenario proof: copy or install OpenClaw as the user would, run the same setup/update/CLI/Gateway/API call that failed, and capture behavior from that entrypoint. For regressions or bug reports, prove the broken state first when feasible, then run the same scenario after the fix.
Pick the lane by symptom:
- scripts/e2e/*-docker.sh or package script. This proves npm packaging,
  install paths, runtime deps, config writes, and container behavior.

Efficient flow:
Keep it efficient:
- Use --script <file> or --script-stdin for multi-line E2E commands instead
  of quote-heavy --shell strings on direct SSH providers.
- Use --fresh-pr <pr> when validating an upstream PR in isolation from the
  local dirty tree. Add --apply-local-patch only when testing a local fixup on
  top of that PR.
- Use --full-resync before replacing a warmed direct-provider lease when the
  remote workdir or sync fingerprint appears stale.
- Use OPENCLAW_CURRENT_PACKAGE_TGZ with Docker/package lanes when testing a
  candidate tarball; prefer the repo's package helper instead of direct source
  execution when the bug might be packaging/install related.
- Use --timing-json on broad or flaky runs when command duration or sync
  behavior matters.

Before/after PR proof on delegated Testbox:

- /tmp, install in each, then run the same harness twice.
- git checkout can abort or mix proof state.
- /tmp do not resolve repo packages by default. Put
  the harness inside the worktree, or in ESM use
  createRequire(path.join(process.cwd(), "package.json")) before requiring
  workspace deps such as @lydell/node-pty.
- pnpm check:changed, pnpm check:test-types, and the real E2E proof, it is
  reasonable to merge once required checks allow it. Note any still-running
  unrelated shards in the proof comment instead of waiting forever.

Interactive CLI/onboarding:
- tmux send-keys; capture proof with
  tmux capture-pane, redacted through sed.
- send-keys -l openai may not trigger filtering in a
  tmux pane; inspect option order locally or on-box and send exact Down/Enter
  sequences.
- OPENCLAW_STATE_DIR=$(mktemp -d). Plugin npm
  installs live under that state dir (npm/node_modules/...), not under
  OPENCLAW_CONFIG_DIR. Verify downloads by checking the state dir, package
  lock, and installed package metadata.
- OPENCLAW_ALLOW_PLUGIN_INSTALL_OVERRIDES=1 plus
  OPENCLAW_PLUGIN_INSTALL_OVERRIDES='{"plugin-id":"npm-pack:/tmp/plugin.tgz"}'.
  Pack with npm pack, set an isolated OPENCLAW_STATE_DIR, and verify the
  package under npm/node_modules. Overrides are test-only and must not be
  treated as official/trusted-source installs.
- Installed Codex plugin, npm/node_modules/@openclaw/codex, and the
  package-lock entry showing the bundled @openai/codex dependency. A dummy
  OpenAI-shaped key can prove only UI/install behavior; it is not live auth.

For most Crabbox calls, one-shot is enough. Use reuse only when you need multiple manual commands on the same hydrated box.
If Crabbox returns a reusable id or you intentionally keep a lease:
pnpm crabbox:run -- --id <cbx_id-or-slug> --no-sync --timing-json --shell -- "pnpm test <path>"
Stop boxes you created before handoff:
pnpm crabbox:stop -- <id-or-slug>
blacksmith testbox stop --id <tbx_id>
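Which stop command applies depends on the id. The sketch below prints, rather than runs, the matching command from the two lines above; stop_command is a hypothetical helper, and it assumes anything that is not a tbx_ id (including slugs) goes through the repo stop script.

```shell
# stop_command is a hypothetical helper: it only prints the command to run.
stop_command() {
  case "$1" in
    tbx_*) echo "blacksmith testbox stop --id $1" ;;
    *)     echo "pnpm crabbox:stop -- $1" ;;
  esac
}

stop_command "tbx_example"
stop_command "cbx_example"
```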
Prefer WebVNC for human inspection because the browser portal can preload the
lease VNC password and avoids a native VNC client's copy/paste/password dance.
Use native crabbox vnc only when WebVNC is unavailable, the browser portal is
broken, or the user explicitly wants a local VNC client.
Common desktop flow:
../crabbox/bin/crabbox warmup --provider hetzner --desktop --browser --class standard --idle-timeout 60m --ttl 240m
../crabbox/bin/crabbox desktop launch --provider hetzner --id <cbx_id-or-slug> --browser --url https://example.com --webvnc --open --take-control
Useful WebVNC commands:
../crabbox/bin/crabbox webvnc --provider hetzner --id <cbx_id-or-slug> --open --take-control
../crabbox/bin/crabbox webvnc daemon start --provider hetzner --id <cbx_id-or-slug> --open --take-control
../crabbox/bin/crabbox webvnc daemon status --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox webvnc daemon stop --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox webvnc status --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox webvnc reset --provider hetzner --id <cbx_id-or-slug> --open --take-control
../crabbox/bin/crabbox desktop doctor --provider hetzner --id <cbx_id-or-slug>
../crabbox/bin/crabbox desktop click --provider hetzner --id <cbx_id-or-slug> --x 640 --y 420
../crabbox/bin/crabbox desktop paste --provider hetzner --id <cbx_id-or-slug> --text "user@example.com"
../crabbox/bin/crabbox desktop key --provider hetzner --id <cbx_id-or-slug> ctrl+l
../crabbox/bin/crabbox artifacts collect --id <cbx_id-or-slug> --all --output artifacts/<slug>
../crabbox/bin/crabbox artifacts publish --dir artifacts/<slug> --pr <number>
desktop launch --webvnc --open is usually the nicest one-shot: it starts the
browser/app inside the visible session, bridges the lease into the authenticated
WebVNC portal, and opens the portal. Keep browsers windowed for human QA; use
--fullscreen only for capture/video workflows.
For human handoff, include --take-control so the opened portal viewer gets
keyboard/mouse control automatically instead of landing as an observer.
Human handoff preflight:
- Put the expected command on PATH.
- Verify from ~.
- Prefer a command -v <expected-command> && <expected-command> --version
  check over a repo-root-only pnpm ... command.

Generic handoff repair pattern:
../crabbox/bin/crabbox run --id <cbx_id-or-slug> --full-resync --shell -- \
"set -euo pipefail
pnpm install --frozen-lockfile
pnpm build
sudo ln -sf \"\$PWD/<cli-entry>\" /usr/local/bin/<expected-command>
cd ~
command -v <expected-command>
<expected-command> --version"
Keep the fallback narrow. First decide whether the failure is Crabbox itself, the brokered AWS lease, Blacksmith/Testbox, repo hydration, sync, or the test command.
Fast checks:
command -v crabbox
../crabbox/bin/crabbox --version
pnpm crabbox:run -- --help | sed -n '1,140p'
../crabbox/bin/crabbox doctor
command -v blacksmith
blacksmith --version
blacksmith testbox list
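The presence checks above can be folded into a loop that reports every tool instead of stopping at the first missing one. This is a sketch only; report_tool is a hypothetical helper, and the tool list simply mirrors the commands named in this section.

```shell
# report_tool is a hypothetical helper: it prints where a tool resolves,
# or "missing" when the shell cannot find it.
report_tool() {
  if found=$(command -v "$1" 2>/dev/null); then
    echo "$1: $found"
  else
    echo "$1: missing"
  fi
}

for tool in crabbox ../crabbox/bin/crabbox blacksmith; do
  report_tool "$tool"
done
```

A missing sibling binary or stale PATH shim then shows up in one pass, which helps decide whether the failure is in Crabbox itself before looking at leases or sync.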
Common Crabbox-only failures:
- ../crabbox/bin/crabbox from the sibling
  repo, or update/install Crabbox before retrying.
- .crabbox.yaml, crabbox config show, and
  crabbox whoami; normal OpenClaw proof should use brokered AWS without
  asking for cloud keys.
- cbx_... / tbx_... id, or run one-shot
  without --id.
- --debug --timing-json; capture the final JSON and the
  printed Actions URL. Large sync warnings now include top source directories
  by file count and a hint to update .crabboxignore / sync.exclude; inspect
  those before reaching for --force-sync-large. Quiet rsync watchdogs and SSH
  timeouts now print next_action= hints; follow them, usually --full-resync
  first and a fresh lease second.
- crabbox list --provider aws; for explicit
  Blacksmith runs, use blacksmith testbox list and stop only boxes you
  created.
- --provider so .crabbox.yaml routes to brokered AWS, or report
  the Blacksmith blocker if Testbox itself is the requested proof.

If brokered AWS cannot dispatch, sync, attach, or stop, retry once with
--debug and --timing-json:
pnpm crabbox:run -- --debug --timing-json -- \
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed
Full suite:
pnpm crabbox:run -- --debug --timing-json -- \
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test
Auth fallback, only when blacksmith says auth is missing:
blacksmith auth login --non-interactive --organization openclaw
Raw Blacksmith footguns:
- tbx_... id in the session.
- warmup --ref refs; use a branch or tag.
- blacksmith testbox list as cleanup diagnostics, not a shared reusable
  queue.

Use Blacksmith only when the task is specifically about Testbox, brokered AWS is unavailable, or an explicit comparison is needed. If Blacksmith is down or quota-limited, do not keep probing it; stay on brokered AWS and note the delegated-provider outage.
Crabbox Blacksmith backend delegates setup to:
- org: openclaw
- workflow: .github/workflows/ci-check-testbox.yml
- job: check
- ref: main unless testing a branch/tag intentionally

The hydration workflow owns checkout, Node/pnpm setup, dependency install, secrets, ready marker, and keepalive. Crabbox owns dispatch, sync, SSH command execution, timing, logs/results, and cleanup.
Minimal Blacksmith-backed Crabbox run, from repo root:
pnpm crabbox:run -- --provider blacksmith-testbox --timing-json -- \
CI=1 NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test:changed
Use direct Blacksmith only when Crabbox is the broken layer and you are
isolating a Crabbox bug. Prefer direct blacksmith testbox list for cleanup
diagnostics, not as a reusable work queue.
Important Blacksmith footguns:
- warmup --ref refs; use a branch or tag.
- blacksmith auth login --non-interactive --organization openclaw
Use AWS for normal OpenClaw remote proof. The repo .crabbox.yaml already
selects brokered AWS, so omit --provider unless you are testing a different
provider deliberately.
pnpm crabbox:warmup -- --class beast --market on-demand --idle-timeout 90m
pnpm crabbox:hydrate -- --id <cbx_id-or-slug>
pnpm crabbox:run -- --id <cbx_id-or-slug> --timing-json --shell -- "env NODE_OPTIONS=--max-old-space-size=4096 OPENCLAW_TEST_PROJECTS_PARALLEL=6 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 pnpm test:changed"
pnpm crabbox:stop -- <cbx_id-or-slug>
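For the warm lease flow above, a thin wrapper can make sure the stop step still runs when the test command fails. This is a sketch, not a Crabbox feature: with_lease and run are hypothetical helpers, and DRY_RUN defaults to 1 so the block only prints the commands it would run.

```shell
# DRY_RUN=1 (the default here) prints commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# with_lease is a hypothetical wrapper: run one command on an existing lease,
# always attempt the stop afterwards, and return the command's exit status.
with_lease() {
  id=$1
  shift
  rc=0
  run pnpm crabbox:run -- --id "$id" --timing-json --shell -- "$*" || rc=$?
  run pnpm crabbox:stop -- "$id"
  return "$rc"
}

with_lease "<cbx_id-or-slug>" pnpm test:changed
```

Skip the wrapper when the lease should stay warm for further manual commands; it suits the case where the box is stopped before handoff.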
Install/auth for owned Crabbox if needed:
brew install openclaw/tap/crabbox
crabbox login --url https://crabbox.openclaw.ai --provider aws
New users should self-resolve broker auth before anyone asks for AWS keys:
crabbox config show
crabbox doctor
crabbox whoami
- crabbox login --url https://crabbox.openclaw.ai --provider aws.
- If a step asks for AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, or AWS
  profile setup during normal OpenClaw validation, assume the agent selected
  the wrong path. Use brokered crabbox login or an existing brokered lease
  before asking the user for cloud credentials.
- printf '%s' "$CRABBOX_COORDINATOR_TOKEN" | crabbox login --url https://crabbox.openclaw.ai --provider aws --token-stdin.

macOS config lives at:
~/Library/Application Support/crabbox/config.yaml
It should include broker.url, broker.token, and usually provider: aws
for OpenClaw lanes. Let that config drive normal validation.
For human desktop demos, prefer webvnc over native vnc and keep the remote
desktop visible/windowed. Do not fullscreen the remote browser or hide the XFCE
panel/window chrome unless the explicit goal is video/capture output. After
launch, verify a screenshot shows the desktop panel plus browser title bar. If
Chrome is fullscreen, toggle it back with:
crabbox run --id <lease> --shell -- 'DISPLAY=:99 xdotool search --onlyvisible --class google-chrome windowactivate key F11'
crabbox status --id <id-or-slug> --wait
crabbox inspect --id <id-or-slug> --json
crabbox sync-plan
crabbox history --limit 20
crabbox history --lease <id-or-slug>
crabbox attach <run_id>
crabbox events <run_id> --json
crabbox logs <run_id>
crabbox results <run_id>
crabbox cache stats --id <id-or-slug>
crabbox ssh --id <id-or-slug>
blacksmith testbox list
Use --debug on run when measuring sync timing.
Use --timing-json on warmup, hydrate, and run when comparing backends.
Use --market spot|on-demand only on AWS warmup/one-shot runs.
- ../crabbox/bin/crabbox --help lists
  the provider selected by .crabbox.yaml; update Crabbox before falling back.
- --debug; check changed-file count and whether the
  checkout is dirty.
- crabbox list --provider aws; for explicit Blacksmith
  runs, use blacksmith testbox list and stop owned tbx_... leases you
  created.

Do not add OpenClaw-specific setup to Crabbox itself. Put repo setup in the hydration workflow and keep Crabbox generic around lease, sync, command execution, logs/results, timing, and cleanup.