deploy/docker/SECURITY-VERIFY.md
The offline test suite (pytest deploy/docker/tests/test_security_*.py) covers
the logic of the hardening. A handful of items can only be confirmed with a real
docker build + boot or a browser. This runbook lists those, with exact
commands and expected results. Run it before relying on the hardened image.
Prereqs: Docker + docker compose, a checkout of this branch.
python -m pip install -e . && pip install -r deploy/docker/requirements.txt pytest pytest-asyncio
pytest deploy/docker/tests/test_security_*.py -q
Expected: all pass, 1 xfailed (the --no-sandbox posture test, see §3).
IMAGE=local-sec docker compose build
# or: docker build -t unclecode/crawl4ai:local-sec .
Expected: build succeeds. Note the /app dir is now root-owned + read-only and
the artifact dir /var/lib/crawl4ai/outputs is created 0700.
docker run --rm -p 11235:11235 unclecode/crawl4ai:local-sec &
sleep 8
docker logs <id> 2>&1 | grep -i "binding loopback only" # expect this line
curl -fsS http://localhost:11235/health # expect: NOT reachable
Expected: entrypoint logs "no CRAWL4AI_API_TOKEN set; binding loopback only"; the mapped port is not reachable from the host (gunicorn bound 127.0.0.1 inside the container). This is the fail-closed default.
TOKEN=$(openssl rand -hex 32)
docker run --rm -e CRAWL4AI_API_TOKEN=$TOKEN -p 11235:11235 unclecode/crawl4ai:local-sec &
sleep 8
curl -fsS http://localhost:11235/health # expect 200
curl -fsS http://localhost:11235/schema # expect 401
curl -fsS -H "Authorization: Bearer $TOKEN" http://localhost:11235/schema # expect 200
--no-sandbox)Default keeps --no-sandbox (works as today). To verify the hardened path:
Host: sysctl -w kernel.unprivileged_userns_clone=1 (Debian/Ubuntu).
docker run --rm -e CRAWL4AI_API_TOKEN=$TOKEN -e CRAWL4AI_CHROMIUM_SANDBOX=true \
-p 11235:11235 unclecode/crawl4ai:local-sec &
sleep 8
curl -fsS -X POST http://localhost:11235/crawl \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"urls":["https://example.com"]}'
Expected: a successful crawl (Chromium started sandboxed). If it fails to launch, the host lacks userns — keep the default or use Option B.
Provide a Chrome seccomp profile and wire it in compose:
security_opt:
- seccomp=./seccomp-chrome.json
then set CRAWL4AI_CHROMIUM_SANDBOX=true and re-run the crawl above.
If verified, flip the default: remove --no-sandbox from config.yml and the
test_no_no_sandbox_flag xfail in test_security_default_posture.py becomes a
normal pass.
docker compose up -d # uses read_only: true + tmpfs
sleep 8
# Writes outside tmpfs must fail:
docker compose exec crawl4ai sh -c 'echo x > /app/should_fail' ; echo "exit=$?" # expect non-zero
# Artifact write (tmpfs) must work via the API:
curl -fsS -X POST http://localhost:11235/screenshot \
-H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
-d '{"url":"https://example.com"}' | grep artifact_id
Expected: /app write fails (read-only), screenshot returns an artifact_id,
and GET /artifacts/{id} (with the token) returns the PNG.
docker compose exec crawl4ai sh -c 'redis-cli -p 6379 ping' # expect: NOAUTH / error
docker compose exec crawl4ai sh -c 'redis-cli -a "$REDIS_PASSWORD" ping' # expect: PONG
# From the host, the redis port must NOT be reachable (no EXPOSE / publish):
nc -z localhost 6379 ; echo "exit=$?" # expect non-zero
The browser is routed through the localhost pinning proxy. To confirm end to end:
# A normal public crawl works:
curl -fsS -X POST http://localhost:11235/crawl -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" -d '{"urls":["https://example.com"]}' | grep '"success": true'
# An internal target is refused up front (and the proxy would 403 a rebind):
curl -s -X POST http://localhost:11235/crawl -H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" -d '{"urls":["http://169.254.169.254/"]}' # expect 400 blocked
(For a true rebinding test, point a host at a TTL-0 record that flips public-> internal and confirm the crawl never reaches the internal IP.)
Open http://localhost:11235/dashboard and /playground in a browser (send the
token). Expected today: they load and work; response headers include
X-Content-Type-Options: nosniff and X-Frame-Options: DENY, but no strict
CSP (they still use inline scripts + CDN assets). Extending the strict CSP to
these mounts requires the externalization work noted in MIGRATION.md.
/app read-only; artifacts work on tmpfs