Security Scanner Images

docs/features/scanner-images.md

MCPProxy's security scanners run as Docker containers. This document explains where those images come from, how they're published, and why we keep them in this repository instead of a separate one.

Image sources

| Scanner | Image | Source |
| --- | --- | --- |
| cisco-mcp-scanner | ghcr.io/smart-mcp-proxy/scanner-cisco:latest | Custom wrapper in docker/scanners/cisco/ |
| mcp-ai-scanner | ghcr.io/smart-mcp-proxy/mcp-scanner:latest | Built from the separate smart-mcp-proxy/mcp-scanner repo (AI scanner needs its own release cycle) |
| mcp-scan (Snyk) | ghcr.io/smart-mcp-proxy/scanner-snyk:latest | Custom wrapper in docker/scanners/snyk/ |
| nova-proximity | ghcr.io/smart-mcp-proxy/scanner-proximity:latest | Custom wrapper in docker/scanners/proximity/ |
| ramparts | ghcr.io/smart-mcp-proxy/scanner-ramparts:latest | Custom wrapper in docker/scanners/ramparts/ |
| semgrep-mcp | returntocorp/semgrep:latest | Vendor image, maintained upstream by Semgrep |
| trivy-mcp | ghcr.io/aquasecurity/trivy:latest | Vendor image, maintained upstream by Aqua Security |

Rule of thumb:

  1. Prefer vendor images. Trivy and Semgrep ship their own high-quality multi-arch images. Wrapping them in our own Dockerfile would only add lag between upstream releases and MCPProxy users.
  2. When no vendor image exists, publish our own thin wrapper under ghcr.io/smart-mcp-proxy/scanner-<id>:latest. A wrapper installs the vendor CLI from PyPI / crates.io and adds an entrypoint that reads from /scan/source and writes SARIF to /scan/report/results.sarif.
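The wrapper contract in rule 2 (read from /scan/source, write SARIF to /scan/report/results.sarif) can be sketched as below. This is an illustration only: the real wrappers are Dockerfile entrypoints that call a vendor CLI, and the names `write_sarif`, `run`, and `analyze`, plus the finding fields `rule_id`, `level`, and `message`, are hypothetical. Only the two paths and the SARIF output format come from this document.

```python
import json
from pathlib import Path


def write_sarif(findings, report_dir, tool_name="example-scanner"):
    """Write findings as a minimal SARIF 2.1.0 log to <report_dir>/results.sarif."""
    results = [
        {
            "ruleId": f["rule_id"],
            "level": f.get("level", "warning"),
            "message": {"text": f["message"]},
        }
        for f in findings
    ]
    sarif = {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }
    report_dir = Path(report_dir)
    report_dir.mkdir(parents=True, exist_ok=True)
    out = report_dir / "results.sarif"
    out.write_text(json.dumps(sarif, indent=2), encoding="utf-8")
    return out


def run(analyze, source_dir="/scan/source", report_dir="/scan/report"):
    """Run a (hypothetical) analyze callable over the source mount and write the report."""
    findings = analyze(Path(source_dir))
    return write_sarif(findings, report_dir)
```

A wrapper built this way would only need to supply `analyze`, which keeps the mount points and the report filename identical across scanners.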

Why one repository, not a separate one

An earlier idea was to keep all scanner Dockerfiles in a separate smart-mcp-proxy/scanners repo. We decided against that for three reasons:

  1. Version drift. The registry that MCPProxy ships (registry_bundled.go) and the image tags it expects must move together. Keeping the Dockerfile and the Go constant in the same commit makes that trivial; splitting them across repos means every scanner change becomes a two-repo dance with two PRs and two approvals.

  2. Single CI story. Releases already build the MCPProxy binary, the macOS DMG, and the Docker image for the server edition. Adding the scanner images to the same repository means one workflow (scanner-images.yml), one set of secrets, one place to debug.

  3. Small surface. Each wrapper is ~30 lines of Dockerfile plus a tiny entrypoint. There simply isn't enough code to justify an extra repo.

The one exception is the AI scanner (mcp-ai-scanner) — it lives in smart-mcp-proxy/mcp-scanner because the agent logic there is non-trivial and has its own release cadence. We still reference the published image name from here.

Publishing

The .github/workflows/scanner-images.yml workflow builds and pushes all four custom wrappers (cisco, snyk, proximity, ramparts) whenever:

  • A commit to main touches docker/scanners/** or the workflow itself.
  • A maintainer triggers it manually via workflow_dispatch (optionally for a single scanner id).

Pull requests run the build step but do not push, so Dockerfile drift is caught in CI without leaking images from forks.

Images are multi-arch (linux/amd64 + linux/arm64), tagged with both :latest and a short-SHA tag for pinning. Exception: the ramparts image is linux/amd64-only — its arm64 leg cold-compiles the Rust crate under QEMU and exhausts the CI runner budget (MCP-2395 / #665). On arm64 hosts (e.g. Apple Silicon) the amd64 image runs under emulation, which is functional but slower.
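To make the tagging and platform rules concrete, here is a small sketch. The helper names `image_tags` and `platforms` are hypothetical, and the 7-character short SHA and the bare `:<sha>` tag format are assumptions; the workflow may use a different length or prefix.

```python
def image_tags(scanner_id, commit_sha, registry="ghcr.io/smart-mcp-proxy", short_len=7):
    """Return the two tags pushed per wrapper: :latest and a short-SHA tag for pinning."""
    base = f"{registry}/scanner-{scanner_id}"
    # short_len=7 is an assumption, not taken from the workflow
    return [f"{base}:latest", f"{base}:{commit_sha[:short_len]}"]


def platforms(scanner_id):
    """Return the build platforms; ramparts is the documented amd64-only exception."""
    if scanner_id == "ramparts":
        return ["linux/amd64"]
    return ["linux/amd64", "linux/arm64"]
```

Pinning to the short-SHA tag instead of :latest gives a reproducible image for a given commit.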

Ramparts (v0.8.x) — stdio replay design

Ramparts ≥ 0.8.0 dropped file/directory scanning. ramparts scan <target> now expects a live MCP endpoint (an HTTP URL or a stdio: subprocess): it performs the MCP handshake, enumerates the advertised tools, and runs its YARA rules against them. The old --format sarif --output FILE /scan/source invocation is invalid in v0.8.x (no sarif format, no --output flag, and a directory is not a valid target).

MCPProxy never re-executes the untrusted upstream just to give Ramparts a target — that would violate the "never execute scanned source" invariant (MCP-2206/#658). Instead the engine exports the tool definitions it already captured from the running upstream into /scan/source/tools.json (the same file the Cisco scanner consumes), and the entrypoint points Ramparts at a static stdio replay shim:

    ramparts scan "stdio:python3:/usr/local/bin/mcp-replay.py" --format json

mcp-replay.py speaks just enough of the MCP stdio protocol to replay those captured tool definitions — it runs no upstream code. The image therefore also ships a python3 runtime, plus Ramparts' YARA rules/, taxonomies/, and default config.yaml (loaded relative to the /scan working directory; a binary-only image loads zero rules and reports no findings). The native JSON output ({security_issues, yara_results}) is parsed directly by internal/security/scanner/engine.go. LLM-backed analysis stays out of scope, so the scanner runs fully offline (NetworkReq: false).
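The idea behind a replay shim can be sketched as follows. This is not the shipped mcp-replay.py; it is a minimal illustration that assumes newline-delimited JSON-RPC 2.0 over stdio and answers only `initialize`, `ping`, and `tools/list`. The function names and the `serverInfo` values are hypothetical.

```python
import json
import sys


def handle(request, tools):
    """Return the JSON-RPC response for one request, or None for notifications."""
    if "id" not in request:
        return None  # notifications (e.g. notifications/initialized) get no reply
    method = request.get("method")
    if method == "initialize":
        params = request.get("params") or {}
        result = {
            # echo the client's requested version; the fallback value is an assumption
            "protocolVersion": params.get("protocolVersion", "2024-11-05"),
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "mcp-replay-sketch", "version": "0.0.0"},
        }
    elif method == "ping":
        result = {}
    elif method == "tools/list":
        result = {"tools": tools}  # replay the captured definitions verbatim
    else:
        # anything else, including tools/call, is refused: no upstream code runs
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32601, "message": f"method not found: {method}"},
        }
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


def serve(tools, stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON message per line and write one response per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        response = handle(json.loads(line), tools)
        if response is not None:
            stdout.write(json.dumps(response) + "\n")
            stdout.flush()
```

Because `tools/call` is answered with a method-not-found error, a scanner connected to such a shim can enumerate tools but can never trigger execution.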

Fail-closed (MCP-2443). The scanner must never report a broken run as a clean scan. Because the Go runner ignores the container's exit code, the entrypoint enforces this by deleting the report on any failure so the engine records the scanner as failed rather than parsing a stale/error payload as zero findings:

  • mcp-replay.py aborts (non-zero exit, serves no tools) when /scan/source/tools.json is missing, empty, not valid JSON, the wrong shape, or yields zero usable tools — instead of replaying an empty tool list that would make Ramparts emit a spurious clean report.
  • entrypoint.sh validates that the captured output is a genuine Ramparts report (a top-level security_issues object and yara_results array, the same shape isRampartsOutput() probes). An error payload, garbled JSON, or empty output marks the scan failed. A non-zero Ramparts exit accompanied by a valid report is kept (Ramparts signals findings/offline-LLM that way).
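The two fail-closed checks above can be sketched in Python. This is an illustration under assumptions, not the shipped code: the real report validation lives in entrypoint.sh, the function names are hypothetical, and tools.json is assumed here to be a JSON array of objects that each carry a `name`.

```python
import json
from pathlib import Path


def load_tools(path):
    """Return the captured tools, or raise ValueError so the caller aborts non-zero."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:  # missing, empty, or not valid JSON
        raise ValueError(f"unusable tools file: {exc}") from exc
    if not isinstance(data, list):  # assumed shape: a JSON array
        raise ValueError("tools file has the wrong shape")
    tools = [t for t in data if isinstance(t, dict) and t.get("name")]
    if not tools:
        raise ValueError("tools file yields zero usable tools")
    return tools


def is_ramparts_report(text):
    """True only for a top-level security_issues object plus a yara_results array."""
    try:
        doc = json.loads(text)
    except ValueError:
        return False
    return (
        isinstance(doc, dict)
        and isinstance(doc.get("security_issues"), dict)
        and isinstance(doc.get("yara_results"), list)
    )


def finalize_report(report_path):
    """Keep a genuine report; delete anything else so the run is recorded as failed."""
    path = Path(report_path)
    try:
        ok = is_ramparts_report(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        ok = False
    if not ok:
        path.unlink(missing_ok=True)
    return ok
```

Note that `finalize_report` looks only at the report contents, not at the scanner's exit code, which matches the rule that a non-zero Ramparts exit with a valid report is kept.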

Background image pull UX

MCPProxy pulls these images lazily:

  • When the user toggles a scanner on (POST /api/v1/security/scanners/{id}/enable), the Docker pull runs in a background goroutine. The scanner status immediately moves to pulling; the web UI shows a spinner and reacts to security.scanner_changed SSE events.
  • When the pull finishes, status flips to installed (no env config) or configured (API key already set).
  • If the pull fails, status becomes error with the reason in error_message, and a "Retry" button on the Security page calls the enable endpoint again.
  • When a user changes the Docker image override via PUT /api/v1/security/scanners/{id}/config, the service detects that the effective image changed and kicks off a new background pull.
  • Scanners with a missing image are no longer silently dropped from a scan. They are recorded as a failed scanner in the scan report so users can see exactly which scanner didn't run and why.
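The status transitions described above can be summarised in a short sketch. The status names (pulling, installed, configured, error) and the error_message field come from this document; the function names, the dictionary layout, and the fields of the failed-scanner record are hypothetical and do not mirror the Go service.

```python
def on_enable():
    """Enabling a scanner starts a background pull and reports 'pulling' immediately."""
    return {"status": "pulling", "error_message": ""}


def on_pull_finished(pull_error, api_key_set):
    """Map the pull outcome to the next status."""
    if pull_error:
        return {"status": "error", "error_message": str(pull_error)}
    status = "configured" if api_key_set else "installed"
    return {"status": status, "error_message": ""}


def needs_repull(old_image, new_image):
    """A config change triggers a new pull only when the effective image changed."""
    return old_image != new_image


def record_missing_image(scanner_id, reason):
    """Report a scanner without an image as failed instead of dropping it from the scan."""
    return {"scanner": scanner_id, "status": "failed", "error": reason}
```

A retry is then just `on_enable()` again followed by a fresh `on_pull_finished(...)`.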

Adding a new scanner

  1. Write a Dockerfile in docker/scanners/<id>/Dockerfile. Expect /scan/source (read-only) for input and /scan/report (writable) for output.
  2. If the scanner needs custom argv, add an entrypoint script next to it.
  3. Append an entry to registry_bundled.go pointing at ghcr.io/smart-mcp-proxy/scanner-<id>:latest.
  4. Add the matrix entry to .github/workflows/scanner-images.yml.
  5. Land the change in a single PR so all three moving parts (the Dockerfile, the registry entry, and the workflow matrix entry) stay in sync.
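A quick pre-PR sanity check for these steps might look like the sketch below. `check_scanner` is a hypothetical helper, not part of the repository. It only does naive text matching, and because this document does not give the directory of registry_bundled.go, it searches the whole tree for that filename.

```python
from pathlib import Path


def check_scanner(repo_root, scanner_id):
    """Return a list of problems; an empty list means all three parts are present."""
    root = Path(repo_root)
    image = f"ghcr.io/smart-mcp-proxy/scanner-{scanner_id}:latest"
    problems = []

    # Step 1: the Dockerfile lives under docker/scanners/<id>/
    if not (root / "docker" / "scanners" / scanner_id / "Dockerfile").is_file():
        problems.append("missing Dockerfile")

    # Step 3: some registry_bundled.go must reference the published image
    registries = list(root.rglob("registry_bundled.go"))
    if not any(image in p.read_text(encoding="utf-8") for p in registries):
        problems.append("registry_bundled.go does not reference " + image)

    # Step 4: the workflow must mention the scanner id (naive substring match)
    workflow = root / ".github" / "workflows" / "scanner-images.yml"
    if not workflow.is_file() or scanner_id not in workflow.read_text(encoding="utf-8"):
        problems.append("workflow matrix does not mention " + scanner_id)

    return problems
```

Running `check_scanner(".", "<id>")` from the repository root before opening the PR would flag a forgotten registry or workflow entry.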