strix/skills/custom/source_aware_sast.md
Use this skill for source-heavy analysis where static and structural signals should guide dynamic testing.
Run tools from repo root and store outputs in a dedicated artifact directory:
mkdir -p /workspace/.strix-source-aware
Run this baseline once per repository before deep narrowing:
ART=/workspace/.strix-source-aware
mkdir -p "$ART"
semgrep scan --config p/default --config p/golang --config p/secrets \
--metrics=off --json --output "$ART/semgrep.json" .
# Build deterministic AST targets from semgrep scope (no hardcoded path guessing)
python3 - <<'PY'
import json
from pathlib import Path
art = Path("/workspace/.strix-source-aware")
semgrep_json = art / "semgrep.json"
targets_file = art / "sg-targets.txt"
try:
    data = json.loads(semgrep_json.read_text(encoding="utf-8"))
except Exception:
    # Leave an empty target list behind so the later xargs step is a no-op, then fail loudly
    targets_file.write_text("", encoding="utf-8")
    raise
# Prefer the files semgrep actually scanned; fall back to paths that produced findings
scanned = data.get("paths", {}).get("scanned") or []
if not scanned:
    scanned = sorted(
        {
            r.get("path")
            for r in data.get("results", [])
            if isinstance(r, dict) and isinstance(r.get("path"), str) and r.get("path")
        }
    )
# Cap the list so the structural pass stays bounded on very large repos
bounded = scanned[:4000]
targets_file.write_text("".join(f"{p}\n" for p in bounded), encoding="utf-8")
print(f"sg-targets: {len(bounded)}")
PY
xargs -r -n 200 sg run --pattern '$F($$$ARGS)' --json=stream < "$ART/sg-targets.txt" \
> "$ART/ast-grep.json" 2> "$ART/ast-grep.log" || true
gitleaks detect --source . --report-format json --report-path "$ART/gitleaks.json" || true
trufflehog filesystem --no-update --json --no-verification . > "$ART/trufflehog.json" || true
# Keep trivy focused on vuln/misconfig (secrets already covered above) and increase timeout for large repos
trivy fs --scanners vuln,misconfig --timeout 30m --offline-scan \
--format json --output "$ART/trivy-fs.json" . || true
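Because most baseline steps end in `|| true`, a failed tool leaves no visible error. A minimal sketch of a post-run check, assuming only the artifact file names used in the commands above (the `artifact_status` helper is hypothetical, not part of any tool):

```python
from pathlib import Path

# Artifact names written by the baseline commands above.
BASELINE_ARTIFACTS = (
    "semgrep.json",
    "sg-targets.txt",
    "ast-grep.json",
    "gitleaks.json",
    "trufflehog.json",
    "trivy-fs.json",
)


def artifact_status(art_dir):
    """Map each baseline artifact name to 'missing', 'empty', or 'ok'."""
    status = {}
    for name in BASELINE_ARTIFACTS:
        path = Path(art_dir) / name
        if not path.is_file():
            status[name] = "missing"
        elif path.stat().st_size == 0:
            status[name] = "empty"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    for name, state in artifact_status("/workspace/.strix-source-aware").items():
        print(f"{state:8} {name}")
```

An `empty` state is not always a failure: a stream-style output such as `trufflehog.json` can legitimately be empty when nothing was found, so treat it as a prompt to check the tool's log rather than as an error.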
Use Semgrep as the default static triage pass:
# Preferred deterministic profile set (works with --metrics=off)
semgrep scan --config p/default --config p/golang --config p/secrets \
--metrics=off --json --output /workspace/.strix-source-aware/semgrep.json .
# If you choose auto config, do not combine it with --metrics=off
semgrep scan --config auto --json --output /workspace/.strix-source-aware/semgrep-auto.json .
If diff scope is active, restrict to changed files first, then expand only when needed.
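One way to apply diff scope after the fact is to filter an existing Semgrep report instead of re-scanning. The sketch below assumes the usual Semgrep JSON layout (`results[].path`, `results[].start.line`, `results[].extra.severity`); the `findings_in_scope` helper and the way the changed-file list is obtained are hypothetical:

```python
from pathlib import Path

# Classic Semgrep severities; any other value sorts last (an assumption of this sketch).
SEVERITY_RANK = {"ERROR": 0, "WARNING": 1, "INFO": 2}


def findings_in_scope(semgrep_data, changed_files):
    """Return semgrep results whose path is in changed_files, most severe first."""
    # Normalize both sides so "./a.go" and "a.go" compare equal.
    changed = {Path(p).as_posix() for p in changed_files}
    hits = [
        r
        for r in semgrep_data.get("results", [])
        if isinstance(r, dict)
        and isinstance(r.get("path"), str)
        and r["path"]
        and Path(r["path"]).as_posix() in changed
    ]

    def sort_key(r):
        severity = (r.get("extra") or {}).get("severity", "")
        line = (r.get("start") or {}).get("line", 0)
        return (SEVERITY_RANK.get(severity, 3), r["path"], line)

    return sorted(hits, key=sort_key)
```

Pass it the parsed contents of `semgrep.json` and the changed paths relative to the repo root. If the in-scope list comes back empty, that is the cue to expand beyond the diff.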
Use sg for structure-aware code hunting:
# Ruleless structural pass over deterministic target list (no sgconfig.yml required)
xargs -r -n 200 sg run --pattern '$F($$$ARGS)' --json=stream \
< /workspace/.strix-source-aware/sg-targets.txt \
> /workspace/.strix-source-aware/ast-grep.json 2> /workspace/.strix-source-aware/ast-grep.log || true
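The ruleless `$F($$$ARGS)` pass matches every call expression, so the raw stream is large. A rough sketch for turning it into a callee frequency table follows. It assumes each stream line is one JSON match object exposing the captured callee at `metaVariables.single.F.text`; confirm that field layout against your own `ast-grep.json` before relying on it. The `callee_counts` helper is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def callee_counts(stream_lines):
    """Count callee expressions ($F) across ast-grep --json=stream lines."""
    counts = Counter()
    for line in stream_lines:
        line = line.strip()
        if not line:
            continue
        try:
            match = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or non-JSON lines
        if not isinstance(match, dict):
            continue
        # Assumed layout: {"metaVariables": {"single": {"F": {"text": "..."}}}}
        single = (match.get("metaVariables") or {}).get("single") or {}
        callee = (single.get("F") or {}).get("text")
        if callee:
            counts[callee] += 1
    return counts


if __name__ == "__main__":
    path = Path("/workspace/.strix-source-aware/ast-grep.json")
    if path.is_file():
        with path.open(encoding="utf-8") as fh:
            for name, n in callee_counts(fh).most_common(20):
                print(f"{n:6} {name}")
```

Scanning the resulting table for known sink names is usually quicker than reading the raw stream, and it suggests which narrower patterns are worth a second `sg run`.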
Target high-value patterns such as:
Use tree-sitter CLI for syntax-aware parsing when grep-level mapping is noisy:
tree-sitter parse -q <file>
Use outputs to improve route/symbol/sink maps for subsequent targeted scans.
Detect hardcoded credentials:
gitleaks detect --source . --report-format json --report-path /workspace/.strix-source-aware/gitleaks.json
trufflehog filesystem --json . > /workspace/.strix-source-aware/trufflehog.json
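Since the two secret scanners overlap, it can help to group their hits by location and see which ones both tools report. This is a sketch only. It assumes the gitleaks report is a JSON array with `File` and `StartLine` fields, and that each trufflehog line is a JSON object with `SourceMetadata.Data.Filesystem.file` and `.line`; verify both against your own outputs. The `merge_secret_hits` helper is hypothetical.

```python
import json
import posixpath
from collections import defaultdict


def merge_secret_hits(gitleaks_findings, trufflehog_lines):
    """Group secret hits by (file, line) and record which tool reported each."""
    merged = defaultdict(set)
    for finding in gitleaks_findings or []:
        if isinstance(finding, dict) and finding.get("File"):
            key = (posixpath.normpath(finding["File"]), finding.get("StartLine") or 0)
            merged[key].add("gitleaks")
    for line in trufflehog_lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(rec, dict):
            continue
        data = (rec.get("SourceMetadata") or {}).get("Data") or {}
        fs = data.get("Filesystem") or {}
        if fs.get("file"):
            key = (posixpath.normpath(fs["file"]), fs.get("line") or 0)
            merged[key].add("trufflehog")
    # Sort by location; each value lists the tools that flagged that location.
    return {key: sorted(tools) for key, tools in sorted(merged.items())}
```

Locations flagged by both tools are reasonable first candidates for manual review. The tools may disagree on the exact line for the same secret, so a single-tool entry does not by itself mean a weaker finding.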
Run repository-wide dependency and config checks:
trivy fs --scanners vuln,misconfig --timeout 30m --offline-scan \
--format json --output /workspace/.strix-source-aware/trivy-fs.json . || true
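To decide what to chase first, the Trivy report can be reduced to its highest-severity rows. The sketch assumes the common Trivy JSON shape (`Results[].Target`, `Results[].Vulnerabilities[]`, `Results[].Misconfigurations[]`, each with a `Severity` field); the `high_risk_items` helper is hypothetical:

```python
import json
from pathlib import Path


def high_risk_items(trivy_report, levels=("HIGH", "CRITICAL")):
    """Return (severity, target, id, detail) rows for findings at the given levels."""
    rows = []
    for result in trivy_report.get("Results") or []:
        target = result.get("Target", "")
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in levels:
                rows.append((vuln["Severity"], target,
                             vuln.get("VulnerabilityID", ""), vuln.get("PkgName", "")))
        for misconf in result.get("Misconfigurations") or []:
            if misconf.get("Severity") in levels:
                rows.append((misconf["Severity"], target,
                             misconf.get("ID", ""), misconf.get("Title", "")))
    # Plain tuple sort: "CRITICAL" sorts before "HIGH" alphabetically.
    return sorted(rows)


if __name__ == "__main__":
    path = Path("/workspace/.strix-source-aware/trivy-fs.json")
    if path.is_file():
        report = json.loads(path.read_text(encoding="utf-8"))
        for row in high_risk_items(report):
            print("\t".join(row))
```

Because the scan runs with `--offline-scan`, a report with no rows may reflect limited or stale vulnerability data rather than a clean repository.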
For frontends and Node services, layer these on top of the language-agnostic passes above:
retire --path . --outputformat json --outputpath /workspace/.strix-source-aware/retire.json || true
eslint --no-config-lookup --rule '{"no-eval":2,"no-implied-eval":2}' \
-f json -o /workspace/.strix-source-aware/eslint.json . || true
When you hit a minified bundle, run js-beautify <file> for a readable
view before grepping, and use jshint --reporter=unix <file> as a
lighter syntax/anti-pattern check when ESLint is over-eager. The
JS-Snooper / jsniper.sh tools (in katana.md) are the right next
step to mine those bundles for endpoint candidates.
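The ESLint pass above writes a JSON array with one entry per file. A short sketch for pulling out just the `no-eval` and `no-implied-eval` locations, assuming the standard ESLint JSON formatter fields (`filePath`, `messages[].ruleId`, `messages[].line`); the `eslint_hits` helper is hypothetical:

```python
import json
from pathlib import Path


def eslint_hits(report, rule_ids=("no-eval", "no-implied-eval")):
    """Return sorted (file, line, rule) tuples for the selected ESLint rules."""
    hits = []
    for file_result in report or []:
        path = file_result.get("filePath", "")
        for msg in file_result.get("messages") or []:
            if msg.get("ruleId") in rule_ids:
                hits.append((path, msg.get("line") or 0, msg["ruleId"]))
    return sorted(hits)


if __name__ == "__main__":
    path = Path("/workspace/.strix-source-aware/eslint.json")
    if path.is_file():
        report = json.loads(path.read_text(encoding="utf-8"))
        for file_path, line, rule in eslint_hits(report):
            print(f"{file_path}:{line}: {rule}")
```

Messages whose `ruleId` is null (typically parse errors) are skipped by the filter. A large number of them on bundled files suggests beautifying first and re-running.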