Back to Ruflo

Safety Scan

plugins/ruflo-aidefence/skills/safety-scan/SKILL.md

3.6.301.0 KB
Original Source

Safety Scan

Scan content for prompt injection, jailbreak attempts, and unsafe patterns.

When to use

Before processing untrusted input (user submissions, API payloads, webhook data), scan it to detect prompt injection, adversarial content, or policy violations.

Steps

  1. Quick safety check — call mcp__claude-flow__aidefence_is_safe with the input text for a boolean safe/unsafe result
  2. Deep analysis — call mcp__claude-flow__aidefence_analyze for detailed threat classification and confidence scores
  3. Full scan — call mcp__claude-flow__aidefence_scan for comprehensive multi-layer scanning
  4. Train defenses — call mcp__claude-flow__aidefence_learn with confirmed threats to improve detection
  5. View stats — call mcp__claude-flow__aidefence_stats for detection rates and false positive metrics

Threat categories

  • Prompt injection (direct and indirect)
  • Jailbreak attempts
  • Data exfiltration patterns
  • Instruction override attacks
  • Social engineering prompts