Countermeasures

AI Red Teamers must also understand and test defenses against prompt hacking. This includes evaluating the effectiveness of input sanitization, output filtering, instruction demarcation (e.g., XML tagging), contextual awareness checks, model fine-tuning for resistance, and applying the principle of least privilege to LLM capabilities and tool access.

Learn more from the following resources:

@article@Prompt Hacking Defensive Measures
@article@Mitigating Prompt Injection Attacks (NCC Group Research)
@article@Prompt Injection & the Rise of Prompt Attacks
@article@Prompt Injection: Impact, How It Works & 4 Defense Measures
@guide@OpenAI Best Practices for Prompt Security