Jailbreak Techniques

Jailbreaking is a specific category of prompt hacking where the AI Red Teamer aims to bypass the LLM's safety and alignment training. They use techniques like creating fictional scenarios, asking the model to simulate an unrestricted AI, or using complex instructions to trick the model into generating content that violates its own policies (e.g., generating harmful code, hate speech, or illegal instructions).

Learn more from the following resources:

@guide@InjectPrompt
@guide@Jailbreaking Guide - Learn Prompting
@paper@Jailbroken: How Does LLM Safety Training Fail? (arXiv)