Adversarial Examples

A core AI Red Teaming activity is generating adversarial examples: inputs perturbed, often imperceptibly, so that a model misclassifies them or its safety filters fail to trigger. Red teamers use gradient-based, optimization-based, and black-box techniques to find inputs that exploit model weaknesses, and the results inform developers on how to harden the model.
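As a concrete illustration of the gradient-based approach, here is a minimal sketch of the Fast Gradient Sign Method (FGSM) in PyTorch. The `model`, the input batch, and the `epsilon` perturbation budget are all hypothetical placeholders for illustration, not references to any specific tool in the resources below.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast Gradient Sign Method: nudge x in the direction that
    increases the classification loss, bounded by epsilon in the
    L-infinity norm (Goodfellow et al., 2014)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step along the sign of the input gradient, then clamp back
    # to the valid pixel range so the result is still a real image.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

# Usage sketch: x is a batch of images in [0, 1], y the true labels.
# x_adv = fgsm_attack(classifier, x, y, epsilon=0.03)
# A robust model should still classify x_adv correctly.
```

Black-box methods follow the same idea but estimate the attack direction from model queries alone, since the gradient is unavailable when red teaming a deployed API.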

Learn more from the following resources: