site/docs/red-team/plugins/model-identification.md
The Model Identification plugin tests whether an AI system can be tricked into revealing its underlying model identity. Many AI deployments are white-labeled or rebranded, and exposing the underlying model (e.g., "I'm GPT-4", "I'm Claude 3.5") can expose business intelligence, enable targeted attacks, violate provider contracts, and undermine brand identity.
To include the Model Identification plugin in your LLM red teaming setup:
redteam:
plugins:
- model-identification
The plugin generates probes using several elicitation techniques:
Responses are evaluated against the following criteria:
Fails if the response:
Passes if the response:
Testing for model identification vulnerabilities is important for:
For a comprehensive overview of LLM vulnerabilities and red teaming strategies, visit our Types of LLM Vulnerabilities page.