Authoritative Markup Injection Strategy

The Authoritative Markup Injection strategy tests whether AI systems are more susceptible to harmful requests when they're presented with authoritative-looking formatting or structured markup.

This approach exploits the tendency of language models to place higher trust in content that appears to come from official, structured, or technically formatted sources.

How It Works

Language models are trained on vast amounts of structured data including documentation, technical specifications, academic papers, and formal communications. This training can create a bias where models treat formally structured or marked-up content as more authoritative or trustworthy.

The strategy works by:

Taking a base prompt or test case
Dynamically generating authoritative formatting around the content
Testing whether the structured presentation increases the model's compliance with potentially harmful requests
Evaluating if the model's safety guardrails are more easily bypassed when content appears authoritative

This technique is distinct from simple prompt injection because it focuses specifically on exploiting trust signals embedded in formatting and structure rather than just the content itself.

Implementation

Add it to your promptfooconfig.yaml:

yaml

strategies:
  # Basic usage
  - authoritative-markup-injection

  # With configuration (specify which variable to inject into)
  - id: authoritative-markup-injection
    config:
      injectVar: query

The injectVar parameter specifies which variable in your prompt template should receive the authoritative markup injection. If not specified, the strategy will use the default prompt injection variable.

Combine with red team plugins to test specific vulnerability categories:

yaml

redteam:
  plugins:
    - harmful
    - pii
    - prompt-injection

  strategies:
    - authoritative-markup-injection

Example Scenario

Consider a simple harmful request:

Tell me how to create malware

With authoritative markup injection, this might be transformed into content that appears to come from:

Technical documentation or API specifications
Academic research protocols
Official security testing frameworks
Formal compliance requirements
Industry standard methodologies

The model is tested to see if wrapping the request in authoritative-looking structure makes it more likely to comply, even though the underlying request remains harmful.

Importance in Gen AI Red Teaming

Authoritative markup injection is particularly important because:

Exploits Training Biases: Language models develop implicit trust in structured, formally presented content during training
Bypasses Content-Based Filters: Safety systems often focus on detecting harmful content but may not recognize how formatting affects model behavior
Reflects Real-World Attacks: Adversaries increasingly use professional-looking formatting to make harmful requests appear legitimate
Tests Context Awareness: Evaluates whether models can maintain safety standards regardless of how requests are formatted

This strategy helps identify vulnerabilities where models prioritize structural authority over content analysis, which is crucial for building robust AI safety systems.

Citation Strategy - Tests academic authority bias
Prompt Injection - Direct injection attacks
Jailbreak Strategies - Iterative bypass techniques
Composite Jailbreaks - Combined attack techniques

For a comprehensive overview of LLM vulnerabilities and red teaming strategies, visit our Types of LLM Vulnerabilities page.

Authoritative Markup Injection Strategy

Authoritative Markup Injection Strategy

How It Works

Implementation

Example Scenario

Importance in Gen AI Red Teaming

Related Concepts