Back to Promptfoo

provider-model-armor (Google Cloud Model Armor)

examples/provider-model-armor/README.md

0.121.94.8 KB
Original Source

provider-model-armor (Google Cloud Model Armor)

This directory contains examples for testing Google Cloud Model Armor with Promptfoo.

You can run this example with:

bash
npx promptfoo@latest init --example provider-model-armor
cd provider-model-armor

Model Armor is a managed service that screens LLM prompts and responses for:

  • Responsible AI (RAI): Hate speech, harassment, sexually explicit, dangerous content
  • CSAM: Child safety content detection (always enabled)
  • Prompt Injection & Jailbreak: Detects manipulation attempts
  • Malicious URLs: Phishing and threat detection
  • Sensitive Data Protection (SDP): Credit cards, SSNs, API keys, etc.

Prerequisites

  1. Enable Model Armor API:

    bash
    gcloud services enable modelarmor.googleapis.com --project=YOUR_PROJECT_ID
    
  2. Grant IAM Permissions (for Vertex AI integration):

    bash
    PROJECT_NUMBER=$(gcloud projects describe YOUR_PROJECT_ID --format="value(projectNumber)")
    gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
      --member="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-aiplatform.iam.gserviceaccount.com" \
      --role="roles/modelarmor.user"
    
  3. Set the regional API endpoint (for direct API testing):

    bash
    gcloud config set api_endpoint_overrides/modelarmor \
      "https://modelarmor.us-central1.rep.googleapis.com/"
    
  4. Create a Model Armor template:

    bash
    gcloud model-armor templates create basic-safety \
      --location=us-central1 \
      --rai-settings-filters='[{"filterType":"HATE_SPEECH","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"HARASSMENT","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"DANGEROUS","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"SEXUALLY_EXPLICIT","confidenceLevel":"MEDIUM_AND_ABOVE"}]' \
      --pi-and-jailbreak-filter-settings-enforcement=enabled \
      --pi-and-jailbreak-filter-settings-confidence-level=medium-and-above \
      --malicious-uri-filter-settings-enforcement=enabled \
      --basic-config-filter-enforcement=enabled
    
  5. Set environment variables (for direct API testing):

    bash
    export GOOGLE_PROJECT_ID=your-project-id
    export MODEL_ARMOR_LOCATION=us-central1
    export MODEL_ARMOR_TEMPLATE=basic-safety
    export GCLOUD_ACCESS_TOKEN=$(gcloud auth print-access-token)
    

    Note: Access tokens expire after 1 hour. For CI/CD, use service account keys or Workload Identity Federation.

Examples

1. Direct Model Armor API Testing

Test Model Armor's sanitization API directly using the HTTP provider:

bash
promptfoo eval -c promptfooconfig.yaml

This example:

  • Calls the sanitizeUserPrompt API directly
  • Maps filter results to Promptfoo's guardrails format
  • Tests both benign and adversarial prompts

2. Vertex AI with Model Armor Integration

Test Gemini models with Model Armor templates:

bash
promptfoo eval -c promptfooconfig.vertex.yaml

This example:

  • Uses Vertex AI's native Model Armor integration
  • Compares models with and without Model Armor enabled
  • Uses the guardrails and not-guardrails assertion types

Configuration Files

  • promptfooconfig.yaml - Direct Model Armor API testing (recommended for detailed filter results)
  • promptfooconfig.vertex.yaml - Vertex AI integration with Model Armor (recommended for production-like testing)
  • transforms/sanitize-response.js - Response transformer for the sanitization API
  • datasets/model-armor-test.csv - Test dataset with prompts for each filter type

Using the Dataset

The included CSV dataset contains test prompts for each Model Armor filter type. Load it in your config:

yaml
tests: file://datasets/model-armor-test.csv

Each row includes a prompt and expected behavior (benign vs. adversarial).

Understanding Results

When Model Armor blocks content, you'll see:

  • guardrails.flagged: true - Content was flagged
  • guardrails.flaggedInput: true - The input prompt was blocked
  • guardrails.flaggedOutput: true - The generated response was blocked
  • guardrails.reason - Detailed explanation of which filters matched

For debugging, inspect the raw Model Armor response in metadata.modelArmor, which contains the full sanitizationResult including individual filter states and confidence levels.

Use not-guardrails to verify dangerous prompts get caught - the test passes when content is blocked, fails when it slips through.

Cleanup

After testing, you can delete the Model Armor template if no longer needed:

bash
gcloud model-armor templates delete basic-safety --location=us-central1

Learn More