provider-model-armor (Google Cloud Model Armor)

This directory contains examples for testing Google Cloud Model Armor with Promptfoo.

You can run this example with:

bash

npx promptfoo@latest init --example provider-model-armor
cd provider-model-armor

Model Armor is a managed service that screens LLM prompts and responses for:

Responsible AI (RAI): Hate speech, harassment, sexually explicit, dangerous content
CSAM: Child safety content detection (always enabled)
Prompt Injection & Jailbreak: Detects manipulation attempts
Malicious URLs: Phishing and threat detection
Sensitive Data Protection (SDP): Credit cards, SSNs, API keys, etc.

Prerequisites

Enable Model Armor API:

bash

gcloud services enable modelarmor.googleapis.com --project=YOUR_PROJECT_ID

Grant IAM Permissions (for Vertex AI integration):

bash

PROJECT_NUMBER=$(gcloud projects describe YOUR_PROJECT_ID --format="value(projectNumber)")
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-aiplatform.iam.gserviceaccount.com" \
  --role="roles/modelarmor.user"

Set the regional API endpoint (for direct API testing):

bash

gcloud config set api_endpoint_overrides/modelarmor \
  "https://modelarmor.us-central1.rep.googleapis.com/"

Create a Model Armor template:

bash

gcloud model-armor templates create basic-safety \
  --location=us-central1 \
  --rai-settings-filters='[{"filterType":"HATE_SPEECH","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"HARASSMENT","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"DANGEROUS","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"SEXUALLY_EXPLICIT","confidenceLevel":"MEDIUM_AND_ABOVE"}]' \
  --pi-and-jailbreak-filter-settings-enforcement=enabled \
  --pi-and-jailbreak-filter-settings-confidence-level=medium-and-above \
  --malicious-uri-filter-settings-enforcement=enabled \
  --basic-config-filter-enforcement=enabled

Set environment variables (for direct API testing):

bash

export GOOGLE_PROJECT_ID=your-project-id
export MODEL_ARMOR_LOCATION=us-central1
export MODEL_ARMOR_TEMPLATE=basic-safety
export GCLOUD_ACCESS_TOKEN=$(gcloud auth print-access-token)

Note: Access tokens expire after 1 hour. For CI/CD, use service account keys or Workload Identity Federation.

Examples

1. Direct Model Armor API Testing

Test Model Armor's sanitization API directly using the HTTP provider:

bash

promptfoo eval -c promptfooconfig.yaml

This example:

Calls the sanitizeUserPrompt API directly
Maps filter results to Promptfoo's guardrails format
Tests both benign and adversarial prompts

2. Vertex AI with Model Armor Integration

Test Gemini models with Model Armor templates:

bash

promptfoo eval -c promptfooconfig.vertex.yaml

This example:

Uses Vertex AI's native Model Armor integration
Compares models with and without Model Armor enabled
Uses the guardrails and not-guardrails assertion types

Configuration Files

promptfooconfig.yaml - Direct Model Armor API testing (recommended for detailed filter results)
promptfooconfig.vertex.yaml - Vertex AI integration with Model Armor (recommended for production-like testing)
transforms/sanitize-response.js - Response transformer for the sanitization API
datasets/model-armor-test.csv - Test dataset with prompts for each filter type

Using the Dataset

The included CSV dataset contains test prompts for each Model Armor filter type. Load it in your config:

yaml

tests: file://datasets/model-armor-test.csv

Each row includes a prompt and expected behavior (benign vs. adversarial).

Understanding Results

When Model Armor blocks content, you'll see:

guardrails.flagged: true - Content was flagged
guardrails.flaggedInput: true - The input prompt was blocked
guardrails.flaggedOutput: true - The generated response was blocked
guardrails.reason - Detailed explanation of which filters matched

For debugging, inspect the raw Model Armor response in metadata.modelArmor, which contains the full sanitizationResult including individual filter states and confidence levels.

Use not-guardrails to verify dangerous prompts get caught - the test passes when content is blocked, fails when it slips through.

Cleanup

After testing, you can delete the Model Armor template if no longer needed:

bash

gcloud model-armor templates delete basic-safety --location=us-central1

provider-model-armor (Google Cloud Model Armor)

provider-model-armor (Google Cloud Model Armor)

Prerequisites

Examples

1. Direct Model Armor API Testing

2. Vertex AI with Model Armor Integration

Configuration Files

Using the Dataset

Understanding Results

Cleanup

Learn More