examples/provider-model-armor/README.md
This directory contains examples for testing Google Cloud Model Armor with Promptfoo.
You can run this example with:
npx promptfoo@latest init --example provider-model-armor
cd provider-model-armor
Model Armor is a managed service that screens LLM prompts and responses for:
Enable Model Armor API:
gcloud services enable modelarmor.googleapis.com --project=YOUR_PROJECT_ID
Grant IAM Permissions (for Vertex AI integration):
PROJECT_NUMBER=$(gcloud projects describe YOUR_PROJECT_ID --format="value(projectNumber)")
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-aiplatform.iam.gserviceaccount.com" \
--role="roles/modelarmor.user"
Set the regional API endpoint (for direct API testing):
gcloud config set api_endpoint_overrides/modelarmor \
"https://modelarmor.us-central1.rep.googleapis.com/"
Create a Model Armor template:
gcloud model-armor templates create basic-safety \
--location=us-central1 \
--rai-settings-filters='[{"filterType":"HATE_SPEECH","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"HARASSMENT","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"DANGEROUS","confidenceLevel":"MEDIUM_AND_ABOVE"},{"filterType":"SEXUALLY_EXPLICIT","confidenceLevel":"MEDIUM_AND_ABOVE"}]' \
--pi-and-jailbreak-filter-settings-enforcement=enabled \
--pi-and-jailbreak-filter-settings-confidence-level=medium-and-above \
--malicious-uri-filter-settings-enforcement=enabled \
--basic-config-filter-enforcement=enabled
Set environment variables (for direct API testing):
export GOOGLE_PROJECT_ID=your-project-id
export MODEL_ARMOR_LOCATION=us-central1
export MODEL_ARMOR_TEMPLATE=basic-safety
export GCLOUD_ACCESS_TOKEN=$(gcloud auth print-access-token)
Note: Access tokens expire after 1 hour. For CI/CD, use service account keys or Workload Identity Federation.
Test Model Armor's sanitization API directly using the HTTP provider:
promptfoo eval -c promptfooconfig.yaml
This example:
sanitizeUserPrompt API directlyTest Gemini models with Model Armor templates:
promptfoo eval -c promptfooconfig.vertex.yaml
This example:
guardrails and not-guardrails assertion typespromptfooconfig.yaml - Direct Model Armor API testing (recommended for detailed filter results)promptfooconfig.vertex.yaml - Vertex AI integration with Model Armor (recommended for production-like testing)transforms/sanitize-response.js - Response transformer for the sanitization APIdatasets/model-armor-test.csv - Test dataset with prompts for each filter typeThe included CSV dataset contains test prompts for each Model Armor filter type. Load it in your config:
tests: file://datasets/model-armor-test.csv
Each row includes a prompt and expected behavior (benign vs. adversarial).
When Model Armor blocks content, you'll see:
guardrails.flagged: true - Content was flaggedguardrails.flaggedInput: true - The input prompt was blockedguardrails.flaggedOutput: true - The generated response was blockedguardrails.reason - Detailed explanation of which filters matchedFor debugging, inspect the raw Model Armor response in metadata.modelArmor, which contains the full sanitizationResult including individual filter states and confidence levels.
Use not-guardrails to verify dangerous prompts get caught - the test passes when content is blocked, fails when it slips through.
After testing, you can delete the Model Armor template if no longer needed:
gcloud model-armor templates delete basic-safety --location=us-central1