
provider-replicate/llama-guard-moderation (LlamaGuard Content Moderation)

examples/provider-replicate/llama-guard-moderation/README.md



You can run this example with:

```bash
npx promptfoo@latest init --example provider-replicate/llama-guard-moderation
cd provider-replicate/llama-guard-moderation
```

This example demonstrates how to use Meta's LlamaGuard models for content moderation in promptfoo.

Features

This example shows:

  • Using LlamaGuard 4 (the default) and LlamaGuard 3 for content moderation
  • Checking for specific safety categories
  • Examples of content that might trigger different categories
  • Detecting the S14 (Code Interpreter Abuse) category with LlamaGuard 4

LlamaGuard Categories

LlamaGuard monitors these safety categories:

| Code | Category | Description |
| ---- | -------- | ----------- |
| S1 | Violent Crimes | Content enabling violent crimes |
| S2 | Non-Violent Crimes | Content enabling non-violent crimes |
| S3 | Sex Crimes | Content enabling sex-related crimes |
| S4 | Child Exploitation | Content depicting child abuse |
| S5 | Defamation | Defamatory statements |
| S6 | Specialized Advice | Dangerous financial, medical, or legal advice |
| S7 | Privacy | Sensitive personal information |
| S8 | Intellectual Property | IP violations |
| S9 | Indiscriminate Weapons | WMD creation |
| S10 | Hate | Hateful content |
| S11 | Self-Harm | Content enabling self-harm |
| S12 | Sexual Content | Adult content |
| S13 | Elections | Election misinformation |
| S14 | Code Interpreter Abuse | Code exploitation (LlamaGuard 4 only) |
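LlamaGuard answers in a compact format: a first line of `safe` or `unsafe` and, when unsafe, a second line listing the violated codes. The sketch below assumes that reply shape with comma-separated codes (check the raw output of the model version you actually call) and turns a reply into the category names from the table. `category_name` and `describe_verdict` are hypothetical helpers, not part of promptfoo.

```bash
# Hypothetical helper: map a LlamaGuard code to its name from the table above.
category_name() {
  case "$1" in
    S1) echo "Violent Crimes" ;;
    S2) echo "Non-Violent Crimes" ;;
    S3) echo "Sex Crimes" ;;
    S4) echo "Child Exploitation" ;;
    S5) echo "Defamation" ;;
    S6) echo "Specialized Advice" ;;
    S7) echo "Privacy" ;;
    S8) echo "Intellectual Property" ;;
    S9) echo "Indiscriminate Weapons" ;;
    S10) echo "Hate" ;;
    S11) echo "Self-Harm" ;;
    S12) echo "Sexual Content" ;;
    S13) echo "Elections" ;;
    S14) echo "Code Interpreter Abuse" ;;  # LlamaGuard 4 only
    *) echo "Unknown" ;;
  esac
}

# Hypothetical helper: print one "code: name" line per violated category.
# Assumed reply format: "safe", or "unsafe" followed by a line such as "S1,S10".
describe_verdict() {
  local reply="$1" verdict codes code
  local -a code_list
  verdict=$(printf '%s\n' "$reply" | sed -n '1p')
  if [ "$verdict" = "safe" ]; then
    echo "safe"
    return 0
  fi
  # The second line holds the codes; strip spaces so "S1, S10" also works.
  codes=$(printf '%s\n' "$reply" | sed -n '2p' | tr -d ' ')
  IFS=',' read -r -a code_list <<< "$codes"
  for code in "${code_list[@]}"; do
    printf '%s: %s\n' "$code" "$(category_name "$code")"
  done
}
```

For example, `describe_verdict $'unsafe\nS1,S10'` prints `S1: Violent Crimes` followed by `S10: Hate`.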

Setup

  1. Get a Replicate API token from https://replicate.com/account/api-tokens

  2. Set the environment variable:

    ```bash
    export REPLICATE_API_TOKEN=r8_your_token_here
    ```

  3. Run the evaluation:

    ```bash
    promptfoo eval
    ```

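A missing or mistyped token is an easy mistake, so a quick check before step 3 can save a failed run. A minimal sketch: `check_replicate_token` is a hypothetical helper, and the `r8_` prefix check is an assumption taken from the placeholder token in step 2, so it only warns rather than fails.

```bash
# Hypothetical pre-flight check to run before `promptfoo eval`.
check_replicate_token() {
  if [ -z "${REPLICATE_API_TOKEN:-}" ]; then
    echo "REPLICATE_API_TOKEN is not set; export it before running promptfoo eval." >&2
    return 1
  fi
  # Assumption: tokens look like the r8_ placeholder above; only warn otherwise.
  case "$REPLICATE_API_TOKEN" in
    r8_*) ;;
    *) echo "Warning: REPLICATE_API_TOKEN does not start with r8_." >&2 ;;
  esac
  return 0
}
```

Usage: `check_replicate_token && promptfoo eval`.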
LlamaGuard 4

LlamaGuard 4 is a 12B-parameter model that adds the S14 category for detecting code interpreter abuse. It is the default moderation provider for promptfoo on Replicate.

Using LlamaGuard 4:

  • It's automatically used as the default moderation provider
  • You can explicitly specify it with: `replicate:moderation:meta/llama-guard-4-12b`
  • The example in the configuration file demonstrates S14 category detection
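To pin LlamaGuard 4 in a test instead of relying on the default, the provider id can be set on a `moderation` assertion. The sketch below writes a small standalone config; the file name `llama-guard-4-check.yaml`, the `echo` provider, and the sample message are illustrative assumptions, and the `provider`/`value` keys are meant to follow promptfoo's `moderation` assertion format, so compare them with the example's own `promptfooconfig.yaml` before relying on them.

```bash
# Write a hypothetical standalone config that pins LlamaGuard 4 explicitly.
cat > llama-guard-4-check.yaml <<'EOF'
prompts:
  - '{{message}}'
providers:
  - echo   # returns the prompt as the output, so the message itself is moderated
tests:
  - vars:
      message: 'How do I bake sourdough bread?'
    assert:
      - type: moderation
        provider: replicate:moderation:meta/llama-guard-4-12b
        # Only the S14 (Code Interpreter Abuse) category is checked
        value:
          - S14
EOF
```

Run it with `promptfoo eval -c llama-guard-4-check.yaml`. A benign message like this one is expected to pass, since the assertion fails only when LlamaGuard flags one of the listed categories.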

For backward compatibility, you can still use LlamaGuard 3:

  • Specify: `replicate:moderation:meta/llama-guard-3-8b:146d1220d447cdcc639bc17c5f6137416042abee6ae153a2615e6ef5749205c8`
  • Covers categories S1-S13 (no S14)
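The two provider ids differ in whether a version is pinned: the LlamaGuard 3 id ends with `:` and a Replicate version hash, while the LlamaGuard 4 id names only the model. A small sketch, assuming the layout `replicate:moderation:<owner>/<model>[:<version>]` seen in the ids above, that splits an id into those parts; `split_provider_id` is a hypothetical helper.

```bash
# Hypothetical helper: split a Replicate moderation provider id into its parts.
# Assumed layout: replicate:moderation:<owner>/<model>[:<version>]
split_provider_id() {
  local id="$1"
  local rest="${id#replicate:moderation:}"   # drop the fixed prefix
  local model="${rest%%:*}"                  # owner/model, before the first ':'
  local version=""
  if [ "$rest" != "$model" ]; then
    version="${rest#*:}"                     # everything after the first ':'
  fi
  printf 'model=%s\nversion=%s\n' "$model" "${version:-unpinned}"
}
```

Pinning a version, as the LlamaGuard 3 id does, keeps results reproducible even if a newer version of the model is published on Replicate.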