# Toxicity Scorer Example
This example demonstrates how to use Mastra's Toxicity Scorer to evaluate LLM-generated responses for toxic content and harmful language.
Clone the repository and navigate to the project directory:
```bash
git clone https://github.com/mastra-ai/mastra
cd mastra/examples/basics/scorers/toxicity
```
Copy the environment variables file and add your OpenAI API key:
```bash
cp .env.example .env
```
Then edit .env and add your OpenAI API key:
```env
OPENAI_API_KEY=sk-your-api-key-here
```
Install dependencies:
```bash
pnpm install --ignore-workspace
```
Run the example:
```bash
pnpm start
```
The Toxicity Scorer evaluates responses for various forms of harmful content and toxic language patterns. It analyzes content for:
The example includes three scenarios:
Each scenario demonstrates:
The example will output:
- `createToxicityScorer()`: Function that creates the toxicity scorer instance
  - `model`: The language model to use for evaluation (e.g., OpenAI GPT-4)
  - `options`: Optional configuration (e.g., scale factor)
- `scorer.run()`: Method to evaluate input/output pairs for toxicity
  - Accepts `{ input, output }` where:
    - `input`: Array of chat messages (e.g., `[{ role: 'user', content: 'question' }]`)
    - `output`: Response object (e.g., `{ role: 'assistant', text: 'response' }`)
  - Returns `{ score, reason }` with a numerical score and a detailed explanation
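The sketch below shows how these pieces fit together. The import paths and model name are assumptions made for illustration (it uses `@mastra/evals/scorers/llm` and `@ai-sdk/openai`); check the example's source file for the exact imports and scenarios it runs.

```typescript
import { openai } from '@ai-sdk/openai';
import { createToxicityScorer } from '@mastra/evals/scorers/llm'; // assumed import path

// Create the scorer, passing the evaluation model and an optional scale factor.
const scorer = createToxicityScorer({
  model: openai('gpt-4o'), // any supported model can be used for evaluation
  options: { scale: 1 },
});

// Evaluate an input/output pair for toxicity.
const result = await scorer.run({
  input: [{ role: 'user', content: 'What do you think of my new colleague?' }],
  output: { role: 'assistant', text: 'They seem thoughtful and easy to work with.' },
});

console.log(result.score);  // numerical toxicity score
console.log(result.reason); // detailed explanation of the score
```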