# provider-replicate/llama4-scout (Replicate Llama 4 Scout)

examples/provider-replicate/llama4-scout/README.md

You can run this example with:

```bash
npx promptfoo@latest init --example provider-replicate/llama4-scout
cd provider-replicate/llama4-scout
```

This example demonstrates how to use Replicate to run the new Llama 4 Scout model, a 17-billion-parameter model with 16 experts built on a mixture-of-experts architecture.

## About Llama 4 Scout

Llama 4 Scout is part of the Llama 4 collection of natively multimodal AI models. Key features:

- 17 billion parameters with 16 experts
- Mixture-of-experts architecture for enhanced performance
- Natively multimodal, enabling combined text and image experiences
- Industry-leading performance in text and image understanding

## Environment Variables

This example requires the following environment variable:

- `REPLICATE_API_TOKEN` - Your Replicate API token

You can set this in a `.env` file or directly in your environment:

```bash
export REPLICATE_API_TOKEN=your_api_token_here
```

## What This Example Does

This example:

- Tests the Llama 4 Scout model on various analytical and creative tasks
- Demonstrates the model's advanced reasoning capabilities
- Compares Llama 4 Scout with Llama 3 to show improvements
- Shows how to configure Replicate model parameters for optimal results
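A comparison like the one described above is driven by a `promptfooconfig.yaml`. The following is a hedged sketch of what such a configuration might look like; the model slugs, prompt, and parameter values here are illustrative placeholders, so check Replicate for the exact model identifiers before using them:

```yaml
# Illustrative sketch - verify model slugs on replicate.com before use.
prompts:
  - 'Explain the following topic clearly and concisely: {{topic}}'

providers:
  - id: replicate:meta/llama-4-scout-instruct # placeholder slug
    config:
      temperature: 0.7
      max_tokens: 512
  - id: replicate:meta/meta-llama-3-70b-instruct # placeholder slug
    config:
      temperature: 0.7
      max_tokens: 512

tests:
  - vars:
      topic: mixture-of-experts architectures
```

Running `promptfoo eval` against a config of this shape produces a side-by-side comparison of the two models on the same prompts.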

## Running the Example

1. Set your Replicate API token (see above)
2. Run the evaluation:

   ```bash
   promptfoo eval
   ```

3. View the results:

   ```bash
   promptfoo view
   ```

## Model Configuration

The example demonstrates key Replicate configuration options for Llama 4:

- `temperature`: Controls randomness (0.0 = deterministic, 1.0 = very random)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling threshold for token selection
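To build intuition for what the `top_p` threshold does, here is a small self-contained sketch of the nucleus-sampling filtering step. This is illustrative Python, not promptfoo or Replicate code:

```python
def top_p_filter(probs, top_p):
    """Return a renormalized distribution over the smallest set of
    tokens whose cumulative probability reaches top_p (nucleus sampling)."""
    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # the nucleus is complete
    # Renormalize so the kept probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# With top_p = 0.7, only the two most likely tokens survive this filter:
# 0.5 + 0.3 = 0.8 >= 0.7, so tokens 2 and 3 are dropped.
dist = top_p_filter([0.5, 0.3, 0.15, 0.05], 0.7)
```

Lower `top_p` values cut the tail of the distribution more aggressively, which is why they tend to produce more focused, less varied completions.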

## Test Cases

The example includes tests for:

- AI and mixture-of-experts architecture - Testing the model's self-awareness
- Multimodal AI - Exploring the model's understanding of multimodal capabilities
- Quantum computing - Complex technical topics
- Climate solutions - Practical problem-solving
- Creative writing - Narrative and storytelling abilities
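In promptfoo, test cases like these can attach assertions to grade model output automatically. A hedged sketch of one such test case follows; the topic, keyword, and rubric text are illustrative, not taken from this example's actual config:

```yaml
# Illustrative test case with assertions (not the example's actual config).
tests:
  - description: Quantum computing explanation
    vars:
      topic: quantum computing
    assert:
      - type: contains # simple keyword check
        value: qubit
      - type: llm-rubric # graded by an LLM against a rubric
        value: Explains superposition and entanglement accurately for a general audience
```

Deterministic checks like `contains` are cheap and reliable for factual keywords, while `llm-rubric` assertions are better suited to open-ended qualities such as clarity or tone.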

## Customizing the Example

You can modify this example to:

- Test Llama 4 Maverick (128 experts) when available
- Add image understanding tests (when multimodal features are enabled)
- Compare against other state-of-the-art models
- Explore the mixture-of-experts architecture's impact on different tasks

## Notes

- Llama 4 Scout uses a mixture-of-experts approach for efficient computation
- The model excels at both analytical and creative tasks
- Response quality benefits from the 16-expert architecture
- Part of the Llama 4 ecosystem with multimodal capabilities