# provider-replicate/llama4-scout (Replicate Llama 4 Scout)

examples/provider-replicate/llama4-scout/README.md

You can run this example with:

```bash
npx promptfoo@latest init --example provider-replicate/llama4-scout
cd provider-replicate/llama4-scout
```

This example demonstrates how to use Replicate to run the new Llama 4 Scout model, a 17-billion-parameter model with 16 experts built on a mixture-of-experts architecture.

## About Llama 4 Scout

Llama 4 Scout is part of the Llama 4 collection of natively multimodal AI models. Key features:

- 17 billion parameters with 16 experts
- Mixture-of-experts architecture for enhanced performance
- Natively multimodal, enabling combined text and image experiences
- Industry-leading performance in text and image understanding

## Environment Variables

This example requires the following environment variable:

- `REPLICATE_API_TOKEN` - Your Replicate API token

You can set this in a `.env` file or directly in your environment:

```bash
export REPLICATE_API_TOKEN=your_api_token_here
```

## What This Example Does

This example:

- Tests the Llama 4 Scout model on various analytical and creative tasks
- Demonstrates the model's advanced reasoning capabilities
- Compares Llama 4 Scout with Llama 3 to show improvements
- Shows how to configure Replicate model parameters for optimal results
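A comparison like the one described above is driven by a `promptfooconfig.yaml`. The following is a hedged sketch of what such a configuration might look like; the model slugs, prompt, and parameter values here are illustrative placeholders, so check Replicate for the exact model identifiers before using them:

```yaml
# Illustrative sketch - verify model slugs on replicate.com before use.
prompts:
  - 'Explain the following topic clearly and concisely: {{topic}}'

providers:
  - id: replicate:meta/llama-4-scout-instruct # placeholder slug
    config:
      temperature: 0.7
      max_tokens: 512
  - id: replicate:meta/meta-llama-3-70b-instruct # placeholder slug
    config:
      temperature: 0.7
      max_tokens: 512

tests:
  - vars:
      topic: mixture-of-experts architectures
```

Running `promptfoo eval` against a config of this shape produces a side-by-side comparison of the two models on the same prompts.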

## Running the Example

1. Set your Replicate API token (see above)
2. Run the evaluation:

   ```bash
   promptfoo eval
   ```

3. View the results:

   ```bash
   promptfoo view
   ```

## Model Configuration

The example demonstrates key Replicate configuration options for Llama 4:

- `temperature`: Controls randomness (0.0 = deterministic, 1.0 = very random)
- `max_tokens`: Maximum number of tokens to generate
- `top_p`: Nucleus sampling threshold for token selection
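To build intuition for what the `top_p` threshold does, here is a small self-contained sketch of the nucleus-sampling filtering step. This is illustrative Python, not promptfoo or Replicate code:

```python
def top_p_filter(probs, top_p):
    """Return a renormalized distribution over the smallest set of
    tokens whose cumulative probability reaches top_p (nucleus sampling)."""
    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break  # the nucleus is complete
    # Renormalize so the kept probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

# With top_p = 0.7, only the two most likely tokens survive this filter:
# 0.5 + 0.3 = 0.8 >= 0.7, so tokens 2 and 3 are dropped.
dist = top_p_filter([0.5, 0.3, 0.15, 0.05], 0.7)
```

Lower `top_p` values cut the tail of the distribution more aggressively, which is why they tend to produce more focused, less varied completions.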

## Test Cases

The example includes tests for:

- AI and mixture-of-experts architecture - Testing the model's self-awareness
- Multimodal AI - Exploring the model's understanding of multimodal capabilities
- Quantum computing - Complex technical topics
- Climate solutions - Practical problem-solving
- Creative writing - Narrative and storytelling abilities
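In promptfoo, test cases like these can attach assertions to grade model output automatically. A hedged sketch of one such test case follows; the topic, keyword, and rubric text are illustrative, not taken from this example's actual config:

```yaml
# Illustrative test case with assertions (not the example's actual config).
tests:
  - description: Quantum computing explanation
    vars:
      topic: quantum computing
    assert:
      - type: contains # simple keyword check
        value: qubit
      - type: llm-rubric # graded by an LLM against a rubric
        value: Explains superposition and entanglement accurately for a general audience
```

Deterministic checks like `contains` are cheap and reliable for factual keywords, while `llm-rubric` assertions are better suited to open-ended qualities such as clarity or tone.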

## Customizing the Example

You can modify this example to:

- Test Llama 4 Maverick (128 experts) when available
- Add image understanding tests (when multimodal features are enabled)
- Compare against other state-of-the-art models
- Explore the mixture-of-experts architecture's impact on different tasks

## Notes

- Llama 4 Scout uses a mixture-of-experts approach for efficient computation
- The model excels at both analytical and creative tasks
- Response quality benefits from the 16-expert architecture
- Part of the Llama 4 ecosystem with multimodal capabilities