# Groq
Groq is an extremely fast inference API compatible with the options provided by promptfoo's OpenAI provider. See the OpenAI provider documentation for configuration details.
Groq provides access to a wide range of models including reasoning models with chain-of-thought capabilities, compound models with built-in tools, and standard chat models. See the Groq Models documentation for the current list of available models.
| Feature | Description | Provider Prefix | Key Config |
|---|---|---|---|
| Reasoning Models | Models with chain-of-thought capabilities | `groq:` | `include_reasoning` |
| Compound Models | Built-in code execution, web search, browsing | `groq:` | `compound_custom` |
| Standard Models | General-purpose chat models | `groq:` | `temperature` |
| Long Context | Models with extended context windows (100k+) | `groq:` | N/A |
| Responses API | Structured API with simplified reasoning control | `groq:responses:` | `reasoning.effort` |
Key Differences:

- `groq:` - Standard Chat Completions API with granular reasoning control
- `groq:responses:` - Responses API (beta) with a simplified `reasoning.effort` parameter
- `browser_search` tool via manual configuration
- `compound_custom.tools.enabled_tools` to control which built-in tools are enabled

To use Groq, set your API key via the `GROQ_API_KEY` environment variable:

```sh
export GROQ_API_KEY=your_api_key_here
```

Alternatively, you can specify the `apiKey` in the provider configuration (see below).
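For example, a minimal provider entry with an inline key (shown with a placeholder; prefer the environment variable for anything beyond local testing):

```yaml
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      apiKey: your_api_key_here
```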
Configure the Groq provider in your promptfoo configuration file:

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      temperature: 0.7
      max_completion_tokens: 100

prompts:
  - Write a funny tweet about {{topic}}

tests:
  - vars:
      topic: cats
  - vars:
      topic: dogs
```
Key configuration options:

- `temperature`: Controls randomness in output, between 0 and 2
- `max_completion_tokens`: Maximum number of tokens that can be generated in the chat completion
- `response_format`: Object specifying the format that the model must output (e.g. JSON mode)
- `presence_penalty`: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far
- `seed`: For deterministic sampling (best effort)
- `frequency_penalty`: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far
- `parallel_tool_calls`: Whether to enable parallel function calling during tool use (default: `true`)
- `reasoning_format`: For reasoning models, controls how reasoning is presented. Options: `parsed` (separate field), `raw` (with think tags), `hidden` (no reasoning shown). Note: `parsed` or `hidden` is required when using JSON mode or tool calls
- `include_reasoning`: For GPT-OSS models, set to `false` to hide reasoning output (default: `true`)
- `reasoning_effort`: For reasoning models, controls the level of reasoning effort. Options: `low`, `medium`, `high` for GPT-OSS models; `none`, `default` for Qwen models
- `stop`: Up to 4 sequences where the API will stop generating further tokens
- `tool_choice`: Controls tool usage (`none`, `auto`, `required`, or a specific tool)
- `tools`: List of tools (functions) the model may call (max 128)
- `top_p`: Alternative to temperature sampling using nucleus sampling

Groq provides access to models across several categories. For the current list of available models and their specifications, see the Groq Models documentation.
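As a sketch of the `response_format` option listed above, JSON mode can be enabled with an OpenAI-compatible `json_object` type (as with other OpenAI-compatible APIs, the prompt itself should also instruct the model to produce JSON):

```yaml
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      temperature: 0
      response_format:
        type: json_object
```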
Use any model from Groq's model library (such as `groq/compound`) with the `groq:` prefix:
```yaml
providers:
  # Standard chat model
  - id: groq:llama-3.3-70b-versatile
    config:
      temperature: 0.7
      max_completion_tokens: 4096

  # Reasoning model
  - id: groq:deepseek-r1-distill-llama-70b
    config:
      temperature: 0.6
      include_reasoning: true
```
Check the Groq Console for the full list of available models.
Groq supports tool use, allowing models to call predefined functions. Configure tools in your provider settings:
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      tools:
        - type: function
          function:
            name: get_weather
            description: 'Get the current weather in a given location'
            parameters:
              type: object
              properties:
                location:
                  type: string
                  description: 'The city and state, e.g. San Francisco, CA'
                unit:
                  type: string
                  enum:
                    - celsius
                    - fahrenheit
              required:
                - location
      tool_choice: auto
```
Groq provides vision models that can process both text and image inputs. These models support tool use and JSON mode. See the Groq Vision documentation for current model availability and specifications.
Specify a vision model ID in your provider configuration and include images in OpenAI-compatible format:
```yaml
- role: user
  content:
    - type: text
      text: '{{question}}'
    - type: image_url
      image_url:
        url: '{{url}}'
```
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts: file://openai-compatible-prompt-format.yaml

providers:
  - id: groq:meta-llama/llama-4-scout-17b-16e-instruct
    config:
      temperature: 1
      max_completion_tokens: 1024

tests:
  - vars:
      question: 'What do you see in the image?'
      url: https://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Felis_catus-cat_on_snow.jpg/1024px-Felis_catus-cat_on_snow.jpg
    assert:
      - type: contains
        value: 'cat'
```
Groq provides access to reasoning models that excel at complex problem-solving tasks requiring step-by-step analysis. These include GPT-OSS variants, Qwen models, and DeepSeek R1 variants. Check the Groq Models documentation for current reasoning model availability.
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Groq reasoning model example

prompts:
  - |
    Your task is to analyze the following question with careful reasoning and rigor:

    {{ question }}

providers:
  - id: groq:deepseek-r1-distill-llama-70b
    config:
      temperature: 0.6
      max_completion_tokens: 25000
      reasoning_format: parsed # 'parsed', 'raw', or 'hidden'

tests:
  - vars:
      question: |
        Solve for x in the following equation: e^-x = x^3 - 3x^2 + 2x + 5
    assert:
      - type: javascript
        value: output.includes('0.676') || output.includes('.676')
```
For GPT-OSS models, use the `include_reasoning` parameter:

| Parameter Value | Description |
|---|---|
| `true` (default) | Shows reasoning/thinking process in output |
| `false` | Hides reasoning, returns only final answer |
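A minimal sketch of hiding reasoning on a GPT-OSS model (assuming `openai/gpt-oss-20b` is available on your Groq account; substitute any GPT-OSS model id from the Groq console):

```yaml
providers:
  - id: groq:openai/gpt-oss-20b
    config:
      include_reasoning: false # return only the final answer
```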
Example hiding reasoning with `reasoning_format`:

```yaml
providers:
  - id: groq:deepseek-r1-distill-llama-70b
    config:
      reasoning_format: hidden # Hide thinking output
```
For other reasoning models (e.g., Qwen, DeepSeek), use `reasoning_format`:

| Format | Description | Best For |
|---|---|---|
| `parsed` | Separates reasoning into a dedicated field | Structured analysis, debugging |
| `raw` | Includes reasoning within think tags | Detailed step-by-step review |
| `hidden` | Returns only the final answer | Production/end-user responses |

Note: When using JSON mode or tool calls with `reasoning_format`, only the `parsed` or `hidden` formats are supported.
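For instance, a reasoning model combined with tool calls might be configured like this (a sketch; the `lookup_constant` tool is a hypothetical example, not a built-in):

```yaml
providers:
  - id: groq:deepseek-r1-distill-llama-70b
    config:
      reasoning_format: parsed # 'raw' is not supported with tool calls
      tools:
        - type: function
          function:
            name: lookup_constant # hypothetical tool for illustration
            description: 'Return the value of a named mathematical constant'
            parameters:
              type: object
              properties:
                name:
                  type: string
              required:
                - name
```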
Control model output format by prefilling assistant messages. This technique allows you to direct the model to skip preambles and enforce specific formats like JSON or code blocks.
Include a partial assistant message in your prompt, and the model will continue from that point:
```yaml
prompts:
  - |
    [
      {
        "role": "user",
        "content": "{{task}}"
      },
      {
        "role": "assistant",
        "content": "{{prefill}}"
      }
    ]

providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      stop: '```' # Stop at closing code fence

tests:
  - vars:
      task: Write a Python function to calculate factorial
      prefill: '```python'
```
Generate concise code:

```yaml
prefill: '```python'
```

Extract structured data:

```yaml
prefill: '```json'
```

Skip introductions:

```yaml
prefill: "Here's the answer: "
```
Combine with the stop parameter for precise output control.
Groq's Responses API provides a structured approach to conversational AI, with built-in support for tools, structured outputs, and reasoning. Use the groq:responses: prefix to access this API. Note: This API is currently in beta.
```yaml
providers:
  - id: groq:responses:llama-3.3-70b-versatile
    config:
      temperature: 0.6
      max_output_tokens: 1000
      reasoning:
        effort: 'high' # 'low', 'medium', or 'high'
```
The Responses API makes it easy to get structured JSON outputs:
```yaml
providers:
  - id: groq:responses:llama-3.3-70b-versatile
    config:
      response_format:
        type: 'json_schema'
        json_schema:
          name: 'calculation_result'
          strict: true
          schema:
            type: 'object'
            properties:
              result:
                type: 'number'
              explanation:
                type: 'string'
            required: ['result', 'explanation']
            additionalProperties: false
```
The Responses API accepts either a simple string or an array of message objects:
```yaml
prompts:
  # Simple string input
  - 'What is the capital of France?'

  # Or a message array (as JSON)
  - |
    [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
```
| Feature | Chat Completions (`groq:`) | Responses API (`groq:responses:`) |
|---|---|---|
| Endpoint | `/v1/chat/completions` | `/v1/responses` |
| Reasoning Control | `include_reasoning`, `reasoning_format` | `reasoning.effort` |
| Token Limit Param | `max_completion_tokens` | `max_output_tokens` |
| Input Field | `messages` | `input` |
| Output Field | `choices[0].message.content` | `output_text` |
For more details on the Responses API, see Groq's Responses API documentation.
Groq offers models with built-in tools: compound models with automatic tool usage, and reasoning models with manually configured tools like browser search.
Groq's compound models combine language models with pre-enabled built-in tools that activate automatically based on the task. Check the Groq documentation for current compound model availability.
Built-in Capabilities (No Configuration Needed):

- Code execution
- Web search
- Website browsing
Basic Configuration:

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: groq:groq/compound
    config:
      temperature: 0.7
      max_completion_tokens: 3000

prompts:
  - |
    {{task}}

tests:
  # Code execution
  - vars:
      task: Calculate the first 10 Fibonacci numbers using code
    assert:
      - type: javascript
        value: output.length > 50

  # Web search
  - vars:
      task: What is the current population of Seattle?
    assert:
      - type: javascript
        value: output.length > 50
```
Example Outputs:
Code execution:

```text
Thinking:
To calculate the first 10 Fibonacci numbers, I will use a Python code snippet.

<tool>
python
def fibonacci(n):
    fib = [0, 1]
    for i in range(2, n):
        fib.append(fib[i-1] + fib[i-2])
    return fib[:n]
print(fibonacci(10))
</tool>
<output>[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]</output>
```
Web search:

```text
<tool>search(current population of Seattle)</tool>
<output>
Title: Seattle Population 2025
URL: https://example.com/seattle
Content: The current metro area population of Seattle in 2025 is 816,600...
</output>
```
Web Search Settings (Optional):
You can customize web search behavior:
```yaml
providers:
  - id: groq:groq/compound
    config:
      search_settings:
        exclude_domains: ['example.com'] # Exclude specific domains
        include_domains: ['*.edu'] # Restrict to specific domains
        country: 'us' # Boost results from this country
```
Explicit Tool Control:
By default, Compound models automatically select which tools to use. You can explicitly control which tools are available using compound_custom:
```yaml
providers:
  - id: groq:groq/compound
    config:
      compound_custom:
        tools:
          enabled_tools:
            - code_interpreter # Python code execution
            - web_search # Web searches
            - visit_website # URL fetching
```
This allows you to restrict the model to only the tools your evaluation needs.
Available Tool Identifiers:

- `code_interpreter` - Python code execution
- `web_search` - Real-time web searches
- `visit_website` - Webpage fetching
- `browser_automation` - Interactive browser control (requires latest version)
- `wolfram_alpha` - Computational knowledge (requires API key)

Some reasoning models on Groq support a browser search tool that must be explicitly enabled. Check the Groq documentation for which models support this feature.
Configuration:
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: groq:compound-beta # or other models with browser_search support
    config:
      temperature: 0.6
      max_completion_tokens: 3000
      tools:
        - type: browser_search
      tool_choice: required # Ensures the tool is used

prompts:
  - |
    {{question}}

tests:
  - vars:
      question: What is the current population of Seattle?
    assert:
      - type: javascript
        value: output.length > 50
```
How It Works:
Browser search navigates websites interactively, providing detailed results with automatic citations. The model will search, read pages, and cite sources in its response.
Key Differences from Web Search:
Code Execution (Compound Models):
Web/Browser Search:
Combined Capabilities (Compound Models):
Model Selection:

- Choose models that support the `reasoning_effort` levels you need

Token Limits: Built-in tools consume significant tokens. Set `max_completion_tokens` to 3000-4000 for complex tasks.
Temperature Settings:
Tool Choice:

- Set `tool_choice: required` to ensure browser search is always used

Error Handling: Tool calls may fail due to network issues. Models typically acknowledge failures and try alternative approaches.