# Groq
Groq is an extremely fast inference API compatible with the options provided by promptfoo's OpenAI provider. See the OpenAI provider documentation for configuration details.
Groq provides access to a wide range of models including reasoning models with chain-of-thought capabilities, compound models with built-in tools, and standard chat models. See the Groq Models documentation for the current list of available models.
| Feature | Description | Provider Prefix | Key Config |
|---|---|---|---|
| Reasoning Models | Models with chain-of-thought capabilities | `groq:` | `include_reasoning` |
| Compound Models | Built-in code execution, web search, browsing | `groq:` | `compound_custom` |
| Standard Models | General-purpose chat models | `groq:` | `temperature` |
| Long Context | Models with extended context windows (100k+) | `groq:` | N/A |
| Responses API | Structured API with simplified reasoning control | `groq:responses:` | `reasoning.effort` |
Key Differences:

- `groq:` - Standard Chat Completions API with granular reasoning control
- `groq:responses:` - Responses API (beta) with a simplified `reasoning.effort` parameter
- `browser_search` tool via manual configuration
- `compound_custom.tools.enabled_tools` to control which built-in tools are enabled

To use Groq, set your API key via the `GROQ_API_KEY` environment variable:

```sh
export GROQ_API_KEY=your_api_key_here
```

Alternatively, you can specify the `apiKey` in the provider configuration (see below).
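For example, a minimal provider entry with an inline key (shown with a placeholder; prefer the environment variable for anything beyond local testing):

```yaml
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      apiKey: your_api_key_here
```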
Configure the Groq provider in your promptfoo configuration file:

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      temperature: 0.7
      max_completion_tokens: 100

prompts:
  - Write a funny tweet about {{topic}}

tests:
  - vars:
      topic: cats
  - vars:
      topic: dogs
```
Key configuration options:

- `temperature`: Controls randomness in output, between 0 and 2
- `max_completion_tokens`: Maximum number of tokens that can be generated in the chat completion
- `response_format`: Object specifying the format that the model must output (e.g. JSON mode)
- `presence_penalty`: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far
- `seed`: For deterministic sampling (best effort)
- `frequency_penalty`: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far
- `parallel_tool_calls`: Whether to enable parallel function calling during tool use (default: `true`)
- `reasoning_format`: For reasoning models, controls how reasoning is presented. Options: `parsed` (separate field), `raw` (with think tags), `hidden` (no reasoning shown). Note: `parsed` or `hidden` is required when using JSON mode or tool calls
- `include_reasoning`: For GPT-OSS models, set to `false` to hide reasoning output (default: `true`)
- `reasoning_effort`: For reasoning models, controls the level of reasoning effort. Options: `low`, `medium`, `high` for GPT-OSS models; `none`, `default` for Qwen models
- `stop`: Up to 4 sequences where the API will stop generating further tokens
- `tool_choice`: Controls tool usage (`none`, `auto`, `required`, or a specific tool)
- `tools`: List of tools (functions) the model may call (max 128)
- `top_p`: Alternative to temperature sampling using nucleus sampling

Groq provides access to models across several categories. For the current list of available models and their specifications, see the Groq Models documentation.
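As a sketch of the `response_format` option listed above, JSON mode can be enabled with an OpenAI-compatible `json_object` type (as with other OpenAI-compatible APIs, the prompt itself should also instruct the model to produce JSON):

```yaml
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      temperature: 0
      response_format:
        type: json_object
```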
Use any model from Groq's model library (such as `groq/compound`) with the `groq:` prefix:
```yaml
providers:
  # Standard chat model
  - id: groq:llama-3.3-70b-versatile
    config:
      temperature: 0.7
      max_completion_tokens: 4096

  # Reasoning model
  - id: groq:deepseek-r1-distill-llama-70b
    config:
      temperature: 0.6
      include_reasoning: true
```
Check the Groq Console for the full list of available models.
Groq supports tool use, allowing models to call predefined functions. Configure tools in your provider settings:
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      tools:
        - type: function
          function:
            name: get_weather
            description: 'Get the current weather in a given location'
            parameters:
              type: object
              properties:
                location:
                  type: string
                  description: 'The city and state, e.g. San Francisco, CA'
                unit:
                  type: string
                  enum:
                    - celsius
                    - fahrenheit
              required:
                - location
      tool_choice: auto
```
Groq provides vision models that can process both text and image inputs. These models support tool use and JSON mode. See the Groq Vision documentation for current model availability and specifications.
Specify a vision model ID in your provider configuration and include images in OpenAI-compatible format:
```yaml
- role: user
  content:
    - type: text
      text: '{{question}}'
    - type: image_url
      image_url:
        url: '{{url}}'
```
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
prompts: file://openai-compatible-prompt-format.yaml

providers:
  - id: groq:meta-llama/llama-4-scout-17b-16e-instruct
    config:
      temperature: 1
      max_completion_tokens: 1024

tests:
  - vars:
      question: 'What do you see in the image?'
      url: https://upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Felis_catus-cat_on_snow.jpg/1024px-Felis_catus-cat_on_snow.jpg
    assert:
      - type: contains
        value: 'cat'
```
Groq provides access to reasoning models that excel at complex problem-solving tasks requiring step-by-step analysis. These include GPT-OSS variants, Qwen models, and DeepSeek R1 variants. Check the Groq Models documentation for current reasoning model availability.
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: Groq reasoning model example

prompts:
  - |
    Your task is to analyze the following question with careful reasoning and rigor:

    {{ question }}

providers:
  - id: groq:deepseek-r1-distill-llama-70b
    config:
      temperature: 0.6
      max_completion_tokens: 25000
      reasoning_format: parsed # 'parsed', 'raw', or 'hidden'

tests:
  - vars:
      question: |
        Solve for x in the following equation: e^-x = x^3 - 3x^2 + 2x + 5
    assert:
      - type: javascript
        value: output.includes('0.676') || output.includes('.676')
```
For GPT-OSS models, use the `include_reasoning` parameter:

| Parameter Value | Description |
|---|---|
| `true` (default) | Shows reasoning/thinking process in output |
| `false` | Hides reasoning, returns only final answer |
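A minimal sketch of hiding reasoning on a GPT-OSS model (assuming `openai/gpt-oss-20b` is available on your Groq account; substitute any GPT-OSS model id from the Groq console):

```yaml
providers:
  - id: groq:openai/gpt-oss-20b
    config:
      include_reasoning: false # return only the final answer
```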
Example hiding reasoning with `reasoning_format`:

```yaml
providers:
  - id: groq:deepseek-r1-distill-llama-70b
    config:
      reasoning_format: hidden # Hide thinking output
```
For other reasoning models (e.g., Qwen, DeepSeek), use `reasoning_format`:

| Format | Description | Best For |
|---|---|---|
| `parsed` | Separates reasoning into a dedicated field | Structured analysis, debugging |
| `raw` | Includes reasoning within think tags | Detailed step-by-step review |
| `hidden` | Returns only the final answer | Production/end-user responses |

Note: When using JSON mode or tool calls with `reasoning_format`, only the `parsed` or `hidden` formats are supported.
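For instance, a reasoning model combined with tool calls might be configured like this (a sketch; the `lookup_constant` tool is a hypothetical example, not a built-in):

```yaml
providers:
  - id: groq:deepseek-r1-distill-llama-70b
    config:
      reasoning_format: parsed # 'raw' is not supported with tool calls
      tools:
        - type: function
          function:
            name: lookup_constant # hypothetical tool for illustration
            description: 'Return the value of a named mathematical constant'
            parameters:
              type: object
              properties:
                name:
                  type: string
              required:
                - name
```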
Control model output format by prefilling assistant messages. This technique allows you to direct the model to skip preambles and enforce specific formats like JSON or code blocks.
Include a partial assistant message in your prompt, and the model will continue from that point:
```yaml
prompts:
  - |
    [
      {
        "role": "user",
        "content": "{{task}}"
      },
      {
        "role": "assistant",
        "content": "{{prefill}}"
      }
    ]

providers:
  - id: groq:llama-3.3-70b-versatile
    config:
      stop: '```' # Stop at closing code fence

tests:
  - vars:
      task: Write a Python function to calculate factorial
      prefill: '```python'
```
Generate concise code:

```yaml
prefill: '```python'
```

Extract structured data:

```yaml
prefill: '```json'
```

Skip introductions:

```yaml
prefill: "Here's the answer: "
```
Combine with the stop parameter for precise output control.
Groq's Responses API provides a structured approach to conversational AI, with built-in support for tools, structured outputs, and reasoning. Use the groq:responses: prefix to access this API. Note: This API is currently in beta.
```yaml
providers:
  - id: groq:responses:llama-3.3-70b-versatile
    config:
      temperature: 0.6
      max_output_tokens: 1000
      reasoning:
        effort: 'high' # 'low', 'medium', or 'high'
```
The Responses API makes it easy to get structured JSON outputs:
```yaml
providers:
  - id: groq:responses:llama-3.3-70b-versatile
    config:
      response_format:
        type: 'json_schema'
        json_schema:
          name: 'calculation_result'
          strict: true
          schema:
            type: 'object'
            properties:
              result:
                type: 'number'
              explanation:
                type: 'string'
            required: ['result', 'explanation']
            additionalProperties: false
```
The Responses API accepts either a simple string or an array of message objects:
```yaml
prompts:
  # Simple string input
  - 'What is the capital of France?'

  # Or a message array (as JSON)
  - |
    [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
```
| Feature | Chat Completions (`groq:`) | Responses API (`groq:responses:`) |
|---|---|---|
| Endpoint | `/v1/chat/completions` | `/v1/responses` |
| Reasoning Control | `include_reasoning`, `reasoning_format` | `reasoning.effort` |
| Token Limit Param | `max_completion_tokens` | `max_output_tokens` |
| Input Field | `messages` | `input` |
| Output Field | `choices[0].message.content` | `output_text` |
For more details on the Responses API, see Groq's Responses API documentation.
Groq offers models with built-in tools: compound models with automatic tool usage, and reasoning models with manually configured tools like browser search.
Groq's compound models combine language models with pre-enabled built-in tools that activate automatically based on the task. Check the Groq documentation for current compound model availability.
Built-in Capabilities (No Configuration Needed):

- Code execution
- Web search
- Website browsing
Basic Configuration:

```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: groq:groq/compound
    config:
      temperature: 0.7
      max_completion_tokens: 3000

prompts:
  - |
    {{task}}

tests:
  # Code execution
  - vars:
      task: Calculate the first 10 Fibonacci numbers using code
    assert:
      - type: javascript
        value: output.length > 50

  # Web search
  - vars:
      task: What is the current population of Seattle?
    assert:
      - type: javascript
        value: output.length > 50
```
Example Outputs:
Code execution:

```text
Thinking:
To calculate the first 10 Fibonacci numbers, I will use a Python code snippet.

<tool>
python
def fibonacci(n):
    fib = [0, 1]
    for i in range(2, n):
        fib.append(fib[i-1] + fib[i-2])
    return fib[:n]
print(fibonacci(10))
</tool>
<output>[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]</output>
```
Web search:

```text
<tool>search(current population of Seattle)</tool>
<output>
Title: Seattle Population 2025
URL: https://example.com/seattle
Content: The current metro area population of Seattle in 2025 is 816,600...
</output>
```
Web Search Settings (Optional):
You can customize web search behavior:
```yaml
providers:
  - id: groq:groq/compound
    config:
      search_settings:
        exclude_domains: ['example.com'] # Exclude specific domains
        include_domains: ['*.edu'] # Restrict to specific domains
        country: 'us' # Boost results from this country
```
Explicit Tool Control:
By default, Compound models automatically select which tools to use. You can explicitly control which tools are available using compound_custom:
```yaml
providers:
  - id: groq:groq/compound
    config:
      compound_custom:
        tools:
          enabled_tools:
            - code_interpreter # Python code execution
            - web_search # Web searches
            - visit_website # URL fetching
```
This allows you to restrict the model to only the tools your evaluation needs.
Available Tool Identifiers:

- `code_interpreter` - Python code execution
- `web_search` - Real-time web searches
- `visit_website` - Webpage fetching
- `browser_automation` - Interactive browser control (requires latest version)
- `wolfram_alpha` - Computational knowledge (requires API key)

Some reasoning models on Groq support a browser search tool that must be explicitly enabled. Check the Groq documentation for which models support this feature.
Configuration:
```yaml
# yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
providers:
  - id: groq:compound-beta # or other models with browser_search support
    config:
      temperature: 0.6
      max_completion_tokens: 3000
      tools:
        - type: browser_search
      tool_choice: required # Ensures the tool is used

prompts:
  - |
    {{question}}

tests:
  - vars:
      question: What is the current population of Seattle?
    assert:
      - type: javascript
        value: output.length > 50
```
How It Works:
Browser search navigates websites interactively, providing detailed results with automatic citations. The model will search, read pages, and cite sources in its response.
Key Differences from Web Search:
Code Execution (Compound Models):
Web/Browser Search:
Combined Capabilities (Compound Models):
Model Selection:

- Choose models that support the `reasoning_effort` levels you need

Token Limits: Built-in tools consume significant tokens. Set `max_completion_tokens` to 3000-4000 for complex tasks.
Temperature Settings:
Tool Choice:

- Set `tool_choice: required` to ensure browser search is always used

Error Handling: Tool calls may fail due to network issues. Models typically acknowledge failures and try alternative approaches.