Groq Plugin

packages/docs/plugin-registry/llm/groq.md

The Groq plugin connects Eliza agents to Groq's inference API. Groq's Language Processing Unit (LPU) generates tokens significantly faster than typical GPU-based inference, which makes it well suited to latency-sensitive agent workflows.

Package: @elizaos/plugin-groq

Installation

```bash
eliza plugins install @elizaos/plugin-groq
```

Auto-Enable

The plugin auto-enables when GROQ_API_KEY is present:

```bash
export GROQ_API_KEY=gsk_...
```

Configuration

| Environment Variable | Required | Description |
| --- | --- | --- |
| GROQ_API_KEY | Yes | Groq API key from console.groq.com |
| GROQ_BASE_URL | No | Custom base URL for API requests |
| GROQ_SMALL_MODEL | No | Override the small model identifier |
| GROQ_LARGE_MODEL | No | Override the large model identifier |
| GROQ_TTS_MODEL | No | Override the text-to-speech model |
| GROQ_TTS_VOICE | No | Voice profile for text-to-speech output |
| GROQ_TTS_RESPONSE_FORMAT | No | Output format for text-to-speech audio |
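
The variables above could be resolved into a single settings object roughly as follows. This is a minimal sketch, not the plugin's actual source: `GroqSettings` and `resolveGroqSettings` are hypothetical names, the fallback base URL is assumed to be Groq's OpenAI-compatible endpoint, and the default model is the one listed under Supported Models.

```typescript
// Illustrative sketch only; the real plugin may resolve settings differently.
interface GroqSettings {
  apiKey: string;
  baseUrl: string;
  smallModel: string;
  largeModel: string;
  ttsModel: string | undefined;
  ttsVoice: string | undefined;
  ttsResponseFormat: string | undefined;
}

type Env = Record<string, string | undefined>;

// Assumption: Groq's OpenAI-compatible endpoint is used when GROQ_BASE_URL is unset.
const ASSUMED_BASE_URL = "https://api.groq.com/openai/v1";
// Default text model as listed under Supported Models.
const DEFAULT_TEXT_MODEL = "openai/gpt-oss-120b";

function resolveGroqSettings(env: Env): GroqSettings {
  const apiKey = env["GROQ_API_KEY"];
  if (!apiKey) {
    // GROQ_API_KEY is the only required variable.
    throw new Error("GROQ_API_KEY is required");
  }
  return {
    apiKey,
    baseUrl: env["GROQ_BASE_URL"] ?? ASSUMED_BASE_URL,
    smallModel: env["GROQ_SMALL_MODEL"] ?? DEFAULT_TEXT_MODEL,
    largeModel: env["GROQ_LARGE_MODEL"] ?? DEFAULT_TEXT_MODEL,
    // Optional text-to-speech settings stay undefined when not provided.
    ttsModel: env["GROQ_TTS_MODEL"],
    ttsVoice: env["GROQ_TTS_VOICE"],
    ttsResponseFormat: env["GROQ_TTS_RESPONSE_FORMAT"],
  };
}
```

Passing the environment in as a plain object (for example `process.env` in Node) keeps the function easy to test without touching real environment variables.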

eliza.json Example

```json
{
  "auth": {
    "profiles": {
      "default": {
        "provider": "groq",
        "model": "openai/gpt-oss-120b"
      }
    }
  }
}
```
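
To illustrate the shape of this file, the sketch below parses the JSON text and extracts the default profile. `readDefaultProfile` and `AuthProfile` are hypothetical helpers written for this example; Eliza's own configuration loader may validate or merge settings differently.

```typescript
// Illustrative sketch only; not Eliza's configuration loader.
interface AuthProfile {
  provider: string;
  model: string;
}

interface ElizaConfigShape {
  auth?: { profiles?: Record<string, Partial<AuthProfile> | undefined> };
}

function readDefaultProfile(jsonText: string): AuthProfile {
  const parsed = JSON.parse(jsonText) as ElizaConfigShape;
  const profile = parsed.auth?.profiles?.["default"];
  if (!profile || typeof profile.provider !== "string" || typeof profile.model !== "string") {
    throw new Error("eliza.json is missing auth.profiles.default.provider or .model");
  }
  return { provider: profile.provider, model: profile.model };
}
```

Given the example file above, the function would return a profile with provider `groq` and model `openai/gpt-oss-120b`.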

Supported Models

| Model | Context | Speed | Best For |
| --- | --- | --- | --- |
| openai/gpt-oss-120b | 128k | Fast | Default small and large text model |

Model Type Mapping

| elizaOS Model Type | Groq Model |
| --- | --- |
| TEXT_SMALL | openai/gpt-oss-120b |
| TEXT_LARGE | openai/gpt-oss-120b |
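
The mapping can be read as a lookup from model type to model identifier. The sketch below assumes that `GROQ_SMALL_MODEL` overrides `TEXT_SMALL` and `GROQ_LARGE_MODEL` overrides `TEXT_LARGE`, which the variable names suggest but this page does not state outright. `groqModelFor` is a hypothetical helper, not part of the plugin's API.

```typescript
// Illustrative sketch only; the override pairing is an assumption.
type GroqTextModelType = "TEXT_SMALL" | "TEXT_LARGE";
type EnvVars = Record<string, string | undefined>;

// Defaults taken from the Model Type Mapping table.
const GROQ_DEFAULT_MODELS: Record<GroqTextModelType, string> = {
  TEXT_SMALL: "openai/gpt-oss-120b",
  TEXT_LARGE: "openai/gpt-oss-120b",
};

// Assumed pairing between model types and override variables.
const GROQ_OVERRIDE_VARS: Record<GroqTextModelType, string> = {
  TEXT_SMALL: "GROQ_SMALL_MODEL",
  TEXT_LARGE: "GROQ_LARGE_MODEL",
};

function groqModelFor(modelType: GroqTextModelType, env: EnvVars): string {
  // Use the override when set; otherwise fall back to the documented default.
  return env[GROQ_OVERRIDE_VARS[modelType]] ?? GROQ_DEFAULT_MODELS[modelType];
}
```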

Features

  • High-throughput, low-latency generation (typically 250–800 tokens/second, model-dependent)
  • Streaming responses
  • Tool use / function calling (on select models)
  • OpenAI-compatible API (works with the OpenAI SDK request format)
  • Free tier available
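
Because the API follows the OpenAI request format, a streaming chat request has the familiar chat-completions shape. The sketch below only builds the request and does not send it. `buildGroqChatRequest` is a hypothetical helper, and the URL assumes Groq's OpenAI-compatible endpoint rather than anything this plugin documents.

```typescript
// Illustrative sketch only; builds an OpenAI-style request without sending it.
interface GroqChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GroqChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildGroqChatRequest(
  apiKey: string,
  messages: GroqChatMessage[],
  stream: boolean
): GroqChatRequest {
  return {
    // Assumption: Groq's OpenAI-compatible chat completions endpoint.
    url: "https://api.groq.com/openai/v1/chat/completions",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // Same body shape an OpenAI SDK client would send.
    body: JSON.stringify({ model: "openai/gpt-oss-120b", messages, stream }),
  };
}
```

The result can be handed to any HTTP client, for example `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`.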

Performance Characteristics

Groq's LPU architecture excels at:

  • Time to first token: Typically under 200 ms
  • Token throughput: 250–800+ tokens/second (model-dependent)
  • Latency consistency: Very low jitter compared to GPU clusters

This makes Groq particularly well-suited for:

  • Real-time chat agents where response latency matters
  • High-frequency autonomous agent loops
  • Applications requiring consistent, predictable latency
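
As a rough worked example of what these figures mean for an agent loop, total response time can be approximated as time to first token plus output length divided by throughput. The function below is a back-of-the-envelope sketch using the typical values quoted above, not a benchmark; `estimateResponseSeconds` is a hypothetical name.

```typescript
// Back-of-the-envelope estimate: time to first token + tokens / throughput.
function estimateResponseSeconds(
  outputTokens: number,
  tokensPerSecond: number,
  timeToFirstTokenSeconds: number = 0.2 // "typically under 200 ms"
): number {
  if (tokensPerSecond <= 0) {
    throw new Error("tokensPerSecond must be positive");
  }
  return timeToFirstTokenSeconds + outputTokens / tokensPerSecond;
}

// A 500-token reply at the low end of the range (250 tokens/second):
// 0.2 + 500 / 250, roughly 2.2 seconds.
const slowEstimate = estimateResponseSeconds(500, 250);

// The same reply at the high end (800 tokens/second):
// 0.2 + 500 / 800, roughly 0.83 seconds.
const fastEstimate = estimateResponseSeconds(500, 800);
```

Actual latency also depends on prompt length, network conditions, and queueing, none of which this estimate models.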

Rate Limits

Groq enforces per-minute token limits for each model. Free-tier limits are lower than paid-tier limits, which scale with usage.

See console.groq.com/docs/rate-limits for current limits.
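
An agent that issues many requests may want to track its own token usage so that it backs off before hitting a per-minute limit. The sketch below is a simple client-side fixed-window counter. `TokenBudget` is a hypothetical class, the limit passed to it is a placeholder rather than a real Groq quota, and Groq's server-side accounting may work differently.

```typescript
// Illustrative client-side fixed-window token counter; not Groq's algorithm.
class TokenBudget {
  private readonly tokensPerMinute: number;
  private windowStartMs: number;
  private used: number = 0;

  constructor(tokensPerMinute: number, nowMs: number) {
    this.tokensPerMinute = tokensPerMinute;
    this.windowStartMs = nowMs;
  }

  // Returns true and records the usage if the request fits in the current window.
  tryConsume(tokens: number, nowMs: number): boolean {
    if (nowMs - this.windowStartMs >= 60000) {
      // A full minute has passed: start a new window.
      this.windowStartMs = nowMs;
      this.used = 0;
    }
    if (this.used + tokens > this.tokensPerMinute) {
      return false; // Caller should wait or retry later.
    }
    this.used += tokens;
    return true;
  }
}
```

The current time is passed in explicitly (for example `Date.now()`), which keeps the class deterministic and easy to test.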

Pricing

Groq offers a free tier. Paid usage is billed per million tokens.

See groq.com/pricing for current rates.
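
Per-million-token billing can be estimated with simple arithmetic. The sketch below assumes input and output tokens are priced separately, and the rates in the example are made-up placeholders, not Groq's actual prices. `estimateCostUsd` is a hypothetical helper.

```typescript
// Illustrative cost estimate for per-million-token billing.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  inputRatePerMillion: number,
  outputRatePerMillion: number
): number {
  const inputCost = (inputTokens / 1000000) * inputRatePerMillion;
  const outputCost = (outputTokens / 1000000) * outputRatePerMillion;
  return inputCost + outputCost;
}

// Placeholder rates for illustration only: $0.10 per million input tokens
// and $0.50 per million output tokens.
// 2,000,000 input + 500,000 output tokens: 0.20 + 0.25, about $0.45.
const exampleCost = estimateCostUsd(2000000, 500000, 0.1, 0.5);
```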