
[Open in Colab](https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/groq.ipynb)

Groq

Welcome to Groq! 🚀 At Groq, we've developed the world's first Language Processing Unit™, or LPU. The Groq LPU has a deterministic, single-core streaming architecture that sets the standard for GenAI inference speed, delivering predictable and repeatable performance for any given workload.

Beyond the architecture, our software is designed to empower developers like you with the tools you need to create innovative, powerful AI applications. With Groq as your engine, you can:

  • Achieve uncompromised low latency and performance for real-time AI and HPC inference 🔥
  • Know the exact performance and compute time for any given workload 🔮
  • Take advantage of our cutting-edge technology to stay ahead of the competition 💪

Want more Groq? Check out our website for more resources and join our Discord community to connect with our developers!

Setup

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

```python
%pip install llama-index-llms-groq
```

```python
!pip install llama-index
```
```python
from llama_index.llms.groq import Groq
```

Create an API key at the Groq console, then set it as the environment variable GROQ_API_KEY.

```bash
export GROQ_API_KEY=<your api key>
```
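If you'd rather stay inside Python (for example in Colab), a minimal sketch that sets the same variable programmatically; the key value is a placeholder:

```python
import os

# Set the API key for the current process; the Groq client reads
# GROQ_API_KEY from the environment if no api_key argument is passed
os.environ["GROQ_API_KEY"] = "<your api key>"
```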

Alternatively, you can pass your API key to the LLM when you initialize it:

```python
llm = Groq(model="llama3-70b-8192", api_key="your_api_key")
```

A list of available LLM models can be found here.

```python
response = llm.complete("Explain the importance of low latency LLMs")
```

```python
print(response)
```
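LlamaIndex LLMs also expose async counterparts of these calls; a minimal sketch using acomplete, assuming you're in an async context such as a notebook cell:

```python
# Async variant of complete(); await it from an async context
response = await llm.acomplete("Explain the importance of low latency LLMs")
print(response)
```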

Call chat with a list of messages

```python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
```
```python
print(resp)
```
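Printing the ChatResponse includes the role prefix; if you only want the assistant's reply, the response's message field carries it:

```python
# ChatResponse wraps a ChatMessage; .message.content is just the reply text
print(resp.message.content)
```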

Streaming

Using the stream_complete endpoint

```python
response = llm.stream_complete("Explain the importance of low latency LLMs")
```

```python
for r in response:
    print(r.delta, end="")
```

Using the stream_chat endpoint

```python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
```
```python
for r in resp:
    print(r.delta, end="")
```
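Beyond direct calls, you can also make Groq the default LLM for the rest of your LlamaIndex application; a minimal sketch using the global Settings object, reusing the model name from above:

```python
from llama_index.core import Settings
from llama_index.llms.groq import Groq

# Downstream components (indexes, query engines, agents) that aren't
# given an explicit llm will now use Groq by default
Settings.llm = Groq(model="llama3-70b-8192")
```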