
[Open in Colab](https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/groq.ipynb)

Groq

Welcome to Groq! 🚀 At Groq, we've developed the world's first Language Processing Unit™, or LPU. The Groq LPU has a deterministic, single-core streaming architecture that sets the standard for GenAI inference speed, delivering predictable and repeatable performance for any given workload.

Beyond the architecture, our software is designed to empower developers like you with the tools you need to create innovative, powerful AI applications. With Groq as your engine, you can:

  • Achieve uncompromised low latency and performance for real-time AI and HPC inference 🔥
  • Know the exact performance and compute time for any given workload 🔮
  • Take advantage of our cutting-edge technology to stay ahead of the competition 💪

Want more Groq? Check out our website for more resources and join our Discord community to connect with our developers!

Setup

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

```python
%pip install llama-index-llms-groq
```

```python
!pip install llama-index
```
```python
from llama_index.llms.groq import Groq
```

Create an API key at the Groq console, then set it as the environment variable GROQ_API_KEY.

```bash
export GROQ_API_KEY=<your api key>
```
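If you'd rather stay inside Python (for example in Colab), a minimal sketch that sets the same variable programmatically; the key value is a placeholder:

```python
import os

# Set the API key for the current process; the Groq client reads
# GROQ_API_KEY from the environment if no api_key argument is passed
os.environ["GROQ_API_KEY"] = "<your api key>"
```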

Alternatively, you can pass your API key to the LLM when you initialize it:

```python
llm = Groq(model="llama3-70b-8192", api_key="your_api_key")
```

A list of available LLM models can be found here.

```python
response = llm.complete("Explain the importance of low latency LLMs")
```

```python
print(response)
```
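LlamaIndex LLMs also expose async counterparts of these calls; a minimal sketch using acomplete, assuming you're in an async context such as a notebook cell:

```python
# Async variant of complete(); await it from an async context
response = await llm.acomplete("Explain the importance of low latency LLMs")
print(response)
```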

Call chat with a list of messages

```python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
```
```python
print(resp)
```
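Printing the ChatResponse includes the role prefix; if you only want the assistant's reply, the response's message field carries it:

```python
# ChatResponse wraps a ChatMessage; .message.content is just the reply text
print(resp.message.content)
```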

Streaming

Using the stream_complete endpoint

```python
response = llm.stream_complete("Explain the importance of low latency LLMs")
```

```python
for r in response:
    print(r.delta, end="")
```

Using the stream_chat endpoint

```python
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
```
```python
for r in resp:
    print(r.delta, end="")
```
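Beyond direct calls, you can also make Groq the default LLM for the rest of your LlamaIndex application; a minimal sketch using the global Settings object, reusing the model name from above:

```python
from llama_index.core import Settings
from llama_index.llms.groq import Groq

# Downstream components (indexes, query engines, agents) that aren't
# given an explicit llm will now use Groq by default
Settings.llm = Groq(model="llama3-70b-8192")
```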