# LlamaIndex LLMs Integration: Helicone
To install the required packages, run:
```bash
pip install llama-index-llms-helicone
pip install llama-index
```
Set your Helicone API key via the `HELICONE_API_KEY` environment variable (or pass it directly to the constructor). No provider API keys are needed when using the Helicone AI Gateway.
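For example, a minimal way to set the key from Python before constructing the client (the key value below is a placeholder):

```python
import os

# Placeholder value; replace with your real Helicone API key.
os.environ["HELICONE_API_KEY"] = "<helicone-api-key>"
```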
Then initialize the Helicone LLM:

```python
from llama_index.llms.helicone import Helicone
from llama_index.core.llms import ChatMessage

llm = Helicone(
    api_key="<helicone-api-key>",  # or set the HELICONE_API_KEY env var
    model="gpt-4o-mini",  # works across providers via the gateway
)
```
You can generate a chat response by sending a list of `ChatMessage` instances:

```python
message = ChatMessage(role="user", content="Tell me a joke")
resp = llm.chat([message])
print(resp)
```
To stream responses, use the `stream_chat` method:

```python
message = ChatMessage(role="user", content="Tell me a story in 250 words")
resp = llm.stream_chat([message])
for r in resp:
    print(r.delta, end="")
```
You can also generate completions with a prompt using the `complete` method:

```python
resp = llm.complete("Tell me a joke")
print(resp)
```
To stream completions, use the `stream_complete` method:

```python
resp = llm.stream_complete("Tell me a story in 250 words")
for r in resp:
    print(r.delta, end="")
```
To use a specific model, specify it during initialization. Because the gateway routes based on the model string, you can swap in models from other providers the same way:

```python
from llama_index.llms.helicone import Helicone

llm = Helicone(model="gpt-4o-mini")
resp = llm.complete("Write a story about a dragon who can code in Rust")
print(resp)
```
The default API base is `https://ai-gateway.helicone.ai/v1`; override it with `api_base` or the `HELICONE_API_BASE` environment variable if needed. `HELICONE_API_KEY` is required. The gateway routes to the correct provider based on the model string.
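As a minimal sketch of overriding the base URL with the `api_base` parameter (the value shown is the documented default, used here only for illustration):

```python
from llama_index.llms.helicone import Helicone

# Explicitly point the client at the gateway; normally this default is used automatically.
llm = Helicone(
    model="gpt-4o-mini",
    api_base="https://ai-gateway.helicone.ai/v1",
)
```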