Helicone AI Gateway

docs/examples/llm/helicone.ipynb
Helicone is an OpenAI-compatible AI Gateway that routes requests to many providers with observability, control, and caching. Learn more on the Helicone docs and see available models.

If you're opening this notebook on Colab, you'll likely need to install the integration packages below.

Notes:

  • Only your Helicone API key is required (HELICONE_API_KEY); no provider keys are needed.
  • Default base URL is https://ai-gateway.helicone.ai/v1. Override with api_base or HELICONE_API_BASE.
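To make the override behavior above concrete, here is a minimal sketch of the usual precedence for OpenAI-compatible LlamaIndex integrations: an explicit api_base argument wins, then the HELICONE_API_BASE environment variable, then the documented default. The resolve_api_base helper is illustrative, not part of the integration's API.

```python
import os
from typing import Optional

DEFAULT_HELICONE_BASE = "https://ai-gateway.helicone.ai/v1"


def resolve_api_base(api_base: Optional[str] = None) -> str:
    """Pick the gateway URL: an explicit argument wins, then the
    HELICONE_API_BASE env var, then the documented default."""
    return api_base or os.environ.get("HELICONE_API_BASE", DEFAULT_HELICONE_BASE)


# With neither an argument nor the env var set, the documented default is used.
print(resolve_api_base())
```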
```python
%pip install llama-index-llms-helicone
```

```python
!pip install llama-index
```

```python
from llama_index.llms.helicone import Helicone
from llama_index.core.llms import ChatMessage
```

Call chat with a ChatMessage list

You need to either set the HELICONE_API_KEY environment variable or pass api_key to the constructor.

```python
# import os
# os.environ["HELICONE_API_KEY"] = "<your-helicone-api-key>"

llm = Helicone(
    api_key="<your-helicone-api-key>",  # or set HELICONE_API_KEY
    model="gpt-4o-mini",  # routed via the Helicone AI Gateway
    max_tokens=256,
)
```

```python
message = ChatMessage(role="user", content="Tell me a joke")
resp = llm.chat([message])
print(resp)
```

Streaming

```python
message = ChatMessage(role="user", content="Tell me a story in 200 words")
resp = llm.stream_chat([message])
for r in resp:
    print(r.delta, end="")
```

API Support (Chat only; no legacy Completions)

Helicone supports OpenAI-compatible Chat Completions and the newer Responses API. The legacy Completions API is not supported.

In LlamaIndex, use llm.chat(...) and llm.stream_chat(...).
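Since the gateway speaks the OpenAI-compatible Chat Completions wire format, it can help to see roughly what llm.chat(...) sends under the hood. The sketch below builds an equivalent raw request with only the standard library; the /chat/completions path and payload fields follow the OpenAI Chat Completions schema, and the Bearer-token header is an assumption about the gateway's auth scheme (check the Helicone docs for specifics).

```python
import json
import urllib.request


def build_chat_request(
    api_key: str, model: str, user_message: str
) -> urllib.request.Request:
    """Build an OpenAI-compatible Chat Completions request aimed at the
    Helicone AI Gateway; llm.chat(...) sends an equivalent payload."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        "https://ai-gateway.helicone.ai/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_chat_request("<your-helicone-api-key>", "gpt-4o-mini", "Tell me a joke")
# urllib.request.urlopen(req) would execute the call (requires a real key).
```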

Model Configuration

```python
# Choose any OpenAI-compatible model routed by Helicone.
# See https://www.helicone.ai/models for options.

# If HELICONE_API_KEY is set in your environment, you can omit api_key here.
llm = Helicone(model="gpt-4o-mini")
message = ChatMessage(
    role="user", content="Write one sentence about Rust dragons coding."
)
resp = llm.chat([message])
print(resp)
```