docs/examples/llm/openai.ipynb
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/openai.ipynb" target="_parent">Open In Colab</a>
This notebook shows how to use the OpenAI LLM.
If you are looking to integrate with an OpenAI-Compatible API that is not the official OpenAI API, please see the OpenAI-Compatible LLMs integration.
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index llama-index-llms-openai
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
from llama_index.llms.openai import OpenAI
llm = OpenAI(
model="gpt-4o-mini",
# api_key="some key", # uses OPENAI_API_KEY env var by default
)
Call complete with a prompt
from llama_index.llms.openai import OpenAI
resp = llm.complete("Paul Graham is ")
print(resp)
Call chat with a list of messages
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
print(resp)
Using stream_complete endpoint
resp = llm.stream_complete("Paul Graham is ")
for r in resp:
print(r.delta, end="")
Using stream_chat endpoint
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
for r in resp:
print(r.delta, end="")
You can also configure which model is used:
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4o")
resp = llm.complete("Paul Graham is ")
print(resp)
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
print(resp)
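Generation parameters such as temperature and max_tokens can also be set on the constructor; a quick sketch:
# configure sampling temperature and the output token limit per instance
llm = OpenAI(model="gpt-4o", temperature=0.2, max_tokens=256)
resp = llm.complete("Paul Graham is ")
print(resp)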
OpenAI has support for images in the input of chat messages for many models.
Using the content blocks feature of chat messages, you can easily combine text and images in a single LLM prompt.
!wget https://cdn.pixabay.com/photo/2016/07/07/16/46/dice-1502706_640.jpg -O image.png
from llama_index.core.llms import ChatMessage, TextBlock, ImageBlock
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4o")
messages = [
ChatMessage(
role="user",
blocks=[
ImageBlock(path="image.png"),
TextBlock(text="Describe the image in a few sentences."),
],
)
]
resp = llm.chat(messages)
print(resp.message.content)
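ImageBlock can also reference a remote image by URL instead of a local path; a minimal sketch reusing the same image:
messages = [
    ChatMessage(
        role="user",
        blocks=[
            # point at the remote image directly rather than a downloaded file
            ImageBlock(
                url="https://cdn.pixabay.com/photo/2016/07/07/16/46/dice-1502706_640.jpg"
            ),
            TextBlock(text="Describe the image in a few sentences."),
        ],
    )
]
resp = llm.chat(messages)
print(resp.message.content)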
OpenAI has beta support for audio inputs and outputs, using their audio-preview models.
When using these models, you can configure the output modality (text or audio) using the modalities parameter. The output audio configuration can also be set using the audio_config parameter. See the OpenAI docs for more information.
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI
llm = OpenAI(
model="gpt-4o-audio-preview",
modalities=["text", "audio"],
audio_config={"voice": "alloy", "format": "wav"},
)
messages = [
ChatMessage(role="user", content="Hello! My name is Logan."),
]
resp = llm.chat(messages)
import base64
from IPython.display import Audio
Audio(base64.b64decode(resp.message.blocks[0].audio), rate=16000)
# Add the response to the chat history and ask for the user's name
messages.append(resp.message)
messages.append(ChatMessage(role="user", content="What is my name?"))
resp = llm.chat(messages)
Audio(base64.b64decode(resp.message.blocks[0].audio), rate=16000)
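If you want to keep the generated audio, you can decode it and write it to disk yourself; plain Python, no extra dependencies:
# decode the base64-encoded audio block and save it as a wav file
audio_bytes = base64.b64decode(resp.message.blocks[0].audio)
with open("reply.wav", "wb") as f:
    f.write(audio_bytes)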
We can also use audio as input and get descriptions or transcriptions of the audio.
!wget "https://science.nasa.gov/wp-content/uploads/2024/04/sounds-of-mars-one-small-step-earth.wav" -O audio.wav
from llama_index.core.llms import ChatMessage, AudioBlock, TextBlock
messages = [
ChatMessage(
role="user",
blocks=[
AudioBlock(path="audio.wav", format="wav"),
TextBlock(
text="Describe the audio in a few sentences. What is it from?"
),
],
)
]
llm = OpenAI(
model="gpt-4o-audio-preview",
modalities=["text"],
)
resp = llm.chat(messages)
print(resp)
OpenAI models have native support for function calling. This conveniently integrates with LlamaIndex tool abstractions, letting you plug in any arbitrary Python function to the LLM.
In the example below, we define a function to generate a Song object.
from pydantic import BaseModel
from llama_index.core.tools import FunctionTool
class Song(BaseModel):
"""A song with name and artist"""
name: str
artist: str
def generate_song(name: str, artist: str) -> Song:
"""Generates a song with provided name and artist."""
return Song(name=name, artist=artist)
tool = FunctionTool.from_defaults(fn=generate_song)
The strict parameter tells OpenAI whether to use constrained sampling when generating tool calls/structured outputs. This means that the generated tool call schema will always contain the expected fields.
Since this seems to increase latency, it defaults to False.
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4o-mini", strict=True)
response = llm.predict_and_call(
[tool],
"Pick a random song for me",
# strict=True # can also be set at the function level to override the class
)
print(str(response))
We can also do multiple function calling.
llm = OpenAI(model="gpt-3.5-turbo")
response = llm.predict_and_call(
[tool],
"Generate five songs from the Beatles",
allow_parallel_tool_calls=True,
)
for s in response.sources:
print(f"Name: {s.tool_name}, Input: {s.raw_input}, Output: {str(s)}")
If you want to control how a tool is called, you can also split the tool calling and tool selection into their own steps.
First, let's select a tool.
from llama_index.core.llms import ChatMessage
chat_history = [ChatMessage(role="user", content="Pick a random song for me")]
resp = llm.chat_with_tools([tool], chat_history=chat_history)
Now, let's call the tool the LLM selected (if any).
If there was a tool call, we should send the results to the LLM to generate the final response (or another tool call!).
tools_by_name = {t.metadata.name: t for t in [tool]}
tool_calls = llm.get_tool_calls_from_response(
resp, error_on_no_tool_call=False
)
while tool_calls:
# add the LLM's response to the chat history
chat_history.append(resp.message)
for tool_call in tool_calls:
tool_name = tool_call.tool_name
tool_kwargs = tool_call.tool_kwargs
print(f"Calling {tool_name} with {tool_kwargs}")
tool_output = tool(**tool_kwargs)
chat_history.append(
ChatMessage(
role="tool",
content=str(tool_output),
# most LLMs like OpenAI need to know the tool call id
additional_kwargs={"tool_call_id": tool_call.tool_id},
)
)
resp = llm.chat_with_tools([tool], chat_history=chat_history)
tool_calls = llm.get_tool_calls_from_response(
resp, error_on_no_tool_call=False
)
Now, we should have a final response!
print(resp.message.content)
An important use case for function calling is extracting structured objects. LlamaIndex provides an intuitive interface for converting any LLM into a structured LLM: simply define the target Pydantic class (which can be nested), and given a prompt, we extract the desired object.
from llama_index.llms.openai import OpenAI
from llama_index.core.prompts import PromptTemplate
from pydantic import BaseModel
from typing import List
class MenuItem(BaseModel):
"""A menu item in a restaurant."""
course_name: str
is_vegetarian: bool
class Restaurant(BaseModel):
"""A restaurant with name, city, and cuisine."""
name: str
city: str
cuisine: str
menu_items: List[MenuItem]
llm = OpenAI(model="gpt-3.5-turbo")
prompt_tmpl = PromptTemplate(
"Generate a restaurant in a given city {city_name}"
)
# Option 1: Use `as_structured_llm`
restaurant_obj = (
llm.as_structured_llm(Restaurant)
.complete(prompt_tmpl.format(city_name="Dallas"))
.raw
)
# Option 2: Use `structured_predict`
# restaurant_obj = llm.structured_predict(Restaurant, prompt_tmpl, city_name="Miami")
restaurant_obj
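Both options also have async counterparts; a minimal sketch using astructured_predict:
# inside an async context (e.g. a notebook cell)
restaurant_obj = await llm.astructured_predict(
    Restaurant, prompt_tmpl, city_name="Miami"
)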
Any LLM wrapped with as_structured_llm supports streaming through stream_chat.
from llama_index.core.llms import ChatMessage
from IPython.display import clear_output
from pprint import pprint
input_msg = ChatMessage.from_str("Generate a restaurant in Boston")
sllm = llm.as_structured_llm(Restaurant)
stream_output = sllm.stream_chat([input_msg])
for partial_output in stream_output:
clear_output(wait=True)
pprint(partial_output.raw.dict())
restaurant_obj = partial_output.raw
restaurant_obj
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-3.5-turbo")
resp = await llm.acomplete("Paul Graham is ")
print(resp)
resp = await llm.astream_complete("Paul Graham is ")
async for delta in resp:
print(delta.delta, end="")
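Async chat works the same way via achat (and astream_chat for streaming):
from llama_index.core.llms import ChatMessage

resp = await llm.achat(
    [ChatMessage(role="user", content="What is your name")]
)
print(resp)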
Async function calling is also supported.
llm = OpenAI(model="gpt-3.5-turbo")
response = await llm.apredict_and_call([tool], "Generate a song")
print(str(response))
If desired, you can have separate LLM instances use separate API keys.
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", api_key="BAD_KEY")
# this call raises an authentication error, since the key above is invalid
resp = llm.complete("Paul Graham is ")
print(resp)
Rather than adding the same parameters to each chat or completion call, you can set them at a per-instance level with additional_kwargs.
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", additional_kwargs={"user": "your_user_id"})
resp = llm.complete("Paul Graham is ")
print(resp)
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-3.5-turbo", additional_kwargs={"user": "your_user_id"})
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
print(resp)
LlamaCloud is our cloud-based service that allows you to upload, parse, and index documents, and then search them using LlamaIndex. LlamaCloud is currently in a private alpha; please get in touch if you'd like to be considered as a design partner.
%pip install llama-cloud-services
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."
from llama_cloud.client import LlamaCloud
client = LlamaCloud(token=os.environ["LLAMA_CLOUD_API_KEY"])
A pipeline is an empty index into which you can ingest data.
You need to set up the transformation and embedding configs that will be used while ingesting the data.
# Embedding config
embedding_config = {
"type": "OPENAI_EMBEDDING",
"component": {
"api_key": os.environ["OPENAI_API_KEY"],
"model_name": "text-embedding-ada-002", # You can choose any OpenAI Embedding model
},
}
# Transformation auto config
transform_config = {
"mode": "auto",
"config": {
"chunk_size": 1024, # editable
"chunk_overlap": 20, # editable
},
}
pipeline = {
"name": "openai-rag-pipeline", # Change the name if needed
"embedding_config": embedding_config,
"transform_config": transform_config,
"data_sink_id": None,
}
pipeline = client.pipelines.upsert_pipeline(request=pipeline)
We will upload files and add them to the index.
with open("../data/10k/uber_2021.pdf", "rb") as f:
file = client.files.upload_file(upload_file=f)
files = [{"file_id": file.id}]
pipeline_files = client.pipelines.add_files_to_pipeline(
pipeline.id, request=files
)
jobs = client.pipelines.list_pipeline_jobs(pipeline.id)
jobs[0].status
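If you want to block until ingestion finishes, you can poll the job status in a loop. A sketch; the exact status values below are an assumption, so check what your job objects actually report:
import time

# poll until the latest ingestion job leaves the in-progress states (assumed names)
while True:
    status = client.pipelines.list_pipeline_jobs(pipeline.id)[0].status
    print(f"Ingestion job status: {status}")
    if str(status).upper() not in ("PENDING", "IN_PROGRESS"):
        break
    time.sleep(5)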
Once the ingestion job is done, head over to your index on the platform and get the necessary details to connect to the index.
from llama_cloud_services import LlamaCloudIndex
index = LlamaCloudIndex(
name="openai-rag-pipeline",
project_name="Default",
organization_id="YOUR ORG ID",
api_key=os.environ["LLAMA_CLOUD_API_KEY"],
)
query = "What is the revenue of Uber in 2021?"
Here we use hybrid search with a re-ranker (the Cohere re-ranker by default).
retriever = index.as_retriever(
dense_similarity_top_k=3,
sparse_similarity_top_k=3,
alpha=0.5,
enable_reranking=True,
)
retrieved_nodes = retriever.retrieve(query)
from llama_index.core.response.notebook_utils import display_source_node
for retrieved_node in retrieved_nodes:
display_source_node(retrieved_node, source_length=1000)
Use a query engine to set up the entire RAG workflow.
query_engine = index.as_query_engine(
dense_similarity_top_k=3,
sparse_similarity_top_k=3,
alpha=0.5,
enable_reranking=True,
)
response = query_engine.query(query)
print(response)
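You can also inspect the source nodes behind the response; source_nodes is a standard attribute on LlamaIndex responses:
# reuse display_source_node from the retriever example above
for source_node in response.source_nodes:
    display_source_node(source_node, source_length=500)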