docs/examples/output_parsing/openai_pydantic_program.ipynb
This guide shows you how to generate structured data with the new OpenAI function calling API via LlamaIndex. You only need to specify a Pydantic object that describes the desired output.
We demonstrate two settings:
Album object (which can contain a list of Song objects)
DirectoryTree object (which can contain recursive Node objects)
Album
This is a simple example of parsing an output into an Album schema, which can contain multiple songs.
If you're opening this Notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-openai
%pip install llama-index-program-openai
%pip install llama-index
from pydantic import BaseModel
from typing import List
from llama_index.program.openai import OpenAIPydanticProgram
Define output schema (without docstring)
class Song(BaseModel):
    title: str
    length_seconds: int


class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]
Define the OpenAI Pydantic program
prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album, prompt_template_str=prompt_template_str, verbose=True
)
Run program to get structured output.
output = program(
    movie_name="The Shining", description="Data model for an album."
)
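To preview the prompt text that the template yields, it can be filled in by hand. This is a minimal sketch that assumes the {movie_name} placeholder is substituted with ordinary str.format-style formatting; LlamaIndex's internal prompt handling may differ. It also shows that the trailing backslashes join the lines, so the prompt is a single line of text.

```python
# The same template as above. Each trailing backslash suppresses the newline,
# so the pieces are joined into one line.
prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""

# Assumption: plain str.format stands in for the program's own templating.
rendered = prompt_template_str.format(movie_name="The Shining")
print(rendered)
```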
Define output schema (with docstring)
class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]
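One way to see what the docstring contributes is to inspect the JSON schema that Pydantic generates for the model. The sketch below assumes that the class docstring ends up as the schema's description, which appears to be the role played by the explicit description argument in the earlier call. The json_schema helper is a hypothetical convenience that works with both Pydantic v1 and v2; it is not part of LlamaIndex.

```python
from typing import List

from pydantic import BaseModel


class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]


def json_schema(model) -> dict:
    """Hypothetical helper: return the JSON schema under Pydantic v1 or v2."""
    if hasattr(model, "model_json_schema"):  # Pydantic v2
        return model.model_json_schema()
    return model.schema()  # Pydantic v1


album_schema = json_schema(Album)
print(album_schema["description"])
print(album_schema["required"])
```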
prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album, prompt_template_str=prompt_template_str, verbose=True
)
Run program to get structured output.
output = program(movie_name="The Shining")
The output is a valid Pydantic object that we can then use to call functions/APIs.
output
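As an illustration of passing the structured output on to ordinary code, the sketch below computes an album's total running time. It only relies on the songs and length_seconds fields defined in the schema above. The example_album object is a hand-written stand-in (not real model output) so that the snippet runs without an API key; total_length_seconds and format_duration are hypothetical helper names.

```python
from types import SimpleNamespace


def total_length_seconds(album) -> int:
    """Sum the length of every song on an album-like object."""
    return sum(song.length_seconds for song in album.songs)


def format_duration(seconds: int) -> str:
    """Format a number of seconds as M:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# Hand-written stand-in for `output`; the values are made up for illustration.
example_album = SimpleNamespace(
    name="Example Album",
    artist="Example Artist",
    songs=[
        SimpleNamespace(title="Track One", length_seconds=185),
        SimpleNamespace(title="Track Two", length_seconds=240),
    ],
)

total = total_length_seconds(example_album)
print(format_duration(total))
```

With the real program, the same call would be total_length_seconds(output).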
Instead of waiting for the function call to generate the entire JSON, we can use the stream_partial_objects() method of the program to stream valid intermediate instances of the Pydantic output class as soon as they're available 🔥
First, let's define the output Pydantic class
from pydantic import BaseModel, Field
class CharacterInfo(BaseModel):
    """Information about a character."""

    character_name: str
    name: str = Field(..., description="Name of the actor/actress")
    hometown: str


class Characters(BaseModel):
    """List of characters."""

    characters: list[CharacterInfo] = Field(default_factory=list)
Now we'll initialize the program with a prompt template
from llama_index.program.openai import OpenAIPydanticProgram
prompt_template_str = "Information about 3 characters from the movie: {movie}"
program = OpenAIPydanticProgram.from_defaults(
    output_cls=Characters, prompt_template_str=prompt_template_str
)
Finally we stream the partial objects using the stream_partial_objects() method
for partial_object in program.stream_partial_objects(movie="Harry Potter"):
    # send the partial object to the frontend for better user experience
    print(partial_object)
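When forwarding partial objects to a frontend, it can help to skip snapshots that are identical to the previous one. The sketch below shows one possible way to do that. It uses stdlib dataclasses as stand-ins for the Pydantic models, and fake_stream is a hypothetical replacement for program.stream_partial_objects(...) so the snippet runs without an API key; whether the real stream ever repeats a snapshot is an assumption.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class CharacterInfo:
    character_name: str = ""
    name: str = ""
    hometown: str = ""


@dataclass
class Characters:
    characters: List[CharacterInfo] = field(default_factory=list)


def fake_stream() -> Iterator[Characters]:
    """Hypothetical stand-in for program.stream_partial_objects(...)."""
    yield Characters()
    yield Characters([CharacterInfo("Harry Potter")])
    yield Characters([CharacterInfo("Harry Potter")])  # unchanged snapshot
    yield Characters([CharacterInfo("Harry Potter", "Daniel Radcliffe")])


def changed_only(stream: Iterable[Characters]) -> Iterator[Characters]:
    """Yield a snapshot only when it differs from the previous one."""
    previous = None
    for snapshot in stream:
        if snapshot != previous:
            yield snapshot
            previous = snapshot


updates = list(changed_only(fake_stream()))
for update in updates:
    print(update)
```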
Album (with Parallel Function Calling)
With the latest parallel function calling feature from OpenAI, we can simultaneously extract multiple structured objects from a single prompt!
To do this, we need to:
pick a model that supports parallel function calling (e.g. gpt-3.5-turbo-1106), and
set allow_multiple to True in our OpenAIPydanticProgram (if not, it will only return the first object and raise a warning).
from llama_index.llms.openai import OpenAI
prompt_template_str = """\
Generate 4 albums about spring, summer, fall, and winter.
"""
program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album,
    llm=OpenAI(model="gpt-3.5-turbo-1106"),
    prompt_template_str=prompt_template_str,
    allow_multiple=True,
    verbose=True,
)
output = program()
The output is a list of valid Pydantic objects.
output
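Since the program returns a list when allow_multiple is True and a single object otherwise, downstream code may want to treat both cases uniformly. The helper below is a hypothetical convenience (as_list is not a LlamaIndex function) that normalizes either shape into a list.

```python
from typing import Any, List


def as_list(output: Any) -> List[Any]:
    """Normalize a single object, a list of objects, or None into a list."""
    if output is None:
        return []
    if isinstance(output, (list, tuple)):
        return list(output)
    return [output]


# Works the same way for a single result and for several results.
print(as_list("one album"))
print(as_list(["spring album", "summer album"]))
```

With the real program, as_list(output) could then be iterated over regardless of how allow_multiple was set.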
Album (Streaming)
We also support streaming a list of objects through our stream_list function.
Full credit for this idea goes to the openai_function_call repo: https://github.com/jxnl/openai_function_call/tree/main/examples/streaming_multitask
prompt_template_str = "{input_str}"
program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    verbose=False,
)
output = program.stream_list(
    input_str="make up 5 random albums",
)
for obj in output:
    print(obj.json(indent=2))
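The general idea behind streaming a list of objects can be sketched with the standard library: buffer the incoming text and emit each array element as soon as it parses completely. This is only an illustration of the technique under the assumption that the model output arrives as chunks of a JSON array; it is not how stream_list is implemented, and iter_json_objects is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_json_objects(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield each element of a JSON array as soon as it has fully arrived."""
    decoder = json.JSONDecoder()
    buffer = ""
    started = False  # becomes True once the opening "[" has been consumed
    for chunk in chunks:
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not started:
                if not buffer.startswith("["):
                    break
                buffer = buffer[1:]
                started = True
                continue
            # Skip separators between array elements.
            buffer = buffer.lstrip(", \n\t")
            if not buffer or buffer[0] == "]":
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # element is incomplete; wait for more chunks
            yield obj
            buffer = buffer[end:]


# Made-up chunks that split the JSON at arbitrary positions.
chunks = ['[{"name": "Spr', 'ing"}, {"na', 'me": "Summer"}', "]"]
albums = list(iter_json_objects(chunks))
print(albums)
```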
DirectoryTree object
This is directly inspired by jxnl's awesome repo here: https://github.com/jxnl/openai_function_call.
That repository shows how you can use OpenAI's function API to parse recursive Pydantic objects. The main requirement is that you "wrap" a recursive Pydantic object with a non-recursive one.
Here we show an example in a "directory" setting, where a DirectoryTree object wraps recursive Node objects, to parse a file structure.
# NOTE: defining recursive objects in a notebook causes errors
from directory import DirectoryTree, Node
DirectoryTree.schema()
program = OpenAIPydanticProgram.from_defaults(
    output_cls=DirectoryTree,
    prompt_template_str="{input_str}",
    verbose=True,
)
input_str = """
root
├── folder1
│   ├── file1.txt
│   └── file2.txt
└── folder2
    ├── file3.txt
    └── subfolder1
        └── file4.txt
"""
output = program(input_str=input_str)
The output is a full DirectoryTree structure with recursive Node objects.
output
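The directory module is not shown here, so the following is only a sketch of the wrapping pattern and of how such a structure could be walked. It uses stdlib dataclasses as stand-ins for the Pydantic models and assumes that a Node has name and children fields and that DirectoryTree has a single root field; the real module may define them differently. The tree is built by hand to mirror the input above rather than taken from model output.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Recursive type: a node may contain further nodes."""

    name: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class DirectoryTree:
    """Non-recursive wrapper around the recursive Node type."""

    root: Node


def iter_paths(node: Node, prefix: str = "") -> Iterator[str]:
    """Yield the path of the node and of every descendant, depth first."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    yield path
    for child in node.children:
        yield from iter_paths(child, path)


# Hand-built tree mirroring the input string above.
tree = DirectoryTree(
    root=Node(
        "root",
        [
            Node("folder1", [Node("file1.txt"), Node("file2.txt")]),
            Node(
                "folder2",
                [Node("file3.txt"), Node("subfolder1", [Node("file4.txt")])],
            ),
        ],
    )
)

paths = list(iter_paths(tree.root))
for path in paths:
    print(path)
```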