Fine-Tuning with Function Calling

In this notebook, we walk through how to fine-tune gpt-3.5-turbo with function calls. The primary use case here is structured data extraction. Our main focus is distilling GPT-4 outputs to help improve gpt-3.5-turbo function calling capabilities.

We will walk through some examples, from simple to advanced:

1. Fine-tuning on toy messages and structured outputs logged through our OpenAI Pydantic Program object.
2. Fine-tuning on context-augmented queries and structured outputs over an entire document corpus, so that the resulting model can be used in a RAG system.

python

%pip install llama-index-finetuning
%pip install llama-index-llms-openai
%pip install llama-index-finetuning-callbacks
%pip install llama-index-readers-file pymupdf
%pip install llama-index-program-openai

python

import nest_asyncio

nest_asyncio.apply()

python

import os
import openai

python

os.environ["OPENAI_API_KEY"] = "sk-..."
openai.api_key = os.environ["OPENAI_API_KEY"]

Fine-tuning Using GPT-4 Pydantic Programs

In this section we show how to log inputs/outputs through our low-level Pydantic Program module. We use that dataset to fine-tune an LLM.

Defining Pydantic Model + Program

Here, we define the GPT-4 powered function calling program that will generate structured outputs into a Pydantic object (an Album).

python

from llama_index.program.openai import OpenAIPydanticProgram
from pydantic import BaseModel
from llama_index.llms.openai import OpenAI
from llama_index.finetuning.callbacks import OpenAIFineTuningHandler
from llama_index.core.callbacks import CallbackManager
from typing import List


class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]


finetuning_handler = OpenAIFineTuningHandler()
callback_manager = CallbackManager([finetuning_handler])

llm = OpenAI(model="gpt-4", callback_manager=callback_manager)


prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    llm=llm,
    verbose=False,
)

Log Inputs/Outputs

We define some sample movie names as inputs and log the outputs through the function calling program.

python

# NOTE: we need >= 10 movies to use OpenAI fine-tuning
movie_names = [
    "The Shining",
    "The Departed",
    "Titanic",
    "Goodfellas",
    "Pretty Woman",
    "Home Alone",
    "Caged Fury",
    "Edward Scissorhands",
    "Total Recall",
    "Ghost",
    "Tremors",
    "RoboCop",
    "Rocky V",
]

python

from tqdm.notebook import tqdm

for movie_name in tqdm(movie_names):
    output = program(movie_name=movie_name)
    print(output.json())

python

finetuning_handler.save_finetuning_events("mock_finetune_songs.jsonl")

python

!cat mock_finetune_songs.jsonl
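
Before uploading the file, it can help to confirm how many examples were actually logged, since a failed program call would leave the dataset short of the 10-example minimum noted above. The helper below is a minimal sketch using only the standard library; `summarize_jsonl` is a hypothetical name, and the only assumption about the file is that each non-blank line is a JSON object.

```python
import json
from collections import Counter


def summarize_jsonl(path):
    """Return (number of records, Counter of top-level keys) for a JSONL file.

    Assumption: every non-blank line is a JSON object (a dict).
    """
    num_records = 0
    key_counts = Counter()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)  # raises ValueError on malformed JSON
            num_records += 1
            key_counts.update(record.keys())
    return num_records, key_counts
```

In this notebook, calling `summarize_jsonl("mock_finetune_songs.jsonl")` should report one record per movie that was processed successfully, along with the top-level keys the handler wrote for each record.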

Fine-tune on the Dataset

We now define a fine-tuning engine and fine-tune on the mock dataset.

python

from llama_index.finetuning import OpenAIFinetuneEngine

finetune_engine = OpenAIFinetuneEngine(
    "gpt-3.5-turbo",
    "mock_finetune_songs.jsonl",
    # start_job_id="<start-job-id>"  # if you have an existing job, can specify id here
    validate_json=False,  # openai validate json code doesn't support function calling yet
)

python

finetune_engine.finetune()

python

finetune_engine.get_current_job()
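
Fine-tuning jobs take a while, so rather than re-running the cell above by hand you may want to poll until the job finishes. The sketch below is one way to do that; `wait_for_status` is a hypothetical helper, and the terminal status names (`succeeded`, `failed`, `cancelled`) are an assumption based on the statuses OpenAI reports for fine-tuning jobs.

```python
import time

# Assumption: these are the statuses after which a job will not change again.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_status(get_status, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns a terminal status or polls run out.

    get_status: zero-argument callable returning the current status string.
    sleep: injectable for testing; defaults to time.sleep.
    """
    status = None
    for attempt in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # wait before asking again
    raise TimeoutError(f"Job still in status {status!r} after {max_polls} polls")
```

Assuming the object returned by `get_current_job()` exposes a `status` attribute, usage could look like `wait_for_status(lambda: finetune_engine.get_current_job().status)`.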

Try it Out!

We obtain the fine-tuned LLM and use it with the Pydantic program.

python

ft_llm = finetune_engine.get_finetuned_model(temperature=0.3)

python

ft_program = OpenAIPydanticProgram.from_defaults(
    output_cls=Album,
    prompt_template_str=prompt_template_str,
    llm=ft_llm,
    verbose=False,
)

python

ft_program(movie_name="Goodfellas")

Fine-tuning Structured Outputs through a RAG System

A use case of function calling is to get structured outputs through a RAG system.

Here we show how to create a training dataset of context-augmented inputs + structured outputs over an unstructured document. We can then fine-tune the LLM and plug it into a RAG system to perform retrieval + output extraction.

python

!mkdir -p data && wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"

python

from pydantic import Field
from typing import List


class Citation(BaseModel):
    """Citation class."""

    author: str = Field(
        ..., description="Inferred first author (usually last name)"
    )
    year: int = Field(..., description="Inferred year")
    desc: str = Field(
        ...,
        description=(
            "Inferred description from the text of the work that the author is"
            " cited for"
        ),
    )


class Response(BaseModel):
    """List of author citations.

    Extracted over unstructured text.

    """

    citations: List[Citation] = Field(
        ...,
        description=(
            "List of author citations (organized by author, year, and"
            " description)."
        ),
    )

Load Data + Setup

python

from llama_index.readers.file import PyMuPDFReader
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter
from pathlib import Path

python

loader = PyMuPDFReader()
docs0 = loader.load(file_path=Path("./data/llama2.pdf"))

python

doc_text = "\n\n".join([d.get_content() for d in docs0])
metadata = {
    "paper_title": "Llama 2: Open Foundation and Fine-Tuned Chat Models"
}
docs = [Document(text=doc_text, metadata=metadata)]

python

chunk_size = 1024
node_parser = SentenceSplitter(chunk_size=chunk_size)
nodes = node_parser.get_nodes_from_documents(docs)

python

len(nodes)
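
To build intuition for what sentence-aware chunking does, here is a simplified, standard-library sketch that greedily packs whole sentences into chunks. It is not the `SentenceSplitter` implementation: `pack_sentences` is a hypothetical helper, it counts whitespace-separated words rather than tokens, and it uses a naive regex to find sentence boundaries.

```python
import re


def pack_sentences(text, max_words=50):
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words becomes its own oversized chunk.
    """
    # Naive sentence boundary: whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and current_len + n > max_words:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

The point of packing by sentence is that a citation and the claim it supports tend to stay in the same chunk, which matters for the extraction task later in this notebook.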

python

from llama_index.core import Settings

finetuning_handler = OpenAIFineTuningHandler()
callback_manager = CallbackManager([finetuning_handler])

Settings.chunk_size = chunk_size

gpt_4_llm = OpenAI(
    model="gpt-4-0613", temperature=0.3, callback_manager=callback_manager
)

gpt_35_llm = OpenAI(
    model="gpt-3.5-turbo-0613",
    temperature=0.3,
    callback_manager=callback_manager,
)

eval_llm = OpenAI(model="gpt-4-0613", temperature=0)

Generate Dataset

Here we show how to generate a training dataset over these unstructured chunks/nodes.

We generate questions that ask about citations in different contexts. We then run these questions through a GPT-4 RAG pipeline, extract structured outputs, and log the inputs and outputs.

python

# setup dataset generator
from llama_index.core.evaluation import DatasetGenerator
from llama_index.core import SummaryIndex
from llama_index.core import PromptTemplate
from tqdm.notebook import tqdm
from tqdm.asyncio import tqdm_asyncio


fp = open("data/qa_pairs.jsonl", "w")

question_gen_prompt = PromptTemplate(
    """
{query_str}

Context:
{context_str}

Questions:
"""
)

question_gen_query = """\
Snippets from a research paper are given below. They contain citations.
Please generate questions from the text asking about these citations.

For instance, here are some sample questions:
Which citations correspond to related works on transformer models? 
Tell me about authors that worked on advancing RLHF.
Can you tell me citations corresponding to all computer vision works? \
"""

qr_pairs = []
node_questions_tasks = []
for idx, node in enumerate(nodes[:39]):
    num_questions = 1  # change this number to generate more questions per node
    dataset_generator = DatasetGenerator(
        [node],
        question_gen_query=question_gen_query,
        text_question_template=question_gen_prompt,
        llm=eval_llm,
        metadata_mode="all",
        num_questions_per_chunk=num_questions,
    )

    task = dataset_generator.agenerate_questions_from_nodes(num=num_questions)
    node_questions_tasks.append(task)
node_questions_lists = await tqdm_asyncio.gather(*node_questions_tasks)
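
The cell above schedules every question-generation request at once. If you run into rate limits, one option is to cap how many requests are in flight at a time with a semaphore. This is a minimal sketch using only `asyncio`; `gather_with_limit` is a hypothetical helper and is not part of LlamaIndex or tqdm.

```python
import asyncio


async def gather_with_limit(coros, max_concurrency=5):
    """Await all coroutines, running at most max_concurrency at a time.

    Results are returned in the same order as the input coroutines.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(coro):
        # Each coroutine waits for a free slot before it starts running.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(run_one(c) for c in coros))
```

In a notebook, `await gather_with_limit(node_questions_tasks, max_concurrency=5)` could replace the `tqdm_asyncio.gather` call, at the cost of losing the progress bar.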

python

node_questions_lists

python

from llama_index.core import VectorStoreIndex

gpt4_index = VectorStoreIndex(nodes=nodes)
gpt4_query_engine = gpt4_index.as_query_engine(
    output_cls=Response, similarity_top_k=1, llm=gpt_4_llm
)

python

from json import JSONDecodeError

for idx, node in enumerate(tqdm(nodes[:39])):
    node_questions_0 = node_questions_lists[idx]
    for question in node_questions_0:
        try:
            # note: we don't need to use response, events are logged through fine-tuning handler
            gpt4_query_engine.query(question)
        except Exception as e:
            # log the failure and move on to the next question
            print(f"Error for question {question}, {repr(e)}")

python

finetuning_handler.save_finetuning_events("llama2_citation_events.jsonl")

Setup Fine-tuning

We kick off fine-tuning over the generated dataset.

python

from llama_index.finetuning import OpenAIFinetuneEngine

finetune_engine = OpenAIFinetuneEngine(
    "gpt-3.5-turbo",
    "llama2_citation_events.jsonl",
    # start_job_id="<start-job-id>"  # if you have an existing job, can specify id here
    validate_json=False,  # openai validate json code doesn't support function calling yet
)

python

finetune_engine.finetune()

python

finetune_engine.get_current_job()

Use within RAG Pipeline

Let's plug the fine-tuned LLM into a full RAG pipeline that produces structured outputs.

python

ft_llm = finetune_engine.get_finetuned_model(temperature=0.3)

python

from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex(nodes=nodes)
query_engine = vector_index.as_query_engine(
    output_cls=Response, similarity_top_k=1, llm=ft_llm
)

python

# setup baseline as well
base_index = VectorStoreIndex(nodes=nodes)
base_query_engine = base_index.as_query_engine(
    output_cls=Response, similarity_top_k=1, llm=gpt_35_llm
)

python

query_str = """\
Which citation is used to measure the truthfulness of Llama 2? \
"""
# query_str = """\
# Which citation corresponds to the concept of collecting data that represents \
# empirically sampled human preferences in RLHF?\
# """
# query_str = "Which citations in the paper discuss the development and release of Llama 2?"
# query_str = "Which citations are mentioned in the section on RLHF Results?"
# query_str = "Which citation discusses the carbon output related to the production of AI hardware?"


response = query_engine.query(query_str)
print(str(response))

python

base_response = base_query_engine.query(query_str)
print(str(base_response))

python

# view sources
print(response.source_nodes[0].get_content())

python

# as a reference, take a look at GPT-4 response
gpt4_response = gpt4_query_engine.query(query_str)
print(str(gpt4_response))
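
Comparing the printed responses by eye gets tedious across many queries. As a rough sketch, you could treat the GPT-4 citations as a reference and score the other models against them by matching on author and year. The helpers below are hypothetical and operate on plain dicts with `author` and `year` keys, so they assume you have already converted each structured response into that form.

```python
def citation_keys(citations):
    """Normalize citations to a set of (lowercased author, year) pairs."""
    return {(c["author"].strip().lower(), int(c["year"])) for c in citations}


def citation_overlap(predicted, reference):
    """Return precision and recall of predicted citations against a reference."""
    pred = citation_keys(predicted)
    ref = citation_keys(reference)
    matched = pred & ref
    # Guard against division by zero when either side is empty.
    precision = len(matched) / len(pred) if pred else 0.0
    recall = len(matched) / len(ref) if ref else 0.0
    return {"precision": precision, "recall": recall, "matched": sorted(matched)}


# Made-up data for illustration only; these are not outputs from the notebook.
reference = [{"author": "Smith", "year": 2020}, {"author": "Jones", "year": 2021}]
predicted = [{"author": " smith ", "year": "2020"}, {"author": "Lee", "year": 2019}]
scores = citation_overlap(predicted, reference)
```

Matching on author and year alone ignores the `desc` field, which is free text and would need a fuzzier comparison.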