Back to Llama Index

Guidance for Sub-Question Query Engine

docs/examples/output_parsing/guidance_sub_question.ipynb

0.14.214.0 KB
Original Source

<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/output_parsing/guidance_sub_question.ipynb" target="_parent"></a>

Guidance for Sub-Question Query Engine

In this notebook, we showcase how to use guidance to improve the robustness of our sub-question query engine.

The sub-question query engine is designed to accept swappable question generators that implement the BaseQuestionGenerator interface.
To leverage the power of guidance, we implemented a new GuidanceQuestionGenerator (powered by our GuidancePydanticProgram)

Guidance Question Generator

Unlike the default LLMQuestionGenerator, guidance guarantees that we will get the desired structured output, and eliminate output parsing errors.

If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

python
%pip install llama-index-question-gen-guidance
python
!pip install llama-index
python
from llama_index.question_gen.guidance import GuidanceQuestionGenerator
from guidance.llms import OpenAI as GuidanceOpenAI
python
question_gen = GuidanceQuestionGenerator.from_defaults(
    guidance_llm=GuidanceOpenAI("text-davinci-003"), verbose=False
)

Let's test it out!

python
from llama_index.core.tools import ToolMetadata
from llama_index.core import QueryBundle
python
tools = [
    ToolMetadata(
        name="lyft_10k",
        description="Provides information about Lyft financials for year 2021",
    ),
    ToolMetadata(
        name="uber_10k",
        description="Provides information about Uber financials for year 2021",
    ),
]
python
sub_questions = question_gen.generate(
    tools=tools,
    query=QueryBundle("Compare and contrast Uber and Lyft financial in 2021"),
)
python
sub_questions

Using Guidance Question Generator with Sub-Question Query Engine

Prepare data and base query engines

python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.response.pprint_utils import pprint_response

from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.query_engine import SubQuestionQueryEngine

Download Data

python
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'
python
lyft_docs = SimpleDirectoryReader(
    input_files=["./data/10k/lyft_2021.pdf"]
).load_data()
uber_docs = SimpleDirectoryReader(
    input_files=["./data/10k/uber_2021.pdf"]
).load_data()
python
lyft_index = VectorStoreIndex.from_documents(lyft_docs)
uber_index = VectorStoreIndex.from_documents(uber_docs)
python
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)

Construct sub-question query engine and run some queries!

python
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021"
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021"
            ),
        ),
    ),
]

s_engine = SubQuestionQueryEngine.from_defaults(
    question_gen=question_gen,  # use guidance based question_gen defined above
    query_engine_tools=query_engine_tools,
)
python
response = s_engine.query(
    "Compare and contrast the customer segments and geographies that grew the"
    " fastest"
)
python
print(response)