Chat Engine - Condense Plus Context Mode


This is a multi-step chat mode built on top of a retriever over your data.

For each chat interaction:

  • First, condense the conversation history and the latest user message into a standalone question,
  • Then, retrieve context for the standalone question from the retriever,
  • Then, pass the context, along with the prompt and the user message, to the LLM to generate a response.

This approach is simple and works well both for questions directly related to the knowledge base and for general interactions.
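
To make the first step concrete, the condense step rewrites the latest message into a standalone question using a prompt of roughly the following shape (an illustrative sketch, not necessarily the library's exact default template):

python
# Illustrative sketch of a condense prompt. The {chat_history} and {question}
# placeholders are filled in on each turn; this is not necessarily the
# library's exact default template.
CONDENSE_PROMPT = (
    "Given the following conversation between a user and an AI assistant "
    "and a follow up question from the user, rephrase the follow up "
    "question to be a standalone question.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)

The standalone question is then sent to the retriever, and the retrieved text is substituted for {context_str} in the context prompt configured below.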

If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

python
%pip install llama-index-llms-openai
%pip install llama-index

Download Data

python
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Get started in 5 lines of code

Load data and build index

python
import openai
import os

os.environ["OPENAI_API_KEY"] = "sk-..."
openai.api_key = os.environ["OPENAI_API_KEY"]
python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI


llm = OpenAI(model="gpt-3.5-turbo")
data = SimpleDirectoryReader(input_dir="./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(data)

Configure chat engine

Since the retrieved context can take up a large portion of the available LLM context window, let's configure a smaller token limit for the chat history!

python
from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=memory,
    llm=llm,
    context_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about an essay discussing Paul Graham's life.\n"
        "Here are the relevant documents for the context:\n"
        "{context_str}"
        "\nInstruction: Use the previous chat history, or the context above, to interact and help the user."
    ),
    verbose=False,
)
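
Under the hood, chat_mode="condense_plus_context" builds a CondensePlusContextChatEngine, and extra keyword arguments are forwarded to it. You can also customize how the standalone question is produced; here is a minimal sketch, assuming the engine accepts a condense_prompt string using the {chat_history} and {question} template variables:

python
# Minimal sketch: also customizing the condense step. Assumes condense_prompt
# is forwarded to the underlying CondensePlusContextChatEngine and uses the
# {chat_history} and {question} template variables.
custom_condense_prompt = (
    "Given the following conversation and a follow up message, rephrase the"
    " message as a standalone question about Paul Graham's essay.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)

custom_chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=memory,
    llm=llm,
    condense_prompt=custom_condense_prompt,
)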

Chat with your data

python
response = chat_engine.chat("What did Paul Graham do growing up?")
python
print(response)

Ask a follow-up question

python
response_2 = chat_engine.chat("Can you tell me more?")
python
print(response_2)

Reset conversation state

python
chat_engine.reset()
python
response = chat_engine.chat("Hello! What do you know?")
python
print(response)
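
To see what the engine currently remembers, you can inspect the conversation buffer; a small sketch, assuming the chat engine exposes a chat_history property containing ChatMessage objects:

python
# Print the buffered conversation (assumes the engine exposes chat_history
# as a list of ChatMessage objects with role and content attributes).
for message in chat_engine.chat_history:
    print(message.role, ":", message.content[:80])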

Streaming Support

python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
Settings.llm = llm
data = SimpleDirectoryReader(input_dir="./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(data)
python
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    context_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about an essay discussing Paul Graham's life.\n"
        "Here are the relevant documents for the context:\n"
        "{context_str}"
        "\nInstruction: Based on the above documents, provide a detailed answer for the user question below."
    ),
)
python
response = chat_engine.stream_chat("What did Paul Graham do after YC?")
for token in response.response_gen:
    print(token, end="")
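
Chat engines also expose an async streaming interface. A minimal sketch, assuming astream_chat and the async_response_gen generator on the streaming response (inside a notebook you can await main() directly instead of calling asyncio.run):

python
import asyncio


# Async streaming sketch: assumes astream_chat and async_response_gen are
# available on the chat engine and its streaming response, respectively.
async def main():
    response = await chat_engine.astream_chat("What did Paul Graham do after YC?")
    async for token in response.async_response_gen():
        print(token, end="")


asyncio.run(main())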