Chat Engine - Context Mode

ContextChatEngine is a simple chat mode built on top of a retriever over your data.

For each chat interaction:

first retrieve text from the index using the user message
set the retrieved text as context in the system prompt
return an answer to the user message

This approach is simple, and works for questions directly related to the knowledge base and general interactions.

If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

python

%pip install llama-index-llms-openai

python

!pip install llama-index

Download Data

python

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Get started in 5 lines of code

Load data and build index

python

import openai
import os

os.environ["OPENAI_API_KEY"] = "API_KEY_HERE"
openai.api_key = os.environ["OPENAI_API_KEY"]

python

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

data = SimpleDirectoryReader(input_dir="./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(data)

Configure chat engine

Since the context retrieved can take up a large amount of the available LLM context, let's ensure we configure a smaller limit to the chat history!

python

from llama_index.core.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about an essay discussing Paul Grahams life."
    ),
)

Chat with your data

python

response = chat_engine.chat("Hello!")

python

print(response)

Ask a follow up question

python

response = chat_engine.chat("What did Paul Graham do growing up?")

python

print(response)

python

response = chat_engine.chat("Can you tell me more?")

python

print(response)

Reset conversation state

python

chat_engine.reset()

python

response = chat_engine.chat("Hello! What do you know?")

python

print(response)

Streaming Support

python

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
data = SimpleDirectoryReader(input_dir="./data/paul_graham/").load_data()

index = VectorStoreIndex.from_documents(data)

python

chat_engine = index.as_chat_engine(chat_mode="context", llm=llm)

python

response = chat_engine.stream_chat("What did Paul Graham do after YC?")
for token in response.response_gen:
    print(token, end="")