docs/examples/managed/vectaraDemo.ipynb
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/managed/vectaraDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
In this notebook we are going to show how to use Vectara with LlamaIndex. Please note that this notebook is for Vectara ManagedIndex versions >=0.4.0.
Vectara is the trusted AI Assistant and Agent platform which focuses on enterprise readiness for mission-critical applications.
Vectara provides an end-to-end managed service for Retrieval Augmented Generation or RAG, which includes:
An integrated API for processing input data, including text extraction from documents and ML-based chunking.
The state-of-the-art Boomerang embeddings model. Each text chunk is encoded into a vector embedding using Boomerang, and stored in the Vectara internal knowledge (vector+text) store. Thus, when using Vectara with LlamaIndex you do not need to call a separate embedding model - this happens automatically within the Vectara backend.
A query service that automatically encodes the query into embeddings and retrieves the most relevant text segments through hybrid search and a variety of reranking strategies, including a multilingual reranker, a maximal marginal relevance (MMR) reranker, a user-defined function reranker, and a chain reranker that lets you combine multiple reranking methods for finer control over the final ranking.
An option to create a generative summary with a wide selection of LLM summarizers (including Vectara's Mockingbird, trained specifically for RAG-based tasks), based on the retrieved documents, including citations.
See the Vectara API documentation for more information on how to use the API.
Together, these capabilities are the main benefit of using Vectara RAG-as-a-service to build your application: you get the full retrieval and generation pipeline as a single managed service, without having to operate each component yourself.
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
!pip install llama-index llama-index-indices-managed-vectara
To get started with Vectara, sign up (if you haven't already) and follow our quickstart guide to create a corpus and an API key.
Once you have these, you can provide them as the environment variables VECTARA_CORPUS_KEY and VECTARA_API_KEY. Make sure your API key has both query and index permissions.
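For example, you can set them from within the notebook before creating the index (a minimal sketch; the values below are placeholders for your own credentials):

import os

# Placeholder values - replace with your own corpus key and API key.
os.environ["VECTARA_CORPUS_KEY"] = "<YOUR_CORPUS_KEY>"
os.environ["VECTARA_API_KEY"] = "<YOUR_API_KEY>"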
There are a few ways you can index your data into Vectara, including:
The from_documents() or insert_file() methods of VectaraIndex.
For this purpose, we will use a simple set of small documents, so using VectaraIndex directly for the ingest is good enough.
Let's ingest the "AI bill of rights" document into our new corpus.
from llama_index.indices.managed.vectara import VectaraIndex
import requests
url = "https://www.whitehouse.gov/wp-content/uploads/2022/10/Blueprint-for-an-AI-Bill-of-Rights.pdf"
response = requests.get(url)
local_path = "ai-bill-of-rights.pdf"
with open(local_path, "wb") as file:
    file.write(response.content)
index = VectaraIndex()
index.insert_file(
    local_path, metadata={"name": "AI bill of rights", "year": 2022}
)
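If your data already lives in memory as LlamaIndex Document objects rather than files, you can use from_documents() instead, as mentioned above. A minimal sketch (the document text below is made up for illustration):

from llama_index.core import Document
from llama_index.indices.managed.vectara import VectaraIndex

docs = [
    Document(
        text="Automated systems should be safe, effective, and transparent.",
        metadata={"name": "AI bill of rights", "year": 2022},
    ),
]
# Ingest the documents into the Vectara corpus and return an index over them.
index = VectaraIndex.from_documents(docs)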
Now that we've uploaded the document (or if documents have been uploaded previously) we can go and ask questions directly in LlamaIndex. This activates Vectara's RAG pipeline.
To use Vectara's internal LLM for summarization, make sure you specify summary_enabled=True when you create the query engine. Here's an example:
questions = [
"What are the risks of AI?",
"What should we do to prevent bad actors from using AI?",
"What are the benefits?",
]
qe = index.as_query_engine(
    n_sentences_before=1,
    n_sentences_after=1,
    summary_enabled=True,
    summary_prompt_name="mockingbird-1.0-2024-07-16",
)
qe.query(questions[0]).response
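Besides the generated summary, the response object also carries the matching text segments that the summary cites. A minimal sketch for inspecting them (attribute names follow the standard LlamaIndex response interface; the slice length is arbitrary):

response = qe.query(questions[0])
print(response.response)
for node in response.source_nodes:
    # Each source node carries a retrieved text segment and its document metadata.
    print(node.node.metadata, node.node.get_content()[:200])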
If you want the response to be returned in streaming mode, simply set streaming=True
qe = index.as_query_engine(
    n_sentences_before=1,
    n_sentences_after=1,
    summary_enabled=True,
    summary_prompt_name="mockingbird-1.0-2024-07-16",
    streaming=True,
)
response = qe.query(questions[0])
response.print_response_stream()
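If you would rather consume the stream token by token (for example, to forward it to your own UI) instead of printing it, the streaming response exposes a generator; a small sketch based on the standard LlamaIndex streaming response interface:

response = qe.query(questions[1])
for token in response.response_gen:
    # Each item is a chunk of the summary text as it is generated.
    print(token, end="", flush=True)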
Vectara also supports a simple chat mode. In this mode the chat history is maintained by Vectara, so you don't have to worry about it. To use it, simply call as_chat_engine.
(Chat mode always uses Vectara's summarization so you don't have to explicitly specify summary_enabled=True like before)
ce = index.as_chat_engine(n_sentences_before=1, n_sentences_after=1)
for q in questions:
print(f"Question: {q}\n")
response = ce.chat(q).response
print(f"Response: {response}\n")
Of course streaming works as well with Chat:
ce = index.as_chat_engine(
    n_sentences_before=1, n_sentences_after=1, streaming=True
)
response = ce.stream_chat("Will artificial intelligence rule the government?")
response.print_response_stream()
Vectara also has its own package, vectara-agentic, built on top of many features from LlamaIndex to easily implement agentic RAG applications. It allows you to create your own AI assistant with RAG query tools and other custom tools, such as making API calls to retrieve information from financial websites. You can find the full documentation for vectara-agentic here.
Let's create a ReAct Agent with a single RAG tool using vectara-agentic (to create a ReAct agent, specify VECTARA_AGENTIC_AGENT_TYPE as "REACT" in your environment).
Vectara does not yet have an LLM capable of acting as an agent for planning and tool use, so we will need to use another LLM as the driver of the agent reasoning.
In this demo, we are using OpenAI's GPT-4o. Please make sure you have OPENAI_API_KEY defined in your environment, or specify another LLM with the corresponding key (for the full list of supported LLMs, check out our documentation for setting up your environment).
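For example, you can set both variables from within the notebook before creating the agent (a minimal sketch; the API key value is a placeholder for your own key):

import os

# "REACT" selects the ReAct agent type described above.
os.environ["VECTARA_AGENTIC_AGENT_TYPE"] = "REACT"
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"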
!pip install -U vectara-agentic
from vectara_agentic.agent import Agent
from IPython.display import display, Markdown
agent = Agent.from_corpus(
tool_name="query_ai",
data_description="AI regulations",
assistant_specialty="artificial intelligence",
vectara_reranker="mmr",
vectara_rerank_k=50,
vectara_summary_num_results=5,
vectara_summarizer="mockingbird-1.0-2024-07-16",
verbose=True,
)
response = agent.chat(
"What are the risks of AI? What are the benefits? Compare and contrast and provide a summary with arguments for and against from experts."
)
display(Markdown(response))