docs/examples/managed/vectaraDemo.ipynb
<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/managed/vectaraDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
In this notebook we are going to show how to use Vectara with LlamaIndex. Please note that this notebook is for Vectara ManagedIndex versions >=0.4.0.
Vectara is the trusted AI Assistant and Agent platform which focuses on enterprise readiness for mission-critical applications.
Vectara provides an end-to-end managed service for Retrieval Augmented Generation or RAG, which includes:
An integrated API for processing input data, including text extraction from documents and ML-based chunking.
The state-of-the-art Boomerang embeddings model. Each text chunk is encoded into a vector embedding using Boomerang, and stored in the Vectara internal knowledge (vector+text) store. Thus, when using Vectara with LlamaIndex you do not need to call a separate embedding model - this happens automatically within the Vectara backend.
A query service that automatically encodes the query into embeddings and retrieves the most relevant text segments through hybrid search and a variety of reranking strategies, including a multilingual reranker, a maximal marginal relevance (MMR) reranker, a user-defined function reranker, and a chain reranker that lets you combine multiple reranking methods for finer control over the final ranking.
An option to create a generative summary with a wide selection of LLM summarizers (including Vectara's Mockingbird, trained specifically for RAG-based tasks), based on the retrieved documents, including citations.
See the Vectara API documentation for more information on how to use the API.
Together, these capabilities are the main benefit of using Vectara RAG-as-a-service to build your application: you get the full retrieval and generation pipeline as a single managed service, without having to operate each component yourself.
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
!pip install llama-index llama-index-indices-managed-vectara
To get started with Vectara, sign up (if you haven't already) and follow our quickstart guide to create a corpus and an API key.
Once you have these, you can provide them as the environment variables VECTARA_CORPUS_KEY and VECTARA_API_KEY. Make sure your API key has both query and index permissions.
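For example, you can set them from within the notebook before creating the index (a minimal sketch; the values below are placeholders for your own credentials):

import os

# Placeholder values - replace with your own corpus key and API key.
os.environ["VECTARA_CORPUS_KEY"] = "<YOUR_CORPUS_KEY>"
os.environ["VECTARA_API_KEY"] = "<YOUR_API_KEY>"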
There are a few ways you can index your data into Vectara, including:
The from_documents() or insert_file() methods of VectaraIndex.
For this purpose, we will use a simple set of small documents, so using VectaraIndex directly for the ingest is good enough.
Let's ingest the "AI bill of rights" document into our new corpus.
from llama_index.indices.managed.vectara import VectaraIndex
import requests
url = "https://www.whitehouse.gov/wp-content/uploads/2022/10/Blueprint-for-an-AI-Bill-of-Rights.pdf"
response = requests.get(url)
local_path = "ai-bill-of-rights.pdf"
with open(local_path, "wb") as file:
    file.write(response.content)
index = VectaraIndex()
index.insert_file(
    local_path, metadata={"name": "AI bill of rights", "year": 2022}
)
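If your data already lives in memory as LlamaIndex Document objects rather than files, you can use from_documents() instead, as mentioned above. A minimal sketch (the document text below is made up for illustration):

from llama_index.core import Document
from llama_index.indices.managed.vectara import VectaraIndex

docs = [
    Document(
        text="Automated systems should be safe, effective, and transparent.",
        metadata={"name": "AI bill of rights", "year": 2022},
    ),
]
# Ingest the documents into the Vectara corpus and return an index over them.
index = VectaraIndex.from_documents(docs)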
Now that we've uploaded the document (or if documents have been uploaded previously) we can go and ask questions directly in LlamaIndex. This activates Vectara's RAG pipeline.
To use Vectara's internal LLM for summarization, make sure you specify summary_enabled=True when you create the query engine. Here's an example:
questions = [
"What are the risks of AI?",
"What should we do to prevent bad actors from using AI?",
"What are the benefits?",
]
qe = index.as_query_engine(
    n_sentences_before=1,
    n_sentences_after=1,
    summary_enabled=True,
    summary_prompt_name="mockingbird-1.0-2024-07-16",
)
qe.query(questions[0]).response
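Besides the generated summary, the response object also carries the matching text segments that the summary cites. A minimal sketch for inspecting them (attribute names follow the standard LlamaIndex response interface; the slice length is arbitrary):

response = qe.query(questions[0])
print(response.response)
for node in response.source_nodes:
    # Each source node carries a retrieved text segment and its document metadata.
    print(node.node.metadata, node.node.get_content()[:200])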
If you want the response to be returned in streaming mode, simply set streaming=True
qe = index.as_query_engine(
    n_sentences_before=1,
    n_sentences_after=1,
    summary_enabled=True,
    summary_prompt_name="mockingbird-1.0-2024-07-16",
    streaming=True,
)
response = qe.query(questions[0])
response.print_response_stream()
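If you would rather consume the stream token by token (for example, to forward it to your own UI) instead of printing it, the streaming response exposes a generator; a small sketch based on the standard LlamaIndex streaming response interface:

response = qe.query(questions[1])
for token in response.response_gen:
    # Each item is a chunk of the summary text as it is generated.
    print(token, end="", flush=True)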
Vectara also supports a simple chat mode. In this mode the chat history is maintained by Vectara, so you don't have to worry about it. To use it, simply call as_chat_engine.
(Chat mode always uses Vectara's summarization so you don't have to explicitly specify summary_enabled=True like before)
ce = index.as_chat_engine(n_sentences_before=1, n_sentences_after=1)
for q in questions:
print(f"Question: {q}\n")
response = ce.chat(q).response
print(f"Response: {response}\n")
Of course streaming works as well with Chat:
ce = index.as_chat_engine(
    n_sentences_before=1, n_sentences_after=1, streaming=True
)
response = ce.stream_chat("Will artificial intelligence rule the government?")
response.print_response_stream()
Vectara also has its own package, vectara-agentic, built on top of many features from LlamaIndex to easily implement agentic RAG applications. It allows you to create your own AI assistant with RAG query tools and other custom tools, such as making API calls to retrieve information from financial websites. You can find the full documentation for vectara-agentic here.
Let's create a ReAct Agent with a single RAG tool using vectara-agentic (to create a ReAct agent, specify VECTARA_AGENTIC_AGENT_TYPE as "REACT" in your environment).
Vectara does not yet have an LLM capable of acting as an agent for planning and tool use, so we will need to use another LLM as the driver of the agent reasoning.
In this demo, we are using OpenAI's GPT-4o. Please make sure you have OPENAI_API_KEY defined in your environment, or specify another LLM with the corresponding key (for the full list of supported LLMs, check out our documentation for setting up your environment).
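For example, you can set both variables from within the notebook before creating the agent (a minimal sketch; the API key value is a placeholder for your own key):

import os

# "REACT" selects the ReAct agent type described above.
os.environ["VECTARA_AGENTIC_AGENT_TYPE"] = "REACT"
os.environ["OPENAI_API_KEY"] = "<YOUR_OPENAI_API_KEY>"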
!pip install -U vectara-agentic
from vectara_agentic.agent import Agent
from IPython.display import display, Markdown
agent = Agent.from_corpus(
tool_name="query_ai",
data_description="AI regulations",
assistant_specialty="artificial intelligence",
vectara_reranker="mmr",
vectara_rerank_k=50,
vectara_summary_num_results=5,
vectara_summarizer="mockingbird-1.0-2024-07-16",
verbose=True,
)
response = agent.chat(
"What are the risks of AI? What are the benefits? Compare and contrast and provide a summary with arguments for and against from experts."
)
display(Markdown(response))