docs/examples/agent/nvidia_agent.ipynb
This notebook shows you how to use our NVIDIA agent, powered by function calling capabilities.
Let's start by importing some simple building blocks.
The main thing we need is the llama_index NVIDIA LLM class, plus the NVIDIA embedding class used later for the RAG example.
%pip install --upgrade --quiet llama-index llama-index-llms-nvidia llama-index-embeddings-nvidia
import getpass
import os
# del os.environ['NVIDIA_API_KEY'] ## delete key and reset
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith(
        "nvapi-"
    ), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key
from llama_index.llms.nvidia import NVIDIA
from llama_index.core.tools import FunctionTool
from llama_index.embeddings.nvidia import NVIDIAEmbedding
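With the key set, you can quickly check connectivity by listing the models available through the NVIDIA API catalog. A minimal sketch (the slice just keeps the output short):
# Print a few of the models reachable with the current key
for m in NVIDIA().available_models[:5]:
    print(m.id)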
Let's define some very simple calculator tools for our agent.
def multiply(a: int, b: int) -> int:
    """Multiply two integers and return the resulting integer."""
    return a * b


def add(a: int, b: int) -> int:
    """Add two integers and return the resulting integer."""
    return a + b
Here we initialize a simple NVIDIA agent with calculator functions.
llm = NVIDIA(model="meta/llama-3.1-70b-instruct")
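If you are running a self-hosted NIM, the same class can point at your own endpoint instead of the hosted API catalog. A minimal sketch, where the base_url and model name are placeholders for your deployment:
# Connect to a locally deployed NIM (OpenAI-compatible endpoint)
local_llm = NVIDIA(
    base_url="http://localhost:8000/v1",
    model="meta/llama-3.1-70b-instruct",
)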
from llama_index.core.agent.workflow import FunctionAgent
agent = FunctionAgent(
    tools=[multiply, add],
    llm=llm,
)
response = await agent.run("What is (121 * 3) + 42?")
print(str(response))
# inspect the tool calls the agent made
print(response.tool_calls)
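Each entry records which tool the model selected and the arguments it passed. A small sketch, assuming the entries expose tool_name and tool_kwargs as llama_index's ToolSelection objects do:
# Summarize each tool call made during the run
for call in response.tool_calls:
    print(f"{call.tool_name}({call.tool_kwargs})")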
By default, .run() is stateless. If you want to maintain state, you can pass in a context object.
from llama_index.core.workflow import Context
ctx = Context(agent)
response = await agent.run("Hello, my name is John Doe.", ctx=ctx)
print(str(response))
response = await agent.run("What is my name?", ctx=ctx)
print(str(response))
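The context lives in memory. If you need to persist a conversation across processes, the workflow Context can be serialized; a minimal sketch, assuming the default serializer:
# Snapshot the conversation state and restore it later
ctx_dict = ctx.to_dict()
restored_ctx = Context.from_dict(agent, ctx_dict)
response = await agent.run("What is my name?", ctx=restored_ctx)
print(str(response))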
You can specify a system prompt to give the agent additional instruction or personality.
agent = FunctionAgent(
    tools=[multiply, add],
    llm=llm,
    system_prompt="Talk like a pirate in every response.",
)
response = await agent.run("Hi")
print(response)
response = await agent.run("Tell me a story")
print(response)
Next, let's give the agent a RAG tool: we download Uber's 2021 10-K filing, index it with NVIDIA embeddings, and wrap the index's query engine as a tool.
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
from llama_index.core.tools import QueryEngineTool
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
embed_model = NVIDIAEmbedding(model="NV-Embed-QA", truncate="END")
# load data
uber_docs = SimpleDirectoryReader(
    input_files=["./data/10k/uber_2021.pdf"]
).load_data()
# build index
uber_index = VectorStoreIndex.from_documents(
    uber_docs, embed_model=embed_model
)
uber_engine = uber_index.as_query_engine(similarity_top_k=3, llm=llm)
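Before handing the engine to an agent, it is worth querying it directly to confirm the index works; the question below is just an example:
# Sanity-check the query engine on its own
print(uber_engine.query("What was Uber's revenue for 2021?"))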
query_engine_tool = QueryEngineTool.from_defaults(
    query_engine=uber_engine,
    name="uber_10k",
    description=(
        "Provides information about Uber financials for year 2021. "
        "Use a detailed plain text question as input to the tool."
    ),
)
agent = FunctionAgent(tools=[query_engine_tool], llm=llm)
response = await agent.run(
    "What are the risk factors and tailwinds for Uber? Do two parallel tool calls."
)
print(str(response))
FunctionAgent relies on the model's native function-calling API. ReActAgent instead prompts the model to interleave explicit reasoning steps with tool use, which also works with models that lack function calling.
from llama_index.core.agent.workflow import ReActAgent
multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)
agent = ReActAgent(tools=[multiply_tool, add_tool], llm=llm)
Using the stream_events() method, we can stream the response as it is generated to see the agent's thought process.
The final response will have only the final answer.
from llama_index.core.agent.workflow import AgentStream
handler = agent.run("What is 20+(2*4)? Calculate step by step")
async for ev in handler.stream_events():
    if isinstance(ev, AgentStream):
        print(ev.delta, end="", flush=True)
response = await handler
print(str(response))
print(response.tool_calls)
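Besides token deltas, the event stream also carries tool lifecycle events, so you can surface tool results as they complete. A sketch using the ToolCallResult event exported alongside AgentStream (field names per the agent workflow events):
from llama_index.core.agent.workflow import ToolCallResult

handler = agent.run("What is (7 + 3) * 2?")
async for ev in handler.stream_events():
    if isinstance(ev, ToolCallResult):
        # Fired after each tool finishes executing
        print(f"\n{ev.tool_name} => {ev.tool_output}")
    elif isinstance(ev, AgentStream):
        print(ev.delta, end="", flush=True)
response = await handler
print(str(response))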