examples/nvidia_deep_agent/README.md
General-purpose deep agent showcasing multi-model architecture with GPU code execution: a frontier model orchestrates and processes data while NVIDIA Nemotron Super handles research, all backed by a GPU sandbox running NVIDIA RAPIDS.
```
create_deep_agent (orchestrator: frontier model)
|
|-- researcher-agent (Nemotron Super)
|     Conducts web searches, gathers and synthesizes information
|
|-- data-processor-agent (frontier model)
|     Writes and executes Python scripts on the GPU sandbox
|     GPU-accelerated data analysis, ML, visualization, document processing
|
|-- skills/
|     cudf-analytics            GPU data analysis (groupby, stats, anomaly detection)
|     cuml-machine-learning     GPU ML (classification, regression, clustering, PCA)
|     data-visualization        Publication-quality charts (matplotlib, seaborn)
|     gpu-document-processing   Large document processing via GPU sandbox
|
|-- memory/
|     AGENTS.md                 Persistent agent instructions (self-improving)
|
|-- backend: Modal Sandbox (GPU or CPU, switchable at runtime)
      Skills + memory uploaded on sandbox creation
      Agent reads/writes/executes directly inside the sandbox
```
Why multi-model? The frontier model handles planning, synthesis, and code generation where reasoning quality matters. Nemotron Super handles the volume work (web research) where speed and cost matter.
How GPU execution works: The data-processor-agent reads skill documentation (`SKILL.md`), writes Python scripts using RAPIDS APIs (cuDF, cuML), and executes them on a Modal sandbox via the `execute` tool. Charts are displayed inline via `read_file`.
Install uv:

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```
Install dependencies:

```shell
cd nemotron-deep-agent
uv sync
```
Set your API keys in your `.env` file or export them:

```shell
export ANTHROPIC_API_KEY=your_key   # For Claude frontier model
export NVIDIA_API_KEY=your_key      # For Nemotron Super via NIM
export TAVILY_API_KEY=your_key      # For web search
export LANGSMITH_API_KEY=your_key   # For tracing (optional)
export LANGSMITH_PROJECT="nemotron-deep-agent"
export LANGSMITH_TRACING="true"
```
Add your Modal keys to your `.env` (`MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET`), or authenticate with Modal's CLI:

```shell
uv run modal setup
```
Run with the LangGraph server:

```shell
uv run langgraph dev --allow-blocking
```
The agent supports runtime switching between GPU and CPU sandboxes via `context_schema`. Pass `context={"sandbox_type": "gpu"}` or `context={"sandbox_type": "cpu"}` when invoking. In Studio, you can change this via the Manage Assistants button at the bottom left.

GPU mode uses the NVIDIA RAPIDS Docker image with an A10G GPU; CPU mode uses a lightweight image with pandas, numpy, and scipy.
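Conceptually, the context value is a simple dispatch to sandbox-creation kwargs. A minimal sketch (the function name, image tags, and kwarg shapes here are hypothetical; see `src/backend.py` for the real logic):

```python
# Illustrative only: map a "sandbox_type" context value to the kwargs a
# Modal sandbox might be created with. Image tags are placeholders.
RAPIDS_IMAGE = "nvcr.io/nvidia/rapidsai/base"  # hypothetical tag
CPU_IMAGE = "python:3.12-slim"

def sandbox_kwargs(sandbox_type: str) -> dict:
    """Return creation kwargs for the requested sandbox flavor."""
    if sandbox_type == "gpu":
        # RAPIDS image plus a GPU reservation
        return {"image": RAPIDS_IMAGE, "gpu": "A10G"}
    if sandbox_type == "cpu":
        # Lightweight image; pandas/numpy/scipy installed on top
        return {"image": CPU_IMAGE}
    raise ValueError(f"unknown sandbox_type: {sandbox_type!r}")
```

Keeping the switch in one place is what makes the GPU/CPU choice safe to flip per invocation.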
Start the server:

```shell
uv run langgraph dev --allow-blocking
```
Then open LangSmith Studio and try:

```
Generate a 1000-row random dataset about credit card transactions with columns
(id, value, category, score) using your cuDF skill, then do some cool analysis
and give me some insights on that data!
```
The agent will delegate to the data-processor-agent, which reads the cuDF skill, writes a Python script to generate and analyze the dataset on the GPU sandbox, and returns structured insights with inline charts.
Resume from human-in-the-loop interrupts in Studio by pasting:

```json
{"decisions": [{"type": "approve"}]}
```
- **Data Analysis:** "Generate a 1000-row random dataset about credit card transactions with columns (id, value, category, score), then analyze it for trends and anomalies"
- **Research + Analysis:** "Research the latest trends in renewable energy adoption, then create a visualization comparing solar vs wind capacity growth"
- **ML:** "Upload this CSV and train a classifier to predict customer churn. Show feature importances."
Configured in `src/agent.py` via `init_chat_model` (supports any provider):

```python
frontier_model = init_chat_model("anthropic:claude-sonnet-4-6")
```
Configured via NVIDIA's NIM endpoint (OpenAI-compatible):

```python
nemotron_super = ChatNVIDIA(
    model="private/nvidia/nemotron-3-super-120b-a12b"
)
```
The agent uses a Modal sandbox with the NVIDIA RAPIDS base image (cuDF, cuML pre-installed). GPU type is A10G by default.
To use a different GPU tier, modify `src/agent.py`:

```python
create_kwargs["gpu"] = "A100"  # or "T4", "H100"
```
Skills teach the agent how to use NVIDIA libraries via the Agent Skills Specification. Each skill is a `SKILL.md` file the agent reads when it encounters a matching task.
GPU-accelerated data analysis using NVIDIA RAPIDS cuDF. Pandas-like API on GPU for groupby, statistics, correlation, and anomaly detection.
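To make the pattern concrete, here is a CPU sketch of the kind of anomaly-detection script this skill produces, written with pandas so it runs anywhere. cuDF mirrors the pandas API, so on the GPU sandbox essentially the same logic runs with cuDF in place of pandas (e.g. via the `cudf.pandas` accelerator); the data here is made up for illustration:

```python
import pandas as pd

# Toy transactions: category "a" contains one obvious outlier (50.0).
df = pd.DataFrame({
    "category": ["a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b"],
    "value": [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 50.0, 5.0, 6.0, 5.5],
})

# Per-category mean/std, then flag rows more than 2 standard deviations out.
stats = df.groupby("category")["value"].agg(["mean", "std"])
joined = df.join(stats, on="category")
joined["z"] = (joined["value"] - joined["mean"]) / joined["std"]
anomalies = joined[joined["z"].abs() > 2]
```

The groupby/join/z-score shape is the same on GPU; only the dataframe backend changes.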
GPU-accelerated machine learning using NVIDIA RAPIDS cuML. Scikit-learn compatible API for classification, regression, clustering, dimensionality reduction (PCA, UMAP, t-SNE), and preprocessing — all on GPU.
Publication-quality charts using matplotlib and seaborn in headless mode. Includes templates for bar, line, scatter, heatmap, histogram, box plots, and multi-panel dashboard summaries with a colorblind-safe palette. Charts are displayed inline in the conversation via read_file.
Large document processing via the sandbox-as-tool pattern. Agent writes extraction scripts and runs them on GPU.
```
skills/
  my-skill/
    SKILL.md
```
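A minimal `SKILL.md` might look like the sketch below. The frontmatter fields follow the Agent Skills Specification's name/description convention; the body content and headings here are illustrative, not the repo's actual skill files:

```markdown
---
name: my-skill
description: One-line summary the agent uses to decide when to load this skill.
---

# My Skill

When to use: describe the tasks this skill applies to.
How to use: step-by-step instructions and code snippets the agent can follow.
```

The description matters most: it is what the orchestrator matches against an incoming task.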
The agent has persistent memory via `AGENTS.md`, loaded at startup through the `memory` parameter. When the agent discovers something reusable during execution — like a library API that doesn't exist, a better code pattern, or a non-obvious error fix — it edits its own skill files to capture that knowledge for future runs.

For example, if the data-processor-agent discovers that `cudf.DataFrame.interpolate()` isn't implemented, it updates `skills/cudf-analytics/SKILL.md` with a "Known Limitations" note so it won't repeat the mistake.

Memory and skills are uploaded into the sandbox on creation via `upload_files`. The agent reads and edits them directly inside the sandbox; changes persist for the sandbox's lifetime. In production, swap the local file reads in `_seed_sandbox` for your storage layer (S3, database, etc.). See `src/backend.py` for the backend configuration.
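The seeding step amounts to gathering local skill and memory files as (path, content) pairs for upload. A hypothetical stand-alone sketch of that collection logic (the real code lives in `_seed_sandbox` in `src/backend.py` and may differ):

```python
from pathlib import Path

def collect_seed_files(*roots: str) -> list[tuple[str, bytes]]:
    """Gather files under each root directory, keyed by the path they
    should have inside the sandbox (relative to the root's parent)."""
    files = []
    for root in roots:
        base = Path(root)
        for path in sorted(base.rglob("*")):
            if path.is_file():
                files.append((str(path.relative_to(base.parent)), path.read_bytes()))
    return files
```

Swapping this for an S3 or database read is the only change needed to productionize seeding.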
Prompts live in `src/prompts.py` and the agent wiring in `src/agent.py`. For a full enterprise deployment with NeMo Agent Toolkit, evaluation harnesses, a knowledge layer, and a frontend, see NVIDIA's AIQ Blueprint: https://github.com/langchain-ai/aiq-blueprint