You.com Retriever

This notebook demonstrates how to use You.com's Search API as a retriever in LlamaIndex. The API automatically returns relevant web and/or news results based on your query. Visit our docs to learn more about our Search and other APIs: https://docs.you.com/

The retriever converts You.com's search results into LlamaIndex's standard format (NodeWithScore), allowing you to:

Use search results as context for LLM queries
Combine with other retrievers (vector stores, databases)
Integrate seamlessly with query engines and agents


To get started, install the llama-index-retrievers-you package.

python

%pip install llama-index-retrievers-you

Setup

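Get your API key from the You.com platform. The cell below reads it from the YDC_API_KEY environment variable and prompts for it if the variable is not set.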

python

import os
from getpass import getpass

# Set your API key
you_api_key = os.environ.get("YDC_API_KEY") or getpass(
    "Enter your You.com API key: "
)

Basic usage

First, let's set up the retriever and see what data it returns:

python

from llama_index.retrievers.you import YouRetriever

retriever = YouRetriever(api_key=you_api_key)
retrieved_results = retriever.retrieve("national parks in the US")

print(f"Retrieved {len(retrieved_results)} results")

for i, result in enumerate(retrieved_results):
    print(f"\nResult {i+1}:")
    # Show the first 200 characters of each result's text
    print(f"  Text: {result.node.text[:200]}...")
    print("  Metadata:")
    for key, value in result.node.metadata.items():
        print(f"    {key}: {value}")
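The same print loop appears several times in this notebook. If you prefer, you can factor it into a small helper. The sketch below uses a hypothetical format_results function that truncates long text to max_chars (an arbitrary default) and appends an ellipsis only when it actually cuts something. It assumes only that each result exposes .node.text and .node.metadata, as the loop above does. The SimpleNamespace object is a stand-in so the example runs without an API key.

```python
from types import SimpleNamespace  # only needed for the stand-in below


def format_results(results, max_chars=200):
    """Return a readable summary of retrieved results.

    Hypothetical helper: assumes each item has `.node.text` (str) and
    `.node.metadata` (dict), like the NodeWithScore objects used above.
    """
    lines = [f"Retrieved {len(results)} results"]
    for i, result in enumerate(results, start=1):
        text = result.node.text or ""
        # Append an ellipsis only when the text was actually cut
        preview = text[:max_chars] + ("..." if len(text) > max_chars else "")
        lines.append(f"\nResult {i}:")
        lines.append(f"  Text: {preview}")
        lines.append("  Metadata:")
        for key, value in result.node.metadata.items():
            lines.append(f"    {key}: {value}")
    return "\n".join(lines)


# Stand-in result for illustration; in the notebook, pass `retrieved_results`
fake = SimpleNamespace(
    node=SimpleNamespace(text="x" * 250, metadata={"url": "https://example.com"})
)
print(format_results([fake], max_chars=10))
```

In the notebook, print(format_results(retrieved_results)) would replace the loop.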

Async usage

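The retriever also supports async operations through aretrieve. Because Jupyter already runs an event loop, you can await the call directly in a cell.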

python

from llama_index.retrievers.you import YouRetriever

retriever = YouRetriever(api_key=you_api_key)

# Use aretrieve for async operations
retrieved_results = await retriever.aretrieve("national parks in the US")

print(f"Retrieved {len(retrieved_results)} results asynchronously")

for i, result in enumerate(retrieved_results):
    print(f"\nResult {i+1}:")
    # Show the first 200 characters of each result's text
    print(f"  Text: {result.node.text[:200]}...")
    print("  Metadata:")
    for key, value in result.node.metadata.items():
        print(f"    {key}: {value}")
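Since aretrieve is a coroutine, you can run several searches concurrently instead of one after another. This is a minimal sketch built on asyncio.gather. The retrieve_many helper is hypothetical and works with any object that exposes an async aretrieve(query) method, including the retriever created above.

```python
import asyncio


async def retrieve_many(retriever, queries):
    """Run retriever.aretrieve() for every query concurrently.

    Hypothetical helper: works with any object exposing an async
    aretrieve(query) method. Returns a dict mapping each query to its
    results (duplicate queries collapse into a single key).
    """
    # gather() schedules all coroutines at once and returns results in input order
    results = await asyncio.gather(*(retriever.aretrieve(q) for q in queries))
    return dict(zip(queries, results))


# In a notebook cell (Jupyter already runs an event loop):
#     by_query = await retrieve_many(
#         retriever, ["national parks in the US", "national parks in Canada"]
#     )
# In a plain Python script, use asyncio.run(retrieve_many(...)) instead.
```

Total latency then tracks the slowest single request rather than the sum of all of them. Keep any rate limits on your API plan in mind when you send many queries at once.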

Getting the latest news

The You.com API can also return news results automatically when your query is news-related.

python

# News-related queries will include news results in the response.
# You should see at most 5 results per type (news and web).
# Notice the source_type metadata field: "news" or "web".
retriever = YouRetriever(api_key=you_api_key, count=5, country="IN")

retrieved_results = retriever.retrieve(
    "What are the latest geopolitical updates in India"
)

print(f"Retrieved {len(retrieved_results)} results")
for i, result in enumerate(retrieved_results):
    print(f"\nResult {i+1}:")
    # Show the first 200 characters of each result's text
    print(f"  Text: {result.node.text[:200]}...")
    print("  Metadata:")
    for key, value in result.node.metadata.items():
        print(f"    {key}: {value}")
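Each result's metadata records whether it came from the web or the news section, so you can split a mixed result list by type. The group_by_source_type helper below is a hypothetical sketch. It assumes source_type is a plain key in the metadata dict, as the output above suggests, and files any result without that key under "unknown". The _fake stand-ins let the example run without an API call.

```python
from collections import defaultdict
from types import SimpleNamespace  # only needed for the stand-ins below


def group_by_source_type(results):
    """Group results by metadata["source_type"] (e.g. "web" or "news")."""
    groups = defaultdict(list)
    for result in results:
        source_type = result.node.metadata.get("source_type", "unknown")
        groups[source_type].append(result)
    return dict(groups)


def _fake(source_type, title):
    """Build a stand-in result; in the notebook, use `retrieved_results`."""
    metadata = {"source_type": source_type, "title": title}
    return SimpleNamespace(node=SimpleNamespace(text=title, metadata=metadata))


groups = group_by_source_type(
    [_fake("news", "A"), _fake("web", "B"), _fake("news", "C")]
)
for source_type, items in groups.items():
    print(f"{source_type}: {len(items)} result(s)")
```

Within each group, results keep the order in which the retriever returned them.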

Customizing Search Parameters

You can customize the search with optional parameters:

python

retriever = YouRetriever(
    api_key=you_api_key,
    count=20,  # Return up to 20 results per section (web/news)
    country="US",  # Focus on US results
    language="en",  # English results
    freshness="week",  # Results from the past week
    safesearch="moderate",  # Moderate safe search filtering
)

retrieved_results = retriever.retrieve("renewable energy breakthroughs")

print(f"Retrieved {len(retrieved_results)} recent results from the US")
for i, result in enumerate(retrieved_results):
    print(f"\nResult {i+1}:")
    # Show the first 200 characters of each result's text
    print(f"  Text: {result.node.text[:200]}...")
    print("  Metadata:")
    for key, value in result.node.metadata.items():
        print(f"    {key}: {value}")
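The API applies the freshness filter on its side. If you want to double-check it yourself, you could compare each result's page_age metadata against a cutoff. This sketch assumes page_age is an ISO 8601 timestamp string such as "2025-01-15T10:30:00". If the API returns a different format, you will need to adjust the parsing. The is_recent helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_recent(page_age, days=7, now=None):
    """Return True if an ISO 8601 `page_age` is at most `days` days old.

    Hypothetical helper. Assumes `page_age` is an ISO 8601 string; naive
    timestamps are treated as UTC. Missing or unparseable values return False
    (before Python 3.11, fromisoformat() rejects a trailing "Z").
    """
    if not page_age:
        return False
    try:
        published = datetime.fromisoformat(page_age)
    except (TypeError, ValueError):
        return False
    if published.tzinfo is None:
        published = published.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - published <= timedelta(days=days)


# Fixed reference time so the example output is reproducible
reference = datetime(2025, 1, 20, tzinfo=timezone.utc)
print(is_recent("2025-01-15T10:30:00", now=reference))  # True: under 5 days old
print(is_recent("2024-12-01T00:00:00", now=reference))  # False: 50 days old
```

In the notebook, [r for r in retrieved_results if is_recent(r.node.metadata.get("page_age"))] would keep only the results that pass the check.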

Using with Query Engine

Now that we've seen how to customize the web data we want to retrieve, let's use an LLM to synthesize natural language answers from the search results. In this example, we'll use a model from Anthropic.

python

%pip install llama-index-llms-anthropic

python

import os
from getpass import getpass

# Set your Anthropic API key
anthropic_api_key = os.environ.get("ANTHROPIC_API_KEY") or getpass(
    "Enter your Anthropic API key: "
)

python

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.anthropic import Anthropic
from llama_index.retrievers.you import YouRetriever

# Configure Anthropic as your LLM
llm = Anthropic(model="claude-haiku-4-5-20251001", api_key=anthropic_api_key)

# Create a query engine that uses You.com search results as context
retriever = YouRetriever(api_key=you_api_key)
query_engine = RetrieverQueryEngine.from_args(retriever, llm=llm)

python

# The query engine:
# 1. Uses the retriever to fetch relevant search results from You.com
# 2. Passes those results as context to the LLM
# 3. Returns a synthesized answer

response = query_engine.query(
    "What are the most visited national parks in the US and why? keep it brief."
)

# Try a different query
# response = query_engine.query("What are the latest geopolitical updates from India")

print(str(response))
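Inside LlamaIndex, step 2 above is handled by a response synthesizer that has its own prompt templates. To make the idea concrete, here is a simplified, hypothetical sketch of how retrieved snippets might be combined into a single prompt. It is not LlamaIndex's actual template. The build_prompt function and the numbered-source layout are illustrative choices, and the snippet text is a placeholder.

```python
def build_prompt(question, snippets):
    """Join (title, url, text) snippets into one context-plus-question prompt.

    Illustrative only: LlamaIndex's real response synthesizers use their
    own templates and may split context across several LLM calls.
    """
    context_blocks = [
        f"[{i}] {title} ({url})\n{text}"
        for i, (title, url, text) in enumerate(snippets, start=1)
    ]
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the context below. "
        "Cite sources by their [number].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "Which park is the most visited?",
    [("Park stats", "https://example.com/stats", "Placeholder snippet text.")],
)
print(prompt)
```

With real results, the snippets could come from (r.node.metadata.get("title"), r.node.metadata.get("url"), r.node.text) for each retrieved result.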

Why this format?

The retriever converts You.com's JSON response into LlamaIndex's standard NodeWithScore format.

Benefits:

Source-agnostic: Same interface whether retrieving from You.com, vector DBs, or other sources
Composability: Easily combine multiple retrievers or swap them out
Integration: Works seamlessly with LlamaIndex query engines, agents, and other components
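Because every retriever returns results in the same shape, combining sources can be as simple as concatenating lists. The sketch below merges several result lists, keeps only the first occurrence of each URL, and sorts by score. The merge_results helper is hypothetical and assumes URLs are stored under a "url" metadata key. Scores from different retrievers are not on a common scale (You.com results default to 1.0), so sorting by raw score is a simplification. LlamaIndex also provides its own fusion tooling, such as QueryFusionRetriever, which is worth checking before you write your own.

```python
from types import SimpleNamespace  # only needed for the stand-ins below


def merge_results(*result_lists):
    """Concatenate result lists, drop duplicate URLs, and sort by score.

    Assumes each result has `.score` (float or None) and `.node.metadata`
    with an optional "url" key; results without a URL are always kept.
    """
    seen_urls = set()
    merged = []
    for results in result_lists:
        for result in results:
            url = result.node.metadata.get("url")
            if url is not None:
                if url in seen_urls:
                    continue  # keep only the first occurrence of each URL
                seen_urls.add(url)
            merged.append(result)
    # sorted() is stable even with reverse=True, so ties keep their order
    return sorted(merged, key=lambda r: r.score or 0.0, reverse=True)


def _fake(url, score):
    """Stand-in result; real ones would come from two different retrievers."""
    node = SimpleNamespace(text=url, metadata={"url": url})
    return SimpleNamespace(score=score, node=node)


web = [_fake("https://a.example", 1.0), _fake("https://b.example", 1.0)]
vector = [_fake("https://b.example", 0.8), _fake("https://c.example", 0.6)]
for r in merge_results(web, vector):
    print(r.node.metadata["url"], r.score)
```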

What's preserved:

Text content: Snippets from web results or descriptions from news articles
Metadata: URL, title, and page_age, stored in the node's metadata dict along with source_type ("web" or "news")
Score: Relevance score (1.0 by default since You.com doesn't provide scores)
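To make the mapping concrete, here is a simplified sketch of the conversion described above. It uses stand-in dataclasses rather than LlamaIndex's real TextNode and NodeWithScore classes. The input shape is an assumption for illustration only: a results object holding web and news lists, with snippets on web items and description on news items. Check the You.com API docs for the actual schema. Joining a web result's snippets into a single node is likewise only one possible choice, not necessarily what the retriever does.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for a LlamaIndex TextNode."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class ScoredNode:
    """Stand-in for a LlamaIndex NodeWithScore."""
    node: Node
    score: float = 1.0  # You.com provides no relevance score, so default to 1.0


def to_nodes(response):
    """Convert an assumed You.com-style JSON dict into scored nodes."""
    nodes = []
    results = response.get("results", {})
    for source_type in ("web", "news"):
        for item in results.get(source_type, []):
            # Assumed fields: web items carry "snippets", news items "description"
            if source_type == "web":
                text = "\n".join(item.get("snippets", []))
            else:
                text = item.get("description", "")
            metadata = {
                "url": item.get("url"),
                "title": item.get("title"),
                "page_age": item.get("page_age"),
                "source_type": source_type,
            }
            nodes.append(ScoredNode(node=Node(text=text, metadata=metadata)))
    return nodes


sample = {
    "results": {
        "web": [{"url": "https://example.com", "title": "Example",
                 "snippets": ["First.", "Second."]}],
        "news": [{"url": "https://news.example.com", "title": "News",
                  "description": "Summary."}],
    }
}
for n in to_nodes(sample):
    print(n.score, n.node.metadata["source_type"], repr(n.node.text))
```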

This abstraction lets you focus on building applications rather than handling API-specific response formats.