docs/examples/llm/perplexity.ipynb
Perplexity's Sonar API combines real-time, grounded web search with advanced reasoning and deep research capabilities.
Before we get started, make sure you install the integration package:
%pip install llama-index-llms-perplexity
%pip install llama-index
As of April 12th, 2025, the following models are supported with the Perplexity LLM class in LlamaIndex:
| Model | Context Length | Model Type |
|---|---|---|
| `sonar-deep-research` | 128k | Chat Completion |
| `sonar-reasoning-pro` | 128k | Chat Completion |
| `sonar-reasoning` | 128k | Chat Completion |
| `sonar-pro` | 200k | Chat Completion |
| `sonar` | 128k | Chat Completion |
| `r1-1776` | 128k | Chat Completion |
`sonar-pro` has a maximum output limit of 8k tokens. `r1-1776` is an offline chat model that does not use the Perplexity search subsystem. You can find the latest supported models here.
Rate limits can be found here.
Pricing can be found here.
import getpass
import os
if "PPLX_API_KEY" not in os.environ:
    os.environ["PPLX_API_KEY"] = getpass.getpass(
        "Enter your Perplexity API key: "
    )
from llama_index.llms.perplexity import Perplexity
PPLX_API_KEY = os.environ.get("PPLX_API_KEY")
llm = Perplexity(api_key=PPLX_API_KEY, model="sonar-pro", temperature=0.2)
# Import the ChatMessage class from the llama_index library.
from llama_index.core.llms import ChatMessage
# Create a list of dictionaries where each dictionary represents a chat message.
# Each dictionary contains a 'role' key (e.g., system or user) and a 'content' key with the corresponding message.
messages_dict = [
    {"role": "system", "content": "Be precise and concise."},
    {
        "role": "user",
        "content": "Tell me the latest news about the US Stock Market.",
    },
]
# Convert each dictionary in the list to a ChatMessage object using unpacking (**msg) in a list comprehension.
messages = [ChatMessage(**msg) for msg in messages_dict]
# Print the list of ChatMessage objects to verify the conversion.
print(messages)
response = llm.chat(messages)
print(response)
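Printing the `ChatResponse` gives a role-prefixed string; the raw answer text lives on `response.message.content` (the same attribute path used in the tool example later in this notebook). The snippet below mirrors that attribute shape with a stand-in object, so the access pattern can be run without an API key:

```python
from types import SimpleNamespace

# Stand-in with the same attribute shape as a ChatResponse
# (response.message.content), so no API call is needed here.
response = SimpleNamespace(
    message=SimpleNamespace(role="assistant", content="Markets were mixed today.")
)

# Pull out just the answer text, as you would from a real ChatResponse.
text = response.message.content
print(text)
```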
For asynchronous conversation processing, use the achat method to send messages and await the response:
# Asynchronously send the list of chat messages to the LLM using the 'achat' method.
# This method returns a ChatResponse object containing the model's answer.
response = await llm.achat(messages)
print(response)
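The bare `await` above works because notebooks already run an event loop; in a plain Python script you need `asyncio.run`. A minimal sketch of that pattern, with a stand-in coroutine in place of `llm.achat` so it runs without a key:

```python
import asyncio

# Stand-in coroutine playing the role of llm.achat; a real call would
# perform the network round trip here.
async def fake_achat(messages):
    await asyncio.sleep(0)
    return f"assistant: answer to {messages[-1]!r}"

async def main():
    return await fake_achat(["Be precise and concise.", "Latest US market news?"])

# asyncio.run supplies the event loop that notebooks provide implicitly.
result = asyncio.run(main())
print(result)
```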
For cases where you want to receive a response token by token in real time, use the stream_chat method:
# Call the stream_chat method on the LLM instance, which returns a generator or iterable
# for streaming the chat response one delta (token or chunk) at a time.
response = llm.stream_chat(messages)
# Iterate over each streaming response chunk.
for r in response:
    # Print the delta (the new chunk of generated text) without adding a newline.
    print(r.delta, end="")
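Each chunk carries only the newly generated delta, so if you also want the complete answer at the end, accumulate the deltas as they arrive. A stand-in generator takes the place of `llm.stream_chat` here so the pattern runs without an API key:

```python
# Stand-in generator yielding text deltas the way stream_chat does.
def fake_stream_chat():
    for chunk in ["The ", "market ", "closed ", "higher."]:
        yield chunk

full_text = ""
for delta in fake_stream_chat():
    print(delta, end="")   # stream to the user as chunks arrive
    full_text += delta     # keep the complete answer as well

print()
print("Full response:", full_text)
```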
Similarly, for asynchronous streaming, the astream_chat method provides a way to process response deltas asynchronously:
# Asynchronously call the astream_chat method on the LLM instance,
# which returns an asynchronous generator that yields response chunks.
resp = await llm.astream_chat(messages)
# Asynchronously iterate over each response chunk from the generator.
# For each chunk (delta), print the chunk's text content.
async for delta in resp:
    print(delta.delta, end="")
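As with `achat`, running this outside a notebook requires an explicit event loop. The sketch below consumes an async generator (a stand-in for `llm.astream_chat`, so no key is needed) inside `asyncio.run`:

```python
import asyncio

# Async-generator stand-in for llm.astream_chat.
async def fake_astream_chat():
    for chunk in ["Stocks ", "rallied ", "late."]:
        await asyncio.sleep(0)
        yield chunk

async def main():
    text = ""
    async for delta in fake_astream_chat():
        text += delta
    return text

result = asyncio.run(main())
print(result)
```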
Perplexity models can easily be wrapped into a LlamaIndex tool so that they can be called as part of your data processing or conversational workflows. This tool uses real-time generative search powered by Perplexity, and it is configured with the updated default model ("sonar-pro") and the `enable_search_classifier` parameter enabled.
Below is an example of how to define and register the tool:
from llama_index.core.tools import FunctionTool
from llama_index.llms.perplexity import Perplexity
from llama_index.core.llms import ChatMessage
def query_perplexity(query: str) -> str:
    """
    Queries the Perplexity API via the LlamaIndex integration.

    This function instantiates a Perplexity LLM with updated default settings
    (using model "sonar-pro" and enabling the search classifier so that the API
    can intelligently decide if a search is needed), wraps the query into a
    ChatMessage, and returns the generated response content.
    """
    pplx_api_key = (
        "your-perplexity-api-key"  # Replace with your actual API key
    )

    llm = Perplexity(
        api_key=pplx_api_key,
        model="sonar-pro",
        temperature=0.7,
        enable_search_classifier=True,  # Lets the API decide whether the search component is needed
    )

    messages = [ChatMessage(role="user", content=query)]
    response = llm.chat(messages)
    return response.message.content
# Create the tool from the query_perplexity function
query_perplexity_tool = FunctionTool.from_defaults(fn=query_perplexity)
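`FunctionTool.from_defaults` derives the tool's name from the function's name and its description from the signature and docstring, and that metadata is what an agent later sees when deciding whether to invoke the tool. A minimal pure-Python stand-in of that derivation (no llama_index import needed, and the placeholder body stands in for the real `llm.chat` call):

```python
def make_tool(fn):
    # Minimal stand-in for FunctionTool.from_defaults: the name comes from
    # the function's __name__ and the description from its docstring.
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "fn": fn,
    }

def query_perplexity(query: str) -> str:
    """Queries the Perplexity API via the LlamaIndex integration."""
    return f"(answer for: {query})"  # placeholder body; the real tool calls llm.chat

tool = make_tool(query_perplexity)
print(tool["name"], "->", tool["description"])
```

A well-written docstring therefore matters beyond documentation: it is the description the agent uses for tool selection.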