Moss + LlamaIndex

Moss is a search runtime for voice agents, copilots, and multimodal apps. Built in Rust and WebAssembly, it runs search inside your agent runtime: sub-10ms lookups over always-current data, with no infrastructure to build or maintain.

python
%pip install llama-index-tools-moss llama-index-core llama-index-llms-openai
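
The client and agent below read credentials from environment variables. One way to provide them in a notebook (the values are placeholders; substitute your own):

python
import os

# Placeholder credentials; replace with your own project values.
os.environ["MOSS_PROJECT_ID"] = "your-project-id"
os.environ["MOSS_PROJECT_KEY"] = "your-project-key"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"  # used by the agent section below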

Setup

Import the necessary classes and initialize the Moss client with your project credentials. We'll configure query options to tune the search behavior (top_k for result count, alpha for the semantic/keyword blend).

python
import os
from llama_index.tools.moss import MossToolSpec, QueryOptions
from inferedge_moss import MossClient, DocumentInfo

# 1. Initialize the Moss client
MOSS_PROJECT_KEY = os.getenv("MOSS_PROJECT_KEY")
MOSS_PROJECT_ID = os.getenv("MOSS_PROJECT_ID")
client = MossClient(project_id=MOSS_PROJECT_ID, project_key=MOSS_PROJECT_KEY)

# 2. Configure query settings (optional); if skipped, the tool uses its own defaults
query_options = QueryOptions(top_k=3, alpha=0.5)

# 3. Initialize the tool spec
print("Initializing MossToolSpec...")
moss_tool = MossToolSpec(
    client=client, index_name="moss-cookbook", query_options=query_options
)
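
The alpha parameter controls the semantic/keyword blend. As a sketch of how you might tune it, assuming the common hybrid-search convention where alpha=1.0 is fully semantic and alpha=0.0 is fully keyword (check the Moss docs for the exact convention):

python
# Hypothetical variants for different retrieval styles.
semantic_options = QueryOptions(top_k=5, alpha=1.0)  # favor meaning-based matches
keyword_options = QueryOptions(top_k=5, alpha=0.0)  # favor exact-term matches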

Indexing Data

Create sample documents with DocumentInfo (each with id, text, and metadata). Then call index_docs() to build the Moss index — this creates or replaces the index with your documents and makes them searchable.

python
docs = [
    DocumentInfo(
        id="123",
        text="LlamaIndex connects your data to LLMs.",
        metadata={"topic": "AI"},
    ),
    DocumentInfo(
        id="456",
        text="Moss provides fast semantic search integration.",
        metadata={"topic": "Search"},
    ),
]

# Index the documents, then load the index so the tool can query it
await moss_tool.index_docs(docs)
await moss_tool._load_index()
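
Before wiring the tool into an agent, you can sanity-check the index with a direct query. A minimal sketch, assuming query() is async and accepts the question as a plain string (check the MossToolSpec reference for the exact signature and return type):

python
# Query the freshly built index directly; the shape of the printed
# result depends on the tool's return type.
results = await moss_tool.query("What does Moss provide?")
print(results)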

Using with an Agent

Now we'll expose the Moss tool's methods (query, list_indexes, delete_index) to a ReActAgent. The agent will autonomously decide which tool to call based on the user's question and the search results it retrieves.

python
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

# Convert to tool list
tools = moss_tool.to_tool_list()

# Create an agent (using an OpenAI LLM for demonstration)
api_key = os.getenv("OPENAI_API_KEY")
llm = OpenAI(model="gpt-4.1-mini", api_key=api_key)
agent = ReActAgent(tools=tools, llm=llm, verbose=True)

# Chat with the agent; the question targets the indexed sample docs
response = await agent.run(user_msg="What does LlamaIndex do?")
print(response)
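
The tool spec also exposes list_indexes and delete_index for housekeeping. A hedged sketch, assuming both methods are async and that delete_index takes the index name (verify the exact signatures in the MossToolSpec reference):

python
# List existing indexes, then remove the cookbook index when finished.
print(await moss_tool.list_indexes())
await moss_tool.delete_index("moss-cookbook")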