# Search-R1 Example

contrib/recipes/search_r1/README.md

## Overview

This example implements Search-R1 within Agent Lightning. It also serves as a demonstration of a framework-free agent training pipeline, showing how to run end-to-end RL training without relying on specialized agent frameworks. It is tested and compatible with Agent Lightning v0.2.x.

The example is designed to run on a single node with 8 GPUs, each having at least 40 GB of memory.

## Included Files

| File/Directory | Description |
| --- | --- |
| `data_process.sh` | Prepares the Wikipedia corpus, datasets, and retriever conda environment |
| `retrieval_launch.sh` | Launches the retrieval service backed by the processed corpus |
| `retrieval_server.py` | FastAPI server that powers document retrieval during training |
| `search_r1_agent.py` | Agent-Lightning rollout script implementing the Search-R1 workflow |
| `train_search_r1_agent.py` | RL training script that coordinates GRPO optimization |
| `qa_em.py` | Exact-match evaluation utilities for validating model predictions |
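The exact-match metric used by `qa_em.py` follows the common SQuAD-style convention: normalize both prediction and gold answers (lowercase, strip punctuation, articles, and extra whitespace), then check for string equality. The sketch below illustrates that convention; it is not necessarily identical to the repo's implementation.

```python
import re
import string


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation, articles (a/an/the), and extra whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, golden_answers: list[str]) -> float:
    """Return 1.0 if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in golden_answers))


print(exact_match("The Eiffel Tower!", ["eiffel tower"]))  # 1.0
```

Because the reward is a hard 0/1 signal, answer formatting matters: the normalization step is what keeps superficial differences (case, articles, punctuation) from being penalized.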

## Prepare Data and Environment

Run the following script once to prepare data and the retriever environment:

```bash
bash data_process.sh
```

This script performs the following steps:

- Creates a new conda environment named `retriever`.
- Downloads the Wikipedia data used to build the retrieval database.
- Downloads the training and testing datasets.
- Stores all data under the newly created `data/` directory.

The environment setup and data-processing logic are adapted from [PeterGriffinJin/Search-R1](https://github.com/PeterGriffinJin/Search-R1).


## Prepare Retrieval Server

To start the retrieval server, run:

```bash
bash retrieval_launch.sh
```

This script activates the previously created retriever environment and starts a retrieval server at http://127.0.0.1:8000 using the downloaded Wikipedia data. The server receives user queries and returns a ranked list of retrieved text passages.

The retrieval server implementation is based on [search_r1/search/retrieval_server.py](https://github.com/PeterGriffinJin/Search-R1/blob/main/search_r1/search/retrieval_server.py).

⚠️ Note: Keep the retrieval server running during training (for example, in a separate tmux session or terminal window).
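To sanity-check that the server is up before launching training, you can send it a query directly. The sketch below assumes the upstream Search-R1 API shape (a `/retrieve` endpoint accepting a JSON body with `queries` and `topk`); check `retrieval_server.py` for the exact contract.

```python
import json
import urllib.request

# Assumed endpoint path; verify against retrieval_server.py.
RETRIEVER_URL = "http://127.0.0.1:8000/retrieve"


def build_request(queries: list[str], topk: int = 3) -> dict:
    """Build the JSON payload (field names assumed from upstream Search-R1)."""
    return {"queries": queries, "topk": topk, "return_scores": True}


def retrieve(queries: list[str], topk: int = 3) -> dict:
    """POST the queries to the local retrieval server and return its JSON reply."""
    data = json.dumps(build_request(queries, topk)).encode("utf-8")
    req = urllib.request.Request(
        RETRIEVER_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires the server started by retrieval_launch.sh to be running.
    print(retrieve(["Who wrote Hamlet?"], topk=3))
```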


## Run RL Training (GRPO) with Llama-3.2-3B-Instruct

1. **Start Ray.**

   ```bash
   bash ../../scripts/restart_ray.sh
   ```

   If you plan to use WandB for experiment tracking, set the environment variable `WANDB_API_KEY` before starting Ray.

2. **Start the training server.** In another terminal, run:

   ```bash
   python train_search_r1_agent.py llama
   ```

   This script starts the RL training. Each agent follows the Search-R1 workflow, retrieving information from the database and generating answers accordingly.


## Benchmark Results

We evaluated Search-R1 across seven diverse question-answering benchmarks, covering both General QA (NQ, TriviaQA, PopQA) and complex multi-hop reasoning tasks (HotpotQA, 2WikiMultiHopQA, Musique, and Bamboogle).

The following tables compare the performance of the original Search-R1 implementation and the Agent-Lightning version across various base models.

| Model | Source | NQ | TriviaQA | PopQA | HotpotQA | 2Wiki | Musique | Bamboogle |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen2.5-3B-Instruct | Search-R1 (Original) | 34.1 | 54.5 | 37.8 | 32.4 | 31.9 | 10.3 | 26.4 |
| | Agent-Lightning | 45.3 | 61.7 | 43.8 | 42.6 | 36.4 | 17.1 | 37.6 |
| Qwen2.5-7B-Instruct | Search-R1 (Original) | 39.3 | 61.0 | 39.7 | 37.0 | 41.4 | 14.6 | 36.8 |
| | Agent-Lightning | 46.5 | 65.9 | 46.8 | 43.7 | 46.2 | 20.3 | 47.2 |
| Llama-3.2-3B | Search-R1 (Reproduced) | 26.3 | 49.0 | 23.0 | 21.6 | 27.3 | 4.5 | 9.7 |
| | Agent-Lightning | 29.6 | 51.9 | 25.7 | 23.2 | 28.3 | 5.8 | 9.6 |