# LlamaIndex Tools Integration: Parallel Web Systems
This tool provides integration between LlamaIndex and Parallel AI's Search and Extract APIs, enabling LLM agents to perform web research and content extraction.
## Installation

```bash
pip install llama-index-tools-parallel-web-systems
```
## Usage

```python
from llama_index.tools.parallel_web_systems import ParallelWebSystemsToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

# Initialize the tool with your API key
parallel_tool = ParallelWebSystemsToolSpec(
    api_key="your-api-key-here",
)

# Create an agent with the tool
agent = FunctionAgent(
    tools=parallel_tool.to_tool_list(),
    llm=OpenAI(model="gpt-4o"),
)

# Use the agent to perform web research
response = await agent.run("What was the GDP of France in 2023?")
print(response)
```
## Available Tools

### search

Search the web using Parallel AI's Search API. Returns structured excerpts optimized for LLM consumption.

Parameters:

- `objective` (str, optional): Natural-language description of what to search for
- `search_queries` (list[str], optional): Traditional keyword search queries (max 5)
- `max_results` (int): Maximum results to return, 1-40 (default: 10)
- `mode` (str, optional): `"one-shot"` for comprehensive results, `"agentic"` for token-efficient results
- `excerpts` (dict, optional): Excerpt settings, e.g., `{"max_chars_per_result": 1500}`
- `source_policy` (dict, optional): Domain and date preferences
- `fetch_policy` (dict, optional): Cache vs. live content policy

At least one of `objective` or `search_queries` must be provided.
Example:

```python
from llama_index.tools.parallel_web_systems import ParallelWebSystemsToolSpec

parallel_tool = ParallelWebSystemsToolSpec(api_key="your-api-key")

# Search with an objective
results = parallel_tool.search(
    objective="What are the latest developments in renewable energy?",
    max_results=5,
    mode="one-shot",
)

for doc in results:
    print(f"Title: {doc.metadata.get('title')}")
    print(f"URL: {doc.metadata.get('url')}")
    print(f"Excerpts: {doc.text[:300]}...")
    print("---")

# Search with specific queries
results = parallel_tool.search(
    search_queries=["solar power 2024", "wind energy statistics"],
    max_results=8,
    mode="agentic",
)
```
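The argument constraints above (at least one of `objective` or `search_queries`, at most five queries, `max_results` between 1 and 40) can be checked client-side before spending an API call. A minimal sketch; `build_search_kwargs` is a hypothetical helper, not part of the tool:

```python
def build_search_kwargs(objective=None, search_queries=None, max_results=10, **extra):
    """Validate search arguments against the documented constraints
    before passing them to parallel_tool.search()."""
    if objective is None and not search_queries:
        raise ValueError("Provide at least one of 'objective' or 'search_queries'")
    if search_queries and len(search_queries) > 5:
        raise ValueError("At most 5 search_queries are allowed")
    if not 1 <= max_results <= 40:
        raise ValueError("max_results must be between 1 and 40")
    kwargs = {"max_results": max_results, **extra}
    if objective is not None:
        kwargs["objective"] = objective
    if search_queries:
        kwargs["search_queries"] = search_queries
    return kwargs
```

Validating up front turns a failed (and silently empty) API call into an immediate, descriptive exception.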
### extract

Extract clean, structured content from web pages using Parallel AI's Extract API.

Parameters:

- `urls` (list[str]): List of URLs to extract content from
- `objective` (str, optional): Natural-language objective to focus extraction
- `search_queries` (list[str], optional): Specific keyword queries to focus extraction
- `excerpts` (bool | dict): Include excerpts (default: True). Can be a dict like `{"max_chars_per_result": 2000}`
- `full_content` (bool | dict): Include full page content (default: False)
- `fetch_policy` (dict, optional): Cache vs. live content policy

Example:
```python
from llama_index.tools.parallel_web_systems import ParallelWebSystemsToolSpec

parallel_tool = ParallelWebSystemsToolSpec(api_key="your-api-key")

# Extract content focused on a specific objective
results = parallel_tool.extract(
    urls=["https://en.wikipedia.org/wiki/Artificial_intelligence"],
    objective="What are the main applications and ethical concerns of AI?",
    excerpts={"max_chars_per_result": 2000},
)

for doc in results:
    print(f"Title: {doc.metadata.get('title')}")
    print(f"Content: {doc.text[:500]}...")

# Extract full content from multiple URLs
results = parallel_tool.extract(
    urls=[
        "https://example.com/article1",
        "https://example.com/article2",
    ],
    full_content=True,
    excerpts=False,
)
```
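Since both `search` and `extract` return Document objects carrying a `url` metadata key, results from several calls can be merged and de-duplicated on the client side. A sketch under that assumption; `dedupe_by_url` is an illustrative helper, not part of the tool:

```python
def dedupe_by_url(docs):
    """Keep the first Document seen for each URL; documents without a
    'url' metadata key are kept unconditionally."""
    seen = set()
    unique = []
    for doc in docs:
        url = doc.metadata.get("url")
        if url is None:
            unique.append(doc)
        elif url not in seen:
            seen.add(url)
            unique.append(doc)
    return unique
```

This is useful when combining an `objective`-driven search with keyword queries that may surface the same pages.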
## Error Handling

The tool includes built-in error handling. If an API call fails, it returns an empty list, allowing your agent to continue:

```python
results = parallel_tool.search(objective="test query")

if not results:
    print("No results found or API error occurred")
```
For extract operations, failed URLs are included in the results with error information:

```python
results = parallel_tool.extract(urls=["https://invalid-url.com/"])

for doc in results:
    if doc.metadata.get("error_type"):
        print(f"Failed: {doc.metadata['url']} - {doc.text}")
```
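Because failed URLs are flagged via the `error_type` metadata key, a result list can be split into successes and failures before further processing. A minimal sketch; `partition_extract_results` is an illustrative helper, not part of the tool:

```python
def partition_extract_results(docs):
    """Split extract results into (successes, failures) using the
    'error_type' metadata key that failed URLs carry."""
    ok, failed = [], []
    for doc in docs:
        (failed if doc.metadata.get("error_type") else ok).append(doc)
    return ok, failed
```

Downstream code can then index or summarize the successes while logging or retrying the failures.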
## License

MIT