LobeHub lets you configure web search for the AI, so it can retrieve real-time information from the internet and give more accurate, up-to-date responses. Web search supports multiple search engine providers, including SearXNG, Search1API, Google, and Brave.
<Callout type="info">
  Web search gives the AI access to time-sensitive content such as the latest news, technology trends, or product information. You can self-host the open-source SearXNG, integrate mainstream search services such as Search1API, Google, or Brave, or combine them freely to suit your use case.
</Callout>

Set the `SEARCH_PROVIDERS` environment variable and the corresponding API keys, and LobeHub will query multiple sources and return the results. You can also configure crawler environment variables such as `CRAWLER_IMPLS` (e.g., `browserless`, `firecrawl`, `tavily`) to extract webpage content, so the AI can both search and read pages.
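For a first setup that needs no third-party API keys, the variables described below can be combined into a minimal `.env` sketch such as this one. The SearXNG URL is a placeholder for your own instance, and that instance must have JSON output enabled.

```shell
# Minimal .env sketch: search and crawling without third-party API keys.
# searxng needs only SEARXNG_URL, and the built-in naive crawler needs no key.
SEARCH_PROVIDERS="searxng"
SEARXNG_URL=https://searxng-instance.com
CRAWLER_IMPLS="naive"
```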
### `CRAWLER_IMPLS`

Configures the web crawlers available for structured extraction of webpage content.

```shell
CRAWLER_IMPLS="naive,search1api"
```
Supported crawler types are listed below:
| Value | Description | Environment Variable |
| --- | --- | --- |
| `browserless` | Headless browser crawler based on Browserless, suitable for rendering complex pages. | `BROWSERLESS_TOKEN` |
| `exa` | Crawling provided by Exa; requires an API key. | `EXA_API_KEY` |
| `firecrawl` | Firecrawl headless browser API, ideal for modern websites. | `FIRECRAWL_API_KEY` |
| `jina` | Crawler service from Jina AI; supports fast content summarization. | `JINA_READER_API_KEY` |
| `naive` | Built-in general-purpose crawler for standard web structures. | None |
| `search1api` | Page crawling from Search1API, well suited to structured content extraction. | `SEARCH1API_API_KEY`, `SEARCH1API_CRAWL_API_KEY`, `SEARCH1API_SEARCH_API_KEY` |
| `tavily` | Web scraping and summarization API from Tavily. | `TAVILY_API_KEY` |
💡 Configuring multiple crawlers improves the success rate: the system tries different crawlers according to their priority.
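To illustrate the priority note, here is a hypothetical POSIX shell sketch of in-order fallback: each name in `CRAWLER_IMPLS` is tried from left to right until one succeeds. `crawl_with` is a made-up stub in which only `naive` "succeeds", and treating list order as priority is an assumption; LobeHub's actual crawler logic may differ.

```shell
#!/bin/sh
# Hypothetical sketch of crawler fallback; not LobeHub's implementation.

# Stub crawler: pretend only "naive" succeeds and every other crawler fails.
crawl_with() {
  case "$1" in
    naive) echo "content of $2 via naive" ;;
    *) return 1 ;;
  esac
}

# Try each crawler listed in CRAWLER_IMPLS in order; stop at the first success.
crawl_with_fallback() {
  for impl in $(echo "${CRAWLER_IMPLS:-naive}" | tr ',' ' '); do
    if crawl_with "$impl" "$1"; then
      return 0
    fi
  done
  return 1  # every configured crawler failed
}

CRAWLER_IMPLS="firecrawl,naive"
crawl_with_fallback "https://example.com"
```

With `CRAWLER_IMPLS="firecrawl,naive"`, the stubbed `firecrawl` fails and the call falls through to `naive`; if no listed crawler succeeds, the function returns a non-zero status.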
### `CRAWL_CONCURRENCY`

Controls crawler concurrency per crawl task. The default is `3`. On low-resource servers, set it to `1` to reduce CPU spikes.

```shell
CRAWL_CONCURRENCY=3
```
### `CRAWLER_RETRY`

Controls the number of retries per URL when a crawl fails. The default is `1` (up to 2 attempts in total).

```shell
CRAWLER_RETRY=1
```
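The retry default works out as `CRAWLER_RETRY + 1` attempts per URL. The hypothetical sketch below extends that arithmetic to a rough worst case for one crawl task, assuming every URL fails until its final attempt and that `CRAWL_CONCURRENCY` caps how many URLs are crawled at once; `URL_COUNT` is a made-up input, not a LobeHub setting.

```shell
#!/bin/sh
# Rough worst-case estimate for one crawl task (illustrative only).
CRAWL_CONCURRENCY=${CRAWL_CONCURRENCY:-3}
CRAWLER_RETRY=${CRAWLER_RETRY:-1}
URL_COUNT=5  # hypothetical number of URLs in the task

# First try plus the configured retries.
attempts_per_url=$((CRAWLER_RETRY + 1))
# Worst case: every URL fails until its last try.
max_attempts=$((URL_COUNT * attempts_per_url))

echo "attempts per URL: $attempts_per_url"
echo "worst-case attempts for $URL_COUNT URLs: $max_attempts"
echo "URLs crawled at once (assumed cap): $CRAWL_CONCURRENCY"
```

Under these assumptions, lowering `CRAWL_CONCURRENCY` to `1` leaves the total number of attempts unchanged and only spreads them out over time, which is why it eases CPU spikes on small servers.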
### `SEARCH_PROVIDERS`

Configures which search engine providers are used for web search.

```shell
SEARCH_PROVIDERS="searxng"
```
Supported search engines include:
| Value | Description | Environment Variable |
| --- | --- | --- |
| `anspire` | Search service provided by Anspire. | `ANSPIRE_API_KEY` |
| `bocha` | Search service from Bocha. | `BOCHA_API_KEY` |
| `brave` | Brave, a privacy-friendly search source. | `BRAVE_API_KEY` |
| `exa` | Exa, a search API designed for AI. | `EXA_API_KEY` |
| `firecrawl` | Search capabilities via Firecrawl. | `FIRECRAWL_API_KEY` |
| `google` | Google Programmable Search Engine. | `GOOGLE_PSE_API_KEY`, `GOOGLE_PSE_ENGINE_ID` |
| `jina` | Semantic search provided by Jina AI. | `JINA_READER_API_KEY` |
| `kagi` | Premium search API from Kagi; requires a subscription key. | `KAGI_API_KEY` |
| `search1api` | Aggregated search capabilities from Search1API. | `SEARCH1API_API_KEY`, `SEARCH1API_CRAWL_API_KEY`, `SEARCH1API_SEARCH_API_KEY` |
| `searxng` | A self-hosted or public SearXNG instance. | `SEARXNG_URL` |
| `tavily` | Tavily, which offers fast web summaries and answers. | `TAVILY_API_KEY` |
⚠️ Some search providers require you to apply for an API key and configure it in your `.env` file.
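Because a missing key is easy to overlook, a small pre-flight check can help. The POSIX shell sketch below maps each `SEARCH_PROVIDERS` value to the variables listed in the table above and reports any that are empty. `required_vars` and `check_search_providers` are hypothetical helpers, not part of LobeHub, and checking `search1api` only for `SEARCH1API_API_KEY` is a simplifying assumption.

```shell
#!/bin/sh
# Hypothetical pre-flight check for SEARCH_PROVIDERS (not part of LobeHub).

# Variables each provider needs, taken from the table above.
# Assumption: search1api is checked only for the shared SEARCH1API_API_KEY.
required_vars() {
  case "$1" in
    anspire) echo ANSPIRE_API_KEY ;;
    bocha) echo BOCHA_API_KEY ;;
    brave) echo BRAVE_API_KEY ;;
    exa) echo EXA_API_KEY ;;
    firecrawl) echo FIRECRAWL_API_KEY ;;
    google) echo "GOOGLE_PSE_API_KEY GOOGLE_PSE_ENGINE_ID" ;;
    jina) echo JINA_READER_API_KEY ;;
    kagi) echo KAGI_API_KEY ;;
    search1api) echo SEARCH1API_API_KEY ;;
    searxng) echo SEARXNG_URL ;;
    tavily) echo TAVILY_API_KEY ;;
    *) echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Print every missing variable; return non-zero if anything is missing.
check_search_providers() {
  status=0
  for provider in $(echo "$SEARCH_PROVIDERS" | tr ',' ' '); do
    vars=$(required_vars "$provider") || { status=1; continue; }
    for var in $vars; do
      # Indirect lookup; variable names come only from the fixed list above.
      eval "value=\${$var:-}"
      if [ -z "$value" ]; then
        echo "missing $var for $provider"
        status=1
      fi
    done
  done
  return "$status"
}

SEARCH_PROVIDERS="searxng,google"
SEARXNG_URL=https://searxng-instance.com
GOOGLE_PSE_API_KEY=your-google-api-key
# GOOGLE_PSE_ENGINE_ID is unset here, so the check reports it.
check_search_providers || echo "set the missing variables in .env"
```

One way to use it against a real configuration is to load your `.env` first, for example with `set -a; . ./.env; set +a`, assuming the file contains only plain `KEY=value` lines.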
### `BROWSERLESS_URL`

Specifies the API endpoint for Browserless, used for web crawling tasks. Browserless is a browser automation platform based on headless Chrome, well suited to rendering dynamic pages.

```shell
BROWSERLESS_URL=https://chrome.browserless.io
```
📌 Usually used together with `CRAWLER_IMPLS=browserless`.
### `BROWSERLESS_BLOCK_ADS`

Enables ad blocking. When Browserless is used for web scraping, it automatically blocks common ad resources (such as ad scripts, images, and trackers), which speeds up scraping and yields cleaner page content.

```shell
BROWSERLESS_BLOCK_ADS=1
```
📌 Supported values:

- `1`: enable ad blocking (recommended)
- `0`: disable ad blocking (default)
✅ Recommended together with `BROWSERLESS_STEALTH_MODE=1` to improve stealth and the scraping success rate.
### `BROWSERLESS_STEALTH_MODE`

Enables stealth mode. When Browserless is used for web scraping, it applies various anti-detection techniques (such as modifying the user agent, removing webdriver traits, and simulating user interactions) to get past anti-bot mechanisms.

```shell
BROWSERLESS_STEALTH_MODE=1
```
📌 Supported values:

- `1`: enable stealth mode (recommended)
- `0`: disable stealth mode (default)
⚠️ Some websites use advanced anti-scraping techniques. Enabling stealth mode can significantly improve the scraping success rate.
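Putting the Browserless settings together, a `.env` sketch might look like the following. The token is a placeholder, and listing `naive` after `browserless` as a fallback is an assumption made for resilience rather than a requirement.

```shell
# .env sketch: Browserless crawling with ad blocking and stealth mode.
# naive is listed after browserless as an assumed fallback.
CRAWLER_IMPLS="browserless,naive"
BROWSERLESS_URL=https://chrome.browserless.io
# Placeholder: replace with your own Browserless token.
BROWSERLESS_TOKEN=your-browserless-token
BROWSERLESS_BLOCK_ADS=1
BROWSERLESS_STEALTH_MODE=1
```

Comments are kept on their own lines because some `.env` loaders (for example Docker's `--env-file`) do not strip trailing inline comments from values.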
### `GOOGLE_PSE_ENGINE_ID`

Configures the Search Engine ID for Google Programmable Search Engine (Google PSE), which restricts the search scope. Must be used together with `GOOGLE_PSE_API_KEY`.

```shell
GOOGLE_PSE_ENGINE_ID=your-google-cx-id
```
📌 How to get it: visit programmablesearchengine.google.com, create a search engine, and copy its `cx` parameter.
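To confirm the key and `cx` pair outside LobeHub, you can query Google's Custom Search JSON API directly. The sketch below only builds the request URL (the `google_pse_test_url` helper is hypothetical and expects a one-word, unencoded query); the `curl` line stays commented so nothing is sent until you enable it. A JSON response containing an `items` array suggests the pair works, while an `error` object points to a key or engine-ID problem.

```shell
#!/bin/sh
# Hypothetical helper: build a Custom Search JSON API request for a one-word query.
google_pse_test_url() {
  printf 'https://www.googleapis.com/customsearch/v1?key=%s&cx=%s&q=%s\n' \
    "$GOOGLE_PSE_API_KEY" "$GOOGLE_PSE_ENGINE_ID" "$1"
}

GOOGLE_PSE_API_KEY=your-google-api-key
GOOGLE_PSE_ENGINE_ID=your-google-cx-id
google_pse_test_url lobehub

# Uncomment to send the request (needs network access and real credentials):
# curl -s "$(google_pse_test_url lobehub)"
```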
### `FIRECRAWL_URL`

Sets the access URL for the Firecrawl API, used for web content scraping. Default value:

```shell
FIRECRAWL_URL=https://api.firecrawl.dev/v2
```

⚙️ Usually does not need to be changed unless you're using a self-hosted version or a proxy service.
### `TAVILY_SEARCH_DEPTH`

Configures the result depth for Tavily searches.

```shell
TAVILY_SEARCH_DEPTH=basic
```
Supported values:

- `basic`: fast search that returns brief results
- `advanced`: deep search that returns more context and webpage details

### `TAVILY_EXTRACT_DEPTH`

Configures how deeply Tavily extracts content from web pages.

```shell
TAVILY_EXTRACT_DEPTH=basic
```
Supported values:

- `basic`: extracts basic information such as the title and a content summary
- `advanced`: extracts structured data, lists, charts, and more from web pages

### `SEARXNG_URL`

The URL of the SearXNG instance. This setting is required for online search when `SEARCH_PROVIDERS` includes `searxng`. For example:

```shell
SEARXNG_URL=https://searxng-instance.com
```
This URL should point to a functional SearXNG instance. You can choose to self-host SearXNG or use a publicly available SearXNG instance.
You can find publicly available SearXNG instances in the SearXNG instance list. Choose an instance that is fast and reliable, and then configure its URL in LobeHub.
Note that the SearXNG instance you use must have `json` output enabled; otherwise, LobeHub's requests to it will fail with an error. If you self-host, find the SearXNG configuration file and add `json` as shown below.
```shell
$ vi searxng/settings.yml
```

```yaml
# ...
search:
  formats:
    - html
    - json
```
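After restarting SearXNG, you can check from the LobeHub host that JSON output is actually enabled. The sketch below builds a search URL with `format=json` from `SEARXNG_URL` (the `searxng_json_url` helper is hypothetical); the commented `curl` call prints the HTTP status. A `200` indicates JSON is being served, whereas a `403` usually means `json` is still missing from `search.formats`.

```shell
#!/bin/sh
# Hypothetical helper: build a SearXNG search URL that requests JSON output.
searxng_json_url() {
  # Drop one trailing slash, if present, so the path joins cleanly.
  base=${SEARXNG_URL%/}
  printf '%s/search?q=%s&format=json\n' "$base" "$1"
}

SEARXNG_URL=https://searxng-instance.com/
searxng_json_url lobehub

# Uncomment to print the HTTP status code (200 means JSON is enabled):
# curl -s -o /dev/null -w '%{http_code}\n' "$(searxng_json_url lobehub)"
```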
After configuration, you can verify that online search works by asking the AI a time-sensitive question, for example about recent news.

If the AI can answer such time-sensitive questions, online search has been configured successfully.