Configuring Online Search Functionality

LobeHub supports configuring web search functionality for AI, enabling it to retrieve real-time information from the internet to provide more accurate and up-to-date responses. Web search supports multiple search engine providers, including SearXNG, Search1API, Google, and Brave, among others.

<Callout type="info"> Web search allows AI to access time-sensitive content, such as the latest news, technology trends, or product information. You can deploy the open-source SearXNG yourself, or choose to integrate mainstream search services like Search1API, Google, Brave, etc., combining them freely based on your use case. </Callout>

By setting the search service environment variable SEARCH_PROVIDERS and the corresponding API Keys, LobeHub will query multiple sources and return the results. You can also configure crawler service environment variables such as CRAWLER_IMPLS (e.g., browserless, firecrawl, tavily, etc.) to extract webpage content, enhancing the capability of search + reading.

Core Environment Variables

`CRAWLER_IMPLS`

Configure available web crawlers for structured extraction of webpage content.

env

CRAWLER_IMPLS="naive,search1api"

Supported crawler types are listed below:

Value	Description	Environment Variable
`browserless`	Headless browser crawler based on Browserless, suitable for rendering complex pages.	`BROWSERLESS_TOKEN`
`exa`	Crawler capabilities provided by Exa, API required.	`EXA_API_KEY`
`firecrawl`	Firecrawl headless browser API, ideal for modern websites.	`FIRECRAWL_API_KEY`
`jina`	Crawler service from Jina AI, supports fast content summarization.	`JINA_READER_API_KEY`
`naive`	Built-in general-purpose crawler for standard web structures.
`search1api`	Page crawling capabilities from Search1API, great for structured content extraction.	`SEARCH1API_API_KEY` `SEARCH1API_CRAWL_API_KEY` `SEARCH1API_SEARCH_API_KEY`
`tavily`	Web scraping and summarization API from Tavily.	`TAVILY_API_KEY`

💡 Setting multiple crawlers increases success rate; the system will try different ones based on priority.

`CRAWL_CONCURRENCY`

Controls crawler concurrency per crawl task. The default is 3. On low-resource servers, use 1 to reduce CPU spikes.

env

CRAWL_CONCURRENCY=3

`CRAWLER_RETRY`

Controls retry attempts per URL on crawl failures. The default is 1 (up to 2 attempts total).

env

CRAWLER_RETRY=1

`SEARCH_PROVIDERS`

Configure which search engine providers to use for web search.

env

SEARCH_PROVIDERS="searxng"

Supported search engines include:

Value	Description	Environment Variable
`anspire`	Search service provided by Anspire.	`ANSPIRE_API_KEY`
`bocha`	Search service from Bocha.	`BOCHA_API_KEY`
`brave`	Brave, a privacy-friendly search source.	`BRAVE_API_KEY`
`exa`	Exa, a search API designed for AI.	`EXA_API_KEY`
`firecrawl`	Search capabilities via Firecrawl.	`FIRECRAWL_API_KEY`
`google`	Uses Google Programmable Search Engine.	`GOOGLE_PSE_API_KEY` `GOOGLE_PSE_ENGINE_ID`
`jina`	Semantic search provided by Jina AI.	`JINA_READER_API_KEY`
`kagi`	Premium search API by Kagi, requires a subscription key.	`KAGI_API_KEY`
`search1api`	Aggregated search capabilities from Search1API.	`SEARCH1API_API_KEY` `SEARCH1API_CRAWL_API_KEY` `SEARCH1API_SEARCH_API_KEY`
`searxng`	Use a self-hosted or public SearXNG instance.	`SEARXNG_URL`
`tavily`	Tavily, offers fast web summaries and answers.	`TAVILY_API_KEY`

⚠️ Some search providers require you to apply for an API Key and configure it in your .env file.

`BROWSERLESS_URL`

Specifies the API endpoint for Browserless, used for web crawling tasks. Browserless is a browser automation platform based on Headless Chrome, ideal for rendering dynamic pages.

env

BROWSERLESS_URL=https://chrome.browserless.io

📌 Usually used together with CRAWLER_IMPLS=browserless.

`BROWSERLESS_BLOCK_ADS`

Enables ad blocking functionality. When using Browserless for web scraping, it automatically blocks common ad resources (such as scripts, images, trackers, etc.), improving scraping speed and page clarity.

env

BROWSERLESS_BLOCK_ADS=1

📌 Supported values:

1: Enable ad blocking (recommended);

0: Disable ad blocking (default).

✅ It is recommended to use with BROWSERLESS_STEALTH_MODE=1 to enhance stealth and scraping success rate.

`BROWSERLESS_STEALTH_MODE`

Enables stealth mode. When using Browserless for web scraping, it applies various anti-detection techniques (such as modifying the user agent, removing webdriver traits, simulating user interactions) to bypass anti-bot mechanisms.

env

BROWSERLESS_STEALTH_MODE=1

📌 Supported values:

1: Enable stealth mode (recommended);

0: Disable stealth mode (default).

⚠️ Some websites use advanced anti-scraping techniques. Enabling stealth mode can significantly improve scraping success rate.

`GOOGLE_PSE_ENGINE_ID`

Configure the Search Engine ID for Google Programmable Search Engine (Google PSE), used to restrict the search scope. Must be used alongside GOOGLE_PSE_API_KEY.

env

GOOGLE_PSE_ENGINE_ID=your-google-cx-id

🔑 How to get it: Visit programmablesearchengine.google.com, create a search engine, and obtain the cx parameter.

`FIRECRAWL_URL`

Sets the access URL for the Firecrawl API, used for web content scraping. Default value:

env

FIRECRAWL_URL=https://api.firecrawl.dev/v2

⚙️ Usually does not need to be changed unless you’re using a self-hosted version or a proxy service.

`TAVILY_SEARCH_DEPTH`

Configure the result depth for Tavily searches.

env

TAVILY_SEARCH_DEPTH=basic

Supported values:

basic: Fast search, returns brief results;
advanced: Deep search, returns more context and web page details.

`TAVILY_EXTRACT_DEPTH`

Configure how deeply Tavily extracts content from web pages.

env

TAVILY_EXTRACT_DEPTH=basic

Supported values:

basic: Extracts basic info like title and content summary;
advanced: Extracts structured data, lists, charts, and more from web pages.

`SEARXNG_URL`

The URL of the SearXNG instance, which is a necessary configuration to enable the online search functionality. For example:

shell

SEARXNG_URL=https://searxng-instance.com

This URL should point to a functional SearXNG instance. You can choose to self-host SearXNG or use a publicly available SearXNG instance.

You can find publicly available SearXNG instances in the SearXNG instance list. Choose an instance that is fast and reliable, and then configure its URL in LobeHub.

Note that the searxng you use must have json output enabled; otherwise, the lobehub call will result in an error. If self-hosting, find the searxng configuration file and add json as shown below.

bash

$ vi searxng/settings.yml
...
search:
formats:
- html
- json

Verifying Online Search Functionality

After configuration, you can verify whether the online search functionality is working correctly by following these steps:

Restart the LobeHub service.
Start a new chat session, enable smart online search, and then ask AI a question that requires the latest information, such as "What is the current gold price today?" or "What are the latest major news stories?"
Observe whether AI can return the latest information based on internet searches.

If AI can answer these time-sensitive questions, it indicates that the online search functionality has been successfully configured.

>-