# Self-hosting Firecrawl
Welcome to Firecrawl 🔥! Here are some instructions on how to get the project running locally so you can run it on your own and contribute.
If you're contributing, note that the process is similar to other open-source repos, i.e., fork Firecrawl, make changes, run tests, PR.
If you have any questions or would like help getting on board, join our Discord community for more information, or submit an issue on GitHub!
Self-hosting Firecrawl is particularly beneficial for organizations with stringent security policies that require data to remain within controlled environments.

However, there are some limitations and additional responsibilities to be aware of. Most features must be manually configured through the .env file, which requires a deeper understanding of the underlying technologies and may involve more setup time.

Self-hosting Firecrawl is ideal for those who need full control over their scraping and data processing environments, but it comes with the trade-off of additional maintenance and configuration effort.
Create a .env file in the root directory using the template below.
.env:

```bash
# ===== Required ENVS ======
PORT=3002
HOST=0.0.0.0

# Note: PORT is used by both the main API server and worker liveness check endpoint

# To turn on DB authentication, you need to set up Supabase.
USE_DB_AUTHENTICATION=false

# ===== Optional ENVS ======

## === AI features (JSON format on scrape, /extract API) ===
# Provide your OpenAI API key here to enable AI features
# OPENAI_API_KEY=

# Experimental: Use Ollama
# OLLAMA_BASE_URL=http://localhost:11434/api
# MODEL_NAME=deepseek-r1:7b
# MODEL_EMBEDDING_NAME=nomic-embed-text

# Experimental: Use any OpenAI-compatible API
# OPENAI_BASE_URL=https://example.com/v1
# OPENAI_API_KEY=

## === Proxy ===
# PROXY_SERVER can be a full URL (e.g. http://0.1.2.3:1234) or just an IP and port combo (e.g. 0.1.2.3:1234)
# Do not uncomment PROXY_USERNAME and PROXY_PASSWORD if your proxy is unauthenticated
# PROXY_SERVER=
# PROXY_USERNAME=
# PROXY_PASSWORD=

## === /search API ===
# By default, the /search API will use Google search.
# You can specify a SearXNG server with the JSON format enabled, if you'd like to use that instead of direct Google.
# You can also customize the engines and categories parameters, but the defaults should also work just fine.
# SEARXNG_ENDPOINT=http://your.searxng.server
# SEARXNG_ENGINES=
# SEARXNG_CATEGORIES=

## === Other ===

# Supabase Setup (used to support DB authentication, advanced logging, etc.)
# SUPABASE_ANON_TOKEN=
# SUPABASE_URL=
# SUPABASE_SERVICE_TOKEN=

# Use if you've set up authentication and want to test with a real API key
# TEST_API_KEY=

# This key lets you access the queue admin panel. Change this if your deployment is publicly accessible.
BULL_AUTH_KEY=CHANGEME

# This is now autoconfigured by the docker-compose.yaml. You shouldn't need to set it.
# PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/scrape
# REDIS_URL=redis://redis:6379
# REDIS_RATE_LIMIT_URL=redis://redis:6379

## === PostgreSQL Database Configuration ===
# Configure PostgreSQL credentials. These should match the credentials used by the nuq-postgres container.
# If you change these, ensure all three are set consistently.
# POSTGRES_USER=firecrawl
# POSTGRES_PASSWORD=firecrawl_password
# POSTGRES_DB=firecrawl

# Set if you have a llamaparse key you'd like to use to parse pdfs
# LLAMAPARSE_API_KEY=

# Set if you'd like to send server health status messages to Slack
# SLACK_WEBHOOK_URL=

## === System Resource Configuration ===
# Maximum CPU usage threshold (0.0-1.0). Worker will reject new jobs when CPU usage exceeds this value.
# Default: 0.8 (80%)
# MAX_CPU=0.8

# Maximum RAM usage threshold (0.0-1.0). Worker will reject new jobs when memory usage exceeds this value.
# Default: 0.8 (80%)
# MAX_RAM=0.8

# Set if you'd like to allow local webhooks to be sent to your self-hosted instance
# ALLOW_LOCAL_WEBHOOKS=true
```
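For a first local run with authentication disabled, a minimal .env using only values from the template above might look like this (the BULL_AUTH_KEY value is a placeholder; substitute your own secret):

```shell
# Minimal local configuration; BULL_AUTH_KEY value below is a placeholder
PORT=3002
HOST=0.0.0.0
USE_DB_AUTHENTICATION=false
BULL_AUTH_KEY=my-strong-secret
```

All other variables are optional and can be added later as you enable features.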
Note: the PostgreSQL credentials in the .env template are for local development only. When deploying to a server, set POSTGRES_USER, POSTGRES_PASSWORD, and POSTGRES_DB to secure values and ensure they match the database service configuration.

By default, docker-compose.yaml does not expose PostgreSQL to the host or the internet. Avoid adding a ports mapping for nuq-postgres unless you are restricting access with a firewall. To access the database for maintenance, prefer using `docker compose exec nuq-postgres psql` or a temporary, firewalled tunnel.

Also set BULL_AUTH_KEY to a strong secret, especially on any deployment reachable from untrusted networks.

Build and run the Docker containers:
```bash
docker compose build
docker compose up
```
If you encounter an error, make sure you're using `docker compose` and not `docker-compose`.
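If you prefer to run the stack in the background, you can start it detached and then confirm the containers are up (standard Docker Compose commands):

```shell
# Start all services in detached mode
docker compose up -d

# List the services and their current status
docker compose ps
```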
This will run a local instance of Firecrawl which can be accessed at http://localhost:3002.
You should be able to see the Bull Queue Manager UI at http://localhost:3002/admin/CHANGEME/queues (replace CHANGEME with your BULL_AUTH_KEY value).
(Optional) Test the API
If you'd like to test the crawl endpoint, you can run this:

```bash
curl -X POST http://localhost:3002/v1/crawl \
    -H 'Content-Type: application/json' \
    -d '{
      "url": "https://firecrawl.dev"
    }'
```
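Crawls are asynchronous: the request above responds with a job ID, and results are fetched from the status endpoint. A sketch of the full round trip (assumes `jq` is installed; endpoint paths follow the v1 API used above):

```shell
# Submit the crawl and capture the job ID from the JSON response
ID=$(curl -s -X POST http://localhost:3002/v1/crawl \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://firecrawl.dev"}' | jq -r '.id')

# Check the job status; repeat until "status" is "completed"
curl -s "http://localhost:3002/v1/crawl/$ID"
```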
This section provides solutions to common issues you might encounter while setting up or running your self-hosted instance of Firecrawl.
Note: When using Firecrawl SDKs with a self-hosted instance, API keys are optional. API keys are only required when connecting to the cloud service (api.firecrawl.dev).
Symptom:
```
[YYYY-MM-DDTHH:MM:SS.SSSz]ERROR - Attempted to access Supabase client when it's not configured.
[YYYY-MM-DDTHH:MM:SS.SSSz]ERROR - Error inserting scrape event: Error: Supabase client is not configured.
```
Explanation: This error occurs because the Supabase client setup is not completed. It is currently not possible to configure Supabase in self-hosted instances, so this error can be safely ignored; you should still be able to scrape and crawl without problems.
Symptom:
```
[YYYY-MM-DDTHH:MM:SS.SSSz]WARN - You're bypassing authentication
```
Explanation: This warning appears because the Supabase client is not configured, so authentication is bypassed. It is currently not possible to configure Supabase in self-hosted instances, so this warning can be safely ignored; you should still be able to scrape and crawl without problems.
Symptom: Docker containers exit unexpectedly or fail to start.
Solution: Check the Docker logs for any error messages using the command:
```bash
docker logs [container_name]
```
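With Docker Compose you can also tail logs by service name rather than container name (the service names come from docker-compose.yaml; `api` and `worker` here are assumptions, check your compose file for the actual names):

```shell
# Follow logs for specific services defined in docker-compose.yaml
docker compose logs -f api worker
```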
Symptom: Errors related to connecting to Redis, such as timeouts or "Connection refused".
Solution: Ensure that the Redis service is up and running, and verify that REDIS_URL and REDIS_RATE_LIMIT_URL in your .env file match the Redis URL used in the docker-compose.yaml file (redis://redis:6379).

Symptom: API requests to the Firecrawl instance time out or return no response.
Solution: Confirm that the containers are running (`docker compose ps`), check the API and worker logs for errors, and verify that the instance is reachable on the configured PORT (3002 by default).
By addressing these common issues, you can ensure a smoother setup and operation of your self-hosted Firecrawl instance.
Read `examples/kubernetes/cluster-install/README.md` for instructions on how to install Firecrawl on a Kubernetes cluster.
Read `examples/kubernetes/firecrawl-helm/README.md` for instructions on how to install Firecrawl on a Kubernetes cluster with Helm.