Installation šŸ’»

Crawl4AI offers flexible installation options to suit various use cases. You can install it as a Python package, use it with Docker, or run it as a local server.

Crawl4AI is now available on PyPI, making installation easier than ever. Choose the option that best fits your needs:

Option 1: Python Package Installation (Recommended)

Basic Installation

For basic web crawling and scraping tasks:

```bash
pip install crawl4ai
playwright install  # Install Playwright browser binaries
```

Installation with PyTorch

For advanced text clustering (includes the cosine-similarity clustering strategy):

```bash
pip install crawl4ai[torch]
```

Installation with Transformers

For text summarization and Hugging Face models:

```bash
pip install crawl4ai[transformer]
```

Full Installation

For all features:

```bash
pip install crawl4ai[all]
```

Development Installation

For contributors who plan to modify the source code:

```bash
git clone https://github.com/unclecode/crawl4ai.git
cd crawl4ai
pip install -e ".[all]"
playwright install  # Install Playwright browser binaries
```

šŸ’” After installing with the "torch", "transformer", or "all" options, it's recommended to run the following CLI command to download the required models:

```bash
crawl4ai-download-models
```

This step is optional, but pre-downloading the models avoids delays on first use and improves crawler performance. You only need to do it once after installation.

Playwright Installation Note for Ubuntu

If you encounter issues with Playwright installation on Ubuntu, you may need to install additional dependencies:

```bash
sudo apt-get install -y \
    libwoff1 \
    libopus0 \
    libwebp7 \
    libwebpdemux2 \
    libenchant-2-2 \
    libgudev-1.0-0 \
    libsecret-1-0 \
    libhyphen0 \
    libgdk-pixbuf2.0-0 \
    libegl1 \
    libnotify4 \
    libxslt1.1 \
    libevent-2.1-7 \
    libgles2 \
    libxcomposite1 \
    libatk1.0-0 \
    libatk-bridge2.0-0 \
    libepoxy0 \
    libgtk-3-0 \
    libharfbuzz-icu0 \
    libgstreamer-gl1.0-0 \
    libgstreamer-plugins-bad1.0-0 \
    gstreamer1.0-plugins-good \
    gstreamer1.0-plugins-bad \
    libxt6 \
    libxaw7 \
    xvfb \
    fonts-noto-color-emoji \
    libfontconfig \
    libfreetype6 \
    xfonts-cyrillic \
    xfonts-scalable \
    fonts-liberation \
    fonts-ipafont-gothic \
    fonts-wqy-zenhei \
    fonts-tlwg-loma-otf \
    fonts-freefont-ttf
```
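Alternatively, the Playwright CLI can resolve these system dependencies for you; whether these subcommands are available depends on your installed Playwright version:

```shell
# Let Playwright install its own system dependencies (requires sudo).
playwright install-deps

# Or install browsers and system dependencies in one step.
playwright install --with-deps
```

This is usually less error-prone than maintaining the package list by hand, since the CLI knows which dependencies the bundled browser builds need.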

Option 2: Using Docker (Coming Soon)

Docker support for Crawl4AI is currently in progress and will be available soon. This will allow you to run Crawl4AI in a containerized environment, ensuring consistency across different systems.

Option 3: Local Server Installation

For those who prefer to run Crawl4AI as a local server, instructions will be provided once the Docker implementation is complete.

Verifying Your Installation

After installation, you can verify that Crawl4AI is working correctly by running a simple Python script:

```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(url="https://www.example.com")
        print(result.markdown[:500])  # Print the first 500 characters

if __name__ == "__main__":
    asyncio.run(main())
```

This script should successfully crawl the example website and print the first 500 characters of the extracted content.
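One caveat: `asyncio.run()` raises `RuntimeError` when called where an event loop is already running (for example, in a Jupyter notebook); there you should `await main()` directly instead. A minimal loop-aware sketch, with a stand-in coroutine in place of the crawler call:

```python
import asyncio

async def main():
    # Stand-in for `await crawler.arun(...)` from the script above.
    return "ok"

try:
    asyncio.get_running_loop()        # raises RuntimeError outside a loop
except RuntimeError:
    outcome = asyncio.run(main())     # plain script: start a fresh event loop
else:
    # A loop is already running (e.g. Jupyter): await the coroutine directly.
    outcome = "await main() directly"
print(outcome)
```

In a plain script this prints `ok`; in a notebook it reminds you to await the coroutine instead of wrapping it in `asyncio.run()`.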

Getting Help

If you encounter any issues during installation or usage, please check the documentation or raise an issue on the GitHub repository.

Happy crawling! šŸ•·ļøšŸ¤–