Crawl4AI Assistant


About Crawl4AI Assistant

Transform any website into structured data with just a few clicks! The Crawl4AI Assistant Chrome Extension provides three powerful tools for web scraping and data extraction.

πŸŽ‰ NEW: Click2Crawl extracts data INSTANTLY without any LLM! Test your schema and see JSON results immediately in the browser!

🎯

Click2Crawl

Visual data extraction - click elements to build schemas instantly!

πŸ”΄

Script Builder (Alpha)

Record browser actions to create automation scripts

πŸ“

Markdown Extraction (New!)

Convert any webpage content to clean markdown with Visual Text Mode

Quick Start

Installation

1

Download the Extension

Get the latest release from GitHub or use the button below

↓ Download Extension (v1.3.0)

2

Load in Chrome

Navigate to chrome://extensions/ and enable Developer Mode

3

Load Unpacked

Click "Load unpacked" and select the extracted extension folder

Explore Our Tools

🎯

Click2Crawl

Visual data extraction

Available

πŸ”΄

Script Builder

Browser automation

Alpha

πŸ“

Markdown Extraction

Content to markdown

New!

🎯 Click2Crawl

Click elements to build extraction schemas - No LLM needed!

1

Select Container

Click on any repeating element like product cards or articles. Use up/down navigation to fine-tune selection!

β–  Container highlighted in green

2

Click Fields to Extract

Click on data fields inside the container - choose text, links, images, or attributes

β–  Fields highlighted in pink

3

Test & Extract Data Instantly!

πŸŽ‰ Click "Test Schema" to see extracted JSON immediately - no LLM or coding required!

⚑ See extracted JSON immediately

πŸš€ Zero LLM dependency

πŸ“Š Instant JSON extraction

🎯 Visual element selection

🐍 Export Python code

✨ Live preview

πŸ“₯ Download results

πŸ“ Export to markdown

πŸ”΄ Script Builder

Record actions, generate automation

1

Hit Record

Start capturing your browser interactions

● Recording indicator

2

Interact Naturally

Click, type, scroll - everything is captured

πŸ–±οΈ ⌨️ πŸ“œ

3

Export Script

Get JavaScript for Crawl4AI's js_code parameter

πŸ“ Automation ready

Smart action grouping

Wait detection

Keyboard shortcuts

Alpha version
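The extension's recorder internals aren't shown here, but "smart action grouping" means collapsing bursts of low-level events into one readable action. A minimal sketch of the idea, assuming a hypothetical event shape with `kind`, `selector`, and `text` keys, merges consecutive keystrokes on the same input into a single typing step:

```python
def group_actions(events):
    """Collapse consecutive 'type' events on the same selector into one action."""
    grouped = []
    for ev in events:
        last = grouped[-1] if grouped else None
        if (
            ev["kind"] == "type"
            and last is not None
            and last["kind"] == "type"
            and last["selector"] == ev["selector"]
        ):
            last["text"] += ev["text"]  # extend the previous typing action
        else:
            grouped.append(dict(ev))  # copy so the raw event log stays untouched
    return grouped


events = [
    {"kind": "click", "selector": "input#search"},
    {"kind": "type", "selector": "input#search", "text": "wire"},
    {"kind": "type", "selector": "input#search", "text": "less"},
    {"kind": "click", "selector": "button.submit"},
]
print(group_actions(events))  # 4 raw events become 3 actions; the typing merges to "wireless"
```

Grouping like this is what keeps a generated script readable instead of emitting one line per keystroke.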

πŸ“ Markdown Extraction

Convert webpage content to clean markdown "as you see"

1

Ctrl/Cmd + Click

Hold Ctrl/Cmd and click multiple elements you want to extract

πŸ”’ Numbered selection badges

2

Enable Visual Text Mode

Extract content "as you see" - clean text without complex HTML structures

πŸ‘οΈ Visual Text Mode (As You See)

3

Export Clean Markdown

Get beautifully formatted markdown ready for documentation or LLMs

πŸ“„ Clean, readable output

Multi-select with Ctrl/Cmd

Visual Text Mode (As You See)

Clean markdown output

Export to Crawl4AI Cloud (soon)
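"As you see" extraction keeps only the text a reader actually sees, dropping scripts, styles, and markup. The extension does this on the live DOM; as a rough stdlib approximation of the same idea on static HTML, a parser that collects visible text while skipping `<script>` and `<style>` contents:

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is outside skipped tags and not pure whitespace
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


html = "<h1>The 24 Hour Restaurant</h1><script>var x=1;</script><p>124 points</p>"
p = VisibleText()
p.feed(html)
print(" | ".join(p.chunks))  # -> The 24 Hour Restaurant | 124 points
```

Visual Text Mode goes further than this sketch (it also respects CSS visibility and layout), but the output goal is the same: clean text, no structural noise.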

See the Generated Code & Extracted Data

🎯 Click2CrawlπŸ”΄ Script BuilderπŸ“ Markdown Extraction

click2crawl_extraction.py

#!/usr/bin/env python3
"""
πŸŽ‰ NO LLM NEEDED! Direct extraction with CSS selectors
Generated by Crawl4AI Chrome Extension - Click2Crawl
"""

import asyncio
import json
from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

# The EXACT schema from Click2Crawl - no guessing!
EXTRACTION_SCHEMA = {
    "name": "Product Catalog",
    "baseSelector": "div.product-card", # The container you selected
    "fields": [
        {
            "name": "title",
            "selector": "h3.product-title",
            "type": "text"
        },
        {
            "name": "price",
            "selector": "span.price",
            "type": "text"
        },
        {
            "name": "image",
            "selector": "img.product-img",
            "type": "attribute",
            "attribute": "src"
        },
        {
            "name": "link",
            "selector": "a.product-link",
            "type": "attribute",
            "attribute": "href"
        }
    ]
}

async def extract_data(url: str):
    # Direct extraction - no LLM API calls!
    extraction_strategy = JsonCssExtractionStrategy(schema=EXTRACTION_SCHEMA)

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url=url,
            config=CrawlerRunConfig(extraction_strategy=extraction_strategy)
        )

        if result.success:
            data = json.loads(result.extracted_content)
            print(f"βœ… Extracted {len(data)} items instantly!")

            # Save to file
            with open('products.json', 'w') as f:
                json.dump(data, f, indent=2)

            return data

# Run extraction on any similar page!
data = asyncio.run(extract_data("https://example.com/products"))

# 🎯 Result: Clean JSON data, no LLM costs, instant results!

extracted_data.json

// πŸŽ‰ Instantly extracted from the page - no coding required!
[
  {
    "title": "Wireless Bluetooth Headphones",
    "price": "$79.99",
    "image": "https://example.com/images/headphones-bt-01.jpg",
    "link": "/products/wireless-bluetooth-headphones"
  },
  {
    "title": "Smart Watch Pro 2024",
    "price": "$299.00",
    "image": "https://example.com/images/smartwatch-pro.jpg",
    "link": "/products/smart-watch-pro-2024"
  },
  {
    "title": "4K Webcam for Streaming",
    "price": "$149.99",
    "image": "https://example.com/images/webcam-4k.jpg",
    "link": "/products/4k-webcam-streaming"
  },
  {
    "title": "Mechanical Gaming Keyboard RGB",
    "price": "$129.99",
    "image": "https://example.com/images/keyboard-gaming.jpg",
    "link": "/products/mechanical-gaming-keyboard"
  },
  {
    "title": "USB-C Hub 7-in-1",
    "price": "$45.99",
    "image": "https://example.com/images/usbc-hub.jpg",
    "link": "/products/usb-c-hub-7in1"
  }
]
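Extracted values arrive as the strings shown on the page, so numeric fields like `price` usually need a post-processing pass. A small sketch, using a subset of the sample data above, that parses the price strings into floats (the `price_usd` key is an invented name for the parsed value):

```python
import json

raw = """[
  {"title": "USB-C Hub 7-in-1", "price": "$45.99"},
  {"title": "Smart Watch Pro 2024", "price": "$299.00"}
]"""
items = json.loads(raw)

for item in items:
    # Strip the currency symbol and thousands separators before parsing
    item["price_usd"] = float(item["price"].lstrip("$").replace(",", ""))

cheapest = min(items, key=lambda i: i["price_usd"])
print(cheapest["title"])  # -> USB-C Hub 7-in-1
```

The same pattern works on the `products.json` file the generated script writes out.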

automation_script.py

import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

# JavaScript generated from your recorded actions
js_script = """
// Search for products
document.querySelector('button.search-toggle').click();
await new Promise(r => setTimeout(r, 500));

// Type search query
const searchInput = document.querySelector('input#search');
searchInput.value = 'wireless headphones';
searchInput.dispatchEvent(new Event('input', {bubbles: true}));

// Submit search
searchInput.dispatchEvent(new KeyboardEvent('keydown', {
    key: 'Enter', keyCode: 13, bubbles: true
}));

// Wait for results
await new Promise(r => setTimeout(r, 2000));

// Click first product
document.querySelector('.product-item:first-child').click();

// Wait for product page
await new Promise(r => setTimeout(r, 1000));

// Add to cart
document.querySelector('button.add-to-cart').click();
"""

async def automate_shopping():
    config = CrawlerRunConfig(
        js_code=js_script,
        wait_for="css:.cart-confirmation",
        screenshot=True
    )

    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://shop.example.com",
            config=config
        )
        print(f"βœ“ Automation complete: {result.url}")
        return result

asyncio.run(automate_shopping())

extracted_content.md

# Extracted from Hacker News with Visual Text Mode πŸ‘οΈ

1. **Show HN: I built a tool to find and reach out to YouTubers** (hellosimply.io)
   84 points by erickim 2 hours ago | hide | 31 comments

2. **The 24 Hour Restaurant** (logicmag.io)
   124 points by helsinkiandrew 5 hours ago | hide | 52 comments

3. **Building a Better Bloom Filter in Rust** (carlmastrangelo.com)
   89 points by carlmastrangelo 3 hours ago | hide | 27 comments

---

### Article: The 24 Hour Restaurant

In New York City, the 24-hour restaurant is becoming extinct. What we lose when we can no longer eat whenever we want.

When I first moved to New York, I loved that I could get a full meal at 3 AM. Not just pizza or fast food, but a proper sit-down dinner with table service and a menu that ran for pages. The city that never sleeps had restaurants that matched its rhythm.

Today, finding a 24-hour restaurant in Manhattan requires genuine effort. The pandemic accelerated a decline that was already underway, but the roots go deeper: rising rents, changing labor laws, and shifting cultural patterns have all contributed to the death of round-the-clock dining.

---

### Product Review: Framework Laptop 16

**Specifications:**
- Display: 16" 2560Γ—1600 165Hz
- Processor: AMD Ryzen 7 7840HS
- Memory: 32GB DDR5-5600
- Storage: 2TB NVMe Gen4
- Price: Starting at $1,399

**Pros:**
- Fully modular and repairable
- Excellent Linux support
- Great keyboard and trackpad
- Expansion card system

**Cons:**
- Battery life could be better
- Slightly heavier than competitors
- Fan noise under load



More Features Coming Soon

Roadmap

We're continuously expanding C4AI Assistant with powerful new features:

Direct

Direct Data Download

Skip the code generation entirely! Download extracted data directly from Click2Crawl as JSON or CSV files.

πŸ“Š One-click download β€’ No Python needed β€’ Multiple export formats

AI

Smart Field Detection

AI-powered field detection for Click2Crawl that automatically suggests the most likely data fields on any page.

πŸ€– Auto-detect fields β€’ Smart naming β€’ Pattern recognition

πŸš€ Stay tuned for updates! Follow our GitHub for the latest releases.