Browser

docs/codebase/browser.md

0.8.6, 4.9 KB

browser_manager.py

| Function | What it does |
| --- | --- |
| `ManagedBrowser.build_browser_flags` | Returns baseline Chromium CLI flags, disables GPU and sandbox, plugs locale, timezone, stealth tweaks, and any extras from `BrowserConfig`. |
| `ManagedBrowser.__init__` | Stores config and logger, creates temp dir, preps internal state. |
| `ManagedBrowser.start` | Spawns or connects to the Chromium process, returns its CDP endpoint plus the `subprocess.Popen` handle. |
| `ManagedBrowser._initial_startup_check` | Pings the CDP endpoint once to be sure the browser is alive, raises if not. |
| `ManagedBrowser._monitor_browser_process` | Async-loops on the subprocess, logs exits or crashes, restarts if policy allows. |
| `ManagedBrowser._get_browser_path_WIP` | Old helper that maps OS + browser type to an executable path. |
| `ManagedBrowser._get_browser_path` | Current helper, checks env vars, Playwright cache, and OS defaults for the real executable. |
| `ManagedBrowser._get_browser_args` | Builds the final CLI arg list by merging user flags, stealth flags, and defaults. |
| `ManagedBrowser.cleanup` | Terminates the browser, stops monitors, deletes the temp dir. |
| `ManagedBrowser.create_profile` | Opens a visible browser so a human can log in, then zips the resulting user-data-dir to `~/.crawl4ai/profiles/<name>`. |
| `ManagedBrowser.list_profiles` | Thin wrapper, now forwarded to `BrowserProfiler.list_profiles()`. |
| `ManagedBrowser.delete_profile` | Thin wrapper, now forwarded to `BrowserProfiler.delete_profile()`. |
| `BrowserManager.__init__` | Holds the global Playwright instance, browser handle, config signature cache, session map, and logger. |
| `BrowserManager.start` | Boots the underlying `ManagedBrowser`, then spins up the default Playwright browser context with stealth patches. |
| `BrowserManager._build_browser_args` | Translates `CrawlerRunConfig` (proxy, UA, timezone, headless flag, etc.) into Playwright `launch_args`. |
| `BrowserManager.setup_context` | Applies locale, geolocation, permissions, cookies, and UA overrides on a fresh context. |
| `BrowserManager.create_browser_context` | Internal helper that actually calls `browser.new_context(**options)` after running `setup_context`. |
| `BrowserManager._make_config_signature` | Hashes the non-ephemeral parts of `CrawlerRunConfig` so contexts can be reused safely. |
| `BrowserManager.get_page` | Returns a ready `Page` for a given session id, reusing an existing one or creating a new context/page, injects helper scripts, updates `last_used`. |
| `BrowserManager.kill_session` | Force-closes a context/page for a session and removes it from the session map. |
| `BrowserManager._cleanup_expired_sessions` | Periodic sweep that drops sessions idle longer than `ttl_seconds`. |
| `BrowserManager.close` | Gracefully shuts down all contexts, the browser, Playwright, and background tasks. |
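
The signature-based context reuse and idle-session sweep described for `_make_config_signature` and `_cleanup_expired_sessions` can be illustrated with a small standalone sketch. It is not the library's code: the `EPHEMERAL_KEYS` set, the `SessionStore` class, and the use of a plain dict in place of `CrawlerRunConfig` are all hypothetical stand-ins, and the real implementation may pick different fields or a different hash.

```python
import hashlib
import json
import time

# Hypothetical: keys treated as per-request noise and left out of the hash.
EPHEMERAL_KEYS = {"url", "session_id", "cache_mode"}


def make_config_signature(config: dict) -> str:
    """Hash the stable parts of a config dict into a hex digest."""
    stable = {k: v for k, v in config.items() if k not in EPHEMERAL_KEYS}
    # sort_keys makes the digest independent of key insertion order.
    payload = json.dumps(stable, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class SessionStore:
    """Toy session map: session id -> (context stand-in, last_used timestamp)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self.sessions = {}

    def touch(self, session_id: str, context: object, now: float = None) -> None:
        """Register a session or refresh its last_used timestamp."""
        now = time.time() if now is None else now
        self.sessions[session_id] = (context, now)

    def cleanup_expired(self, now: float = None) -> list:
        """Drop sessions idle longer than ttl_seconds; return the dropped ids."""
        now = time.time() if now is None else now
        expired = [
            sid
            for sid, (_, last_used) in self.sessions.items()
            if now - last_used > self.ttl_seconds
        ]
        for sid in expired:
            del self.sessions[sid]
        return expired
```

Two configs that differ only in an ephemeral key (here, `url`) produce the same signature and could share a context, while a change to a stable key such as a proxy setting yields a new signature. Passing `now` explicitly keeps the sweep deterministic in tests.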

browser_profiler.py

| Function | What it does |
| --- | --- |
| `BrowserProfiler.__init__` | Sets up profile folder paths, async logger, and signal handlers. |
| `BrowserProfiler.create_profile` | Launches a visible browser with a new user-data-dir for manual login, on exit compresses and stores it as a named profile. |
| `BrowserProfiler.cleanup_handler` | General SIGTERM/SIGINT cleanup wrapper that kills child processes. |
| `BrowserProfiler.sigint_handler` | Handles Ctrl-C during an interactive session, makes sure the browser shuts down cleanly. |
| `BrowserProfiler.listen_for_quit_command` | Async REPL that exits when the user types `q`. |
| `BrowserProfiler.list_profiles` | Enumerates `~/.crawl4ai/profiles`, prints profile name, browser type, size, and last modified. |
| `BrowserProfiler.get_profile_path` | Returns the absolute path of a profile given its name, or `None` if missing. |
| `BrowserProfiler.delete_profile` | Removes a profile folder or a direct path from disk, with optional confirmation prompt. |
| `BrowserProfiler.interactive_manager` | Text UI loop for listing, creating, deleting, or launching profiles. |
| `BrowserProfiler.launch_standalone_browser` | Starts a non-headless Chromium with remote debugging enabled and keeps it alive for manual tests. |
| `BrowserProfiler.get_cdp_json` | Pulls `/json/version` from a CDP endpoint and returns the parsed JSON. |
| `BrowserProfiler.launch_builtin_browser` | Spawns a headless Chromium in the background, saves `{wsEndpoint, pid, started_at}` to `~/.crawl4ai/builtin_browser.json`. |
| `BrowserProfiler.get_builtin_browser_info` | Reads that JSON file, verifies the PID, and returns browser status info. |
| `BrowserProfiler._is_browser_running` | Cross-platform helper that checks if a PID is still alive. |
| `BrowserProfiler.kill_builtin_browser` | Terminates the background builtin browser and removes its status file. |
| `BrowserProfiler.get_builtin_browser_status` | Returns `{running: bool, wsEndpoint, pid, started_at}` for quick health checks. |
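
The status-file pattern behind `launch_builtin_browser`, `get_builtin_browser_info`, and `_is_browser_running` can be sketched as below. This is an illustration under assumptions, not the profiler's implementation: the function names `write_status`, `read_status`, and `is_process_running` are hypothetical, and the liveness probe shown is POSIX-only, whereas the table describes the real helper as cross-platform.

```python
import json
import os
import time
from pathlib import Path


def is_process_running(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without killing."""
    if os.name == "nt":
        # On Windows, os.kill with signal 0 would terminate the process,
        # so this sketch refuses to run there.
        raise NotImplementedError("POSIX-only sketch")
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def write_status(path: Path, ws_endpoint: str, pid: int) -> dict:
    """Persist the fields named in the table: wsEndpoint, pid, started_at."""
    info = {"wsEndpoint": ws_endpoint, "pid": pid, "started_at": time.time()}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(info, indent=2))
    return info


def read_status(path: Path):
    """Return the stored info plus a 'running' flag, or None if unavailable."""
    try:
        info = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(info, dict):
        return None
    info["running"] = is_process_running(int(info.get("pid", -1)))
    return info
```

Verifying the PID on every read matters because the status file can outlive the browser it describes, for example after a crash or a reboot. A stale file then reports `running: False` instead of handing out a dead endpoint.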
