PR Review Todolist


Last updated: 2026-03-13 | Total open PRs: 6


Remaining Open PRs (6)

Bug Fixes (2)

| PR | Author | Description | Notes |
|----|--------|-------------|-------|
| #1207 | moncapitaine | Fix streaming error handling | Old PR, likely needs rebase |
| #462 | jtanningbed | Fix: Add newline before pre codeblock start in html2text. 1-line fix | Very old, may still apply |

Docs/Maintenance (2)

| PR | Author | Description | Notes |
|----|--------|-------------|-------|
| #1756 | VasiliyRad | Added AG2 community integration example and Quickstart pointer | Community example |
| #1533 | unclecode | Add Claude Code GitHub Workflow | Owner's PR, CI |

Skipped (owner PRs)

| PR | Author | Description |
|----|--------|-------------|
| #1533 | unclecode | Add Claude Code GitHub Workflow |
| #1124 | unclecode | Add VNC streaming support |

Previously Closed PRs (won't merge)

| PR | Author | Description | Reason |
|----|--------|-------------|--------|
| #999 | loliw | Regex-based filters for deep crawling | URLPatternFilter already supports regex |
| #1180 | kunalmanelkar | CallbackURLFilter for deep crawling | Breaks sync apply() interface |
| #1425 | denrusio | OpenRouter API support | litellm handles openrouter/ natively |
| #1702 | YxmMyth | CSS background image extraction | Too invasive for niche feature |
| #1707 | dillonledoux | Crawl-delay from robots.txt | Too complex for non-standard directive |
| #1729 | hoi | External Redis support | Docker infra - maintainer territory |
| #1592 | Ahmed-Tawfik94 | CDP page leaks and race conditions | Superseded by develop page lifecycle system |

Previously Closed PRs (from old todolist)

| PR | Author | Original Description | What happened |
|----|--------|----------------------|---------------|
| #1572 | Ahmed-Tawfik94 | Fix CDP setting with managed browser | Closed |
| #1234 | AdarsHH30 | Fix TypeError when keep_data_attributes=False | Closed |
| #1211 | Praneeth1-O-1 | Fix: safely create new page if no page exists | Closed |
| #1200 | fischerdr | Bugfix browser manager session handling | Closed |
| #1106 | devxpain | Fix: Adapt to CrawlerMonitor constructor change | Closed |
| #1081 | Joorrit | Fix deep crawl scorer logic was inverted | Closed |
| #1065 | mccullya | Fix: Update deprecated Groq models | Closed |
| #1059 | Aaron2516 | Fix wrong proxy config type in proxy demo example | Closed |
| #1058 | Aaron2516 | Fix dict-type proxy_config not handled properly | Closed |
| #983 | umerkhan95 | Fix memory leak and empty responses in streaming mode | Closed |
| #948 | GeorgeVince | Fix summarize_page.py example | Closed |
| #1689 | mzyfree | Docker: optimize concurrency performance | Closed (contributor acknowledged) |
| #1706 | vikas-gits-good | Fix arun_many not working with DeepCrawlStrategy | Closed |
| #1683 | Vaccarini-Lorenzo | Implement double config for AdaptiveCrawler | Closed |
| #1674 | blentz | Add output pagination/control for MCP endpoints | Closed |
| #1650 | KennyStryker | Add support for Vertex AI in LLM Extraction Strategy | Closed |
| #1580 | arpagon | Add Azure OpenAI configuration support | Closed |
| #1417 | NickMandylas | Add CDP headers support for remote browser auth | Closed |
| #1255 | itsskofficial | Fix JsonCssSelector to handle adjacent sibling CSS selectors | Closed |
| #1245 | mukul-atomicwork | Feature: GitHub releases integration | Closed |
| #1238 | yerik515 | Fix ManagedBrowser constructor and Windows encoding issues | Closed |
| #1220 | dcieslak19973 | Allow OPENAI_BASE_URL for LLM base_url | Closed |
| #901 | gbe3hunna | CrawlResult model: add pydantic fields and descriptions | Closed |
| #800 | atomlong | ensure_ascii=False for json.dumps | Closed |
| #799 | atomlong | Allow setting base_url for LLM extraction strategy in CLI | Closed |
| #741 | atomlong | Add config option to control Content-Security-Policy header | Closed |
| #723 | alexandreolives | Optional close page after screenshot | Closed |
| #681 | ksallee | JS execution should happen after waiting | Closed |
| #416 | dar0xt | Add keep-aria-label-attribute option | Closed |
| #332 | nelzomal | Add remove_invisible_texts method to crawler strategy | Closed |
| #312 | AndreaFrancis | Add save to HuggingFace support | Closed |
| #1488 | AkosLukacs | Fix syntax error in README JSON example | Closed |
| #1483 | NiclasLindqvist | Update README.md with latest docker image | Closed |
| #1416 | adityaagre | Fix missing bracket in README code block | Closed |
| #1272 | zhenjunMa | Fix get title bug in amazon example | Closed |
| #1263 | vvanglro | Fix: consistent with sdk behavior | Closed |
| #1225 | albertkim | Fix docker deployment guide URL | Closed |
| #1223 | dowithless | Docs: add links to other language versions of README | Closed |
| #1159 | lbeziaud | Fix cleanup warning when no process on debug port | Closed |
| #1098 | B-X-Y | Docs: fix outdated links to Docker guide | Closed |
| #1093 | Aaron2516 | Docs: Fixed incorrect elapsed calculation | Closed |
| #967 | prajjwalnag | Update README.md | Closed |
| #671 | SteveAlphaVantage | Update README.md | Closed |
| #605 | mochamadsatria | Fix typo in docker-deployment.md filename | Closed |
| #335 | amanagarwal042 | Add Documentation for Monitoring with OpenTelemetry | Closed |
| #1722 | YuriNachos | Add missing docstring to MCP md endpoint | Merged directly |

Resolved This Session (batch 6)

| PR | Author | Description | Date |
|----|--------|-------------|------|
| #1834 | ntohidi | fix: remove shared LOCK contention in monitor to prevent pod deadlock (#1754) | 2026-03-13 |

Resolved (batch 5)

| PR | Author | Description | Date |
|----|--------|-------------|------|
| #1622 | Ahmed-Tawfik94 | fix: verify redirect targets in URL seeder | 2026-03-07 |
| #1786 | Br1an67 | fix: wire mean_delay/max_range into dispatcher | 2026-03-07 |
| #1796 | Br1an67 | fix: DOMParser in process_iframes | 2026-03-07 |
| #1795 | Br1an67 | fix: require api_token for /token endpoint | 2026-03-07 |
| #1798 | SohamKukreti | fix: deep-crawl streaming mirrors Python library | 2026-03-07 |
| #1734 | pgoslatara | chore: update GitHub Actions versions | 2026-03-07 |
| #1290 | 130347665 | feat: type-list pipeline in JSON extraction | 2026-03-07 |
| #1668 | microHoffman | feat: --json-ensure-ascii CLI flag | 2026-03-07 |
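
For context on the ensure-ascii entries (#1668 here, and the earlier #800): Python's standard `json.dumps` escapes non-ASCII characters by default. The sketch below uses only the standard library. The `record` dict is made up for illustration, and how the CLI flag maps onto this setting is not shown in this list.

```python
import json

# Hypothetical record with non-ASCII text (illustrative data only).
record = {"title": "Crème brûlée", "lang": "日本語"}

# Default behaviour: ensure_ascii=True, non-ASCII becomes \uXXXX escapes.
escaped = json.dumps(record)

# ensure_ascii=False keeps the characters as-is in the output string.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms decode back to the same data; only the serialized text differs.
assert json.loads(escaped) == json.loads(readable) == record
```

When writing the readable form to a file, the file should be opened with an explicit UTF-8 encoding so the unescaped characters are stored correctly.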

Resolved (batch 4)

| PR | Author | Description | Date |
|----|--------|-------------|------|
| #1494 | AkosLukacs | docs: fix docstring param name crawler_config -> config | 2026-03-07 |
| #1715 | YuriNachos | docs: add missing CacheMode import in quickstart | 2026-03-07 |
| #1716 | YuriNachos | docs: fix return types to RunManyReturn | 2026-03-07 |
| #1308 | dominicx | docs: fix css_selector type from list to string | 2026-03-07 |
| #1789 | Br1an67 | fix: UTF-8 encoding for CLI file output | 2026-03-07 |
| #1793 | Br1an67 | fix: configurable link_preview_timeout in AdaptiveConfig | 2026-03-07 |
| #1792 | Br1an67 | fix: wait_for_images on screenshot endpoint | 2026-03-07 |
| #1794 | Br1an67 | fix: cross-platform terminal input in CrawlerMonitor | 2026-03-07 |
| #1784 | Br1an67 | fix: UnicodeEncodeError in URL seeder + zero-width chars | 2026-03-07 |
| #1730 | hoi | fix: add TTL expiry for Redis task data | 2026-03-07 |

Previously Resolved (batches 1-3)

| PR | Author | Description | Date |
|----|--------|-------------|------|
| #1805 | nightcityblade | fix: prevent AdaptiveCrawler from crawling external domains | 2026-03-07 |
| #1763 | Otman404 | fix: return in finally block silently suppressing exceptions | 2026-03-07 |
| #1803 | SohamKukreti | fix: from_serializable_dict ignoring plain data dicts | 2026-03-07 |
| #1804 | nightcityblade | feat: add score_threshold to BestFirstCrawlingStrategy | 2026-03-07 |
| #1790 | Br1an67 | fix: handle nested brackets in LINK_PATTERN regex | 2026-03-07 |
| #1787 | Br1an67 | fix: strip markdown fences in LLM JSON responses | 2026-03-07 |
| #1782 | Br1an67 | fix: preserve class/id in cleaned_html | 2026-03-07 |
| #1788 | Br1an67 | fix: guard against None LLM content | 2026-03-07 |
| #1783 | Br1an67 | fix: strip port from domain in is_external_url | 2026-03-07 |
| #1179 | phamngocquy | fix: raw HTML URL token leak | 2026-03-07 |
| #1694 | theredrad | feat: add force viewport screenshot | 2026-02-01 |
| #1746 | ChiragBellara | fix: avoid Common Crawl calls for sitemap-only seeding | 2026-02-01 |
| #1714 | YuriNachos | fix: replace tf-playwright-stealth with playwright-stealth | 2026-02-01 |
| #1721 | YuriNachos | fix: respect base tag for relative link resolution | 2026-02-01 |
| #1719 | YuriNachos | fix: include GoogleSearchCrawler script.js in package | 2026-02-01 |
| #1717 | YuriNachos | fix: allow local embeddings by removing OpenAI fallback | 2026-02-01 |
| #1667 | christian-oudard | fix: deep-crawl CLI outputting only first page | 2026-02-01 |
| #1296 | vladmandic | fix: VersionManager ignoring CRAWL4_AI_BASE_DIRECTORY | 2026-02-01 |
| #1364 | nnxiong | fix: script tag removal losing adjacent text | 2026-02-01 |
| #1077 | RoyLeviLangware | fix: bs4 deprecation warning (text -> string) | 2026-02-01 |
| #1281 | garyluky | fix: proxy auth ERR_INVALID_AUTH_CREDENTIALS | 2026-02-01 |
| #1463 | TristanDonze | feat: device_scale_factor for screenshot quality | 2026-02-06 |
| #1435 | charlaie | feat: redirected_status_code in CrawlResult | 2026-02-06 |
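
For reviewers checking similar patches, the class of bug named in #1763 (a `return` inside `finally` that swallows an in-flight exception) can be reproduced with a minimal sketch. The function names are hypothetical and are not taken from the crawl4ai codebase; this shows the general Python behaviour, not the actual patch.

```python
def fetch_swallowing():
    """Buggy shape: the return in finally discards the pending exception."""
    try:
        raise ValueError("crawl failed")
    finally:
        # Returning here replaces the in-flight ValueError with a normal return.
        return "ok"


def fetch_propagating():
    """Fixed shape: finally does cleanup only, the return sits after the block."""
    result = None
    try:
        raise ValueError("crawl failed")
    finally:
        pass  # cleanup would go here; no return, so the exception propagates
    return result


print(fetch_swallowing())  # the caller never sees the ValueError

try:
    fetch_propagating()
except ValueError as exc:
    print(f"propagated: {exc}")
```

Recent Python versions may emit a `SyntaxWarning` for the first function, which is a useful signal when scanning for this pattern.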