third_party/readability/modded_src/README.md
This directory contains Chromium's locally modified version of Mozilla's
Readability.js and Readability-readerable.js. These scripts are used by
the DOM Distiller component to extract the main content of web pages for
Reader Mode.
Modifications to the upstream library should be done directly in these files.
Always ensure that regexes and common logic are kept in sync between
Readability.js and Readability-readerable.js.
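One way to notice drift between the two files is to compare same-named regex literals. The following is a minimal sketch, not a script shipped in this directory. It assumes the regexes are written as object properties of the form `name: /pattern/flags` in both files, and the helper names `extractRegexLiterals` and `findMismatchedRegexes` are hypothetical. The extraction pattern is a heuristic and may miss or misread unusual literals.

```javascript
// Heuristic sketch: find `name: /pattern/flags` properties in JavaScript
// source text and report names whose literals differ between two sources.
// Assumption: regexes are declared as object properties in both files.

function extractRegexLiterals(source) {
  const literals = new Map();
  // name, optional whitespace (including newlines), colon, regex literal.
  // The literal body allows escapes, [...] classes, and ordinary characters.
  const propertyPattern =
    /(\w+)\s*:\s*(\/(?![\/*])(?:\\.|\[(?:\\.|[^\]\\\n])*\]|[^\/\\\n\[])+\/[a-z]*)/g;
  for (const match of source.matchAll(propertyPattern)) {
    literals.set(match[1], match[2]);
  }
  return literals;
}

function findMismatchedRegexes(sourceA, sourceB) {
  const a = extractRegexLiterals(sourceA);
  const b = extractRegexLiterals(sourceB);
  const mismatched = [];
  for (const [name, literal] of a) {
    // Only names present in both sources are compared.
    if (b.has(name) && b.get(name) !== literal) {
      mismatched.push(name);
    }
  }
  return mismatched;
}
```

Reading both files with `fs.readFileSync(path, "utf8")` and passing the two strings to `findMismatchedRegexes` would yield a list of property names to inspect by hand.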
This directory also provides run_readability.cjs, a lightweight command-line
script to test and debug the distillation logic against arbitrary HTML files
without needing a full Chromium build.
The runner requires jsdom to simulate a browser environment in Node.js. To
install it locally within this directory (without committing to the tree), run:
cd third_party/readability/modded_src
npm install --prefix . jsdom
Run the script using node. The .cjs extension makes Node.js load it as a CommonJS module.
Tip: Save your test page as input.html and distilled results as
output.html in this directory. These filenames are already ignored by git.
cd third_party/readability/modded_src
node run_readability.cjs input.html output.html
- <input.html>: Path to the raw HTML file you wish to distill.
- [output.html]: (Optional) Path to save the distilled HTML content.
  If omitted, content is printed to standard output.

Readability's internal debug logs (which trace scoring and node removal) are
printed to stderr during execution. Capture them by redirecting output:
node run_readability.cjs input.html output.html 2> debug.log
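The argument handling and stream separation described above can be sketched in a few lines of Node.js. This is an illustration under assumptions, not the contents of run_readability.cjs: `parseArgs`, `writeResult`, and `debugLog` are hypothetical names. It shows why `2> debug.log` captures only the logs: distilled content goes to the output file or stdout, while diagnostics go to stderr.

```javascript
const fs = require("node:fs");

// Hypothetical helper: read `<input.html> [output.html]` from argv.
function parseArgs(argv) {
  // argv[0] is the node binary and argv[1] is the script path.
  const [inputPath, outputPath] = argv.slice(2);
  if (!inputPath) {
    throw new Error("usage: node run_readability.cjs <input.html> [output.html]");
  }
  return { inputPath, outputPath: outputPath ?? null };
}

// Write the distilled HTML to a file when a path was given, else to stdout.
function writeResult(html, outputPath, stdout = process.stdout) {
  if (outputPath) {
    fs.writeFileSync(outputPath, html, "utf8");
  } else {
    stdout.write(html);
  }
}

// Diagnostics go to stderr so that `2> debug.log` does not mix with content.
function debugLog(...parts) {
  console.error(...parts);
}
```

Keeping the two streams separate means the distilled HTML can be piped into another tool while the log is saved independently.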
This script is ideal for command-line AI agents (like Gemini CLI) to autonomously test distillation fixes:
1. Run node run_readability.cjs input.html output.html 2> debug.log.
2. Read debug.log to understand node scoring or regex matching.
3. Modify Readability.js and re-run to verify the fix.

Limitations:
- jsdom does not execute scripts. <noscript> tags may be treated as active content.
- The runner covers Readability.js only. Downstream Chromium post-processing steps are not applied.
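The run, inspect, and re-run loop for agents can also be driven from Node.js. The sketch below is an illustration under assumptions: `runAndCollectLogs` and `grepLines` are hypothetical helpers, and the search keyword is whatever you are looking for, since the exact log format is not specified here. It runs a command, captures stderr as log lines, and filters them case-insensitively.

```javascript
const { spawnSync } = require("node:child_process");

// Run a command synchronously and split its stderr into non-empty log lines.
function runAndCollectLogs(command, args) {
  const result = spawnSync(command, args, { encoding: "utf8" });
  if (result.error) {
    throw result.error; // for example, the command was not found
  }
  return {
    status: result.status,
    stdout: result.stdout,
    logLines: result.stderr.split(/\r?\n/).filter((line) => line.length > 0),
  };
}

// Keep only the log lines that mention the keyword (case-insensitive).
function grepLines(lines, keyword) {
  const needle = keyword.toLowerCase();
  return lines.filter((line) => line.toLowerCase().includes(needle));
}
```

For example, calling `runAndCollectLogs("node", ["run_readability.cjs", "input.html", "output.html"])` from this directory and passing the resulting `logLines` to `grepLines` narrows the log to the lines relevant to the element or regex under investigation.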