book/tools/scripts/docs/README.md
This directory contains various Python scripts used for book maintenance and processing.
The `improve_figure_captions.py` script provides automated caption enhancement using local Ollama LLM models:

```bash
# Improve all captions (recommended)
python3 scripts/improve_figure_captions.py -d contents/core/

# Analysis and utilities
python3 scripts/improve_figure_captions.py --analyze -d contents/core/
python3 scripts/improve_figure_captions.py --build-map -d contents/core/
```
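The script's internals are documented elsewhere, but the core idea of caption enhancement with a local Ollama model can be sketched as a small client that POSTs to Ollama's default `/api/generate` endpoint. The prompt wording, model name, and function names below are illustrative assumptions, not the script's actual implementation:

```python
import json
import urllib.request

# Ollama's default local endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_caption_payload(figure_id, current_caption):
    """Assemble a request payload for Ollama's /api/generate endpoint.

    The prompt text and model name are illustrative, not taken from
    improve_figure_captions.py itself.
    """
    prompt = (
        "Improve this figure caption for a technical book.\n"
        f"Figure: {figure_id}\n"
        f"Current caption: {current_caption}\n"
        "Return only the improved caption."
    )
    return {"model": "llama3.2", "prompt": prompt, "stream": False}


def improve_caption(figure_id, current_caption):
    """Send one caption to the local Ollama server and return its reply."""
    payload = json.dumps(build_caption_payload(figure_id, current_caption)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses carry the full text in the "response" field.
        return json.loads(resp.read())["response"]
```

Running this requires a local Ollama server with the chosen model pulled; the payload builder itself needs no network access.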
Full documentation: see `FIGURE_CAPTIONS.md` for the complete usage guide, model selection, and troubleshooting.
The `cross_refs/` directory contains scripts for generating AI-powered cross-references with explanations.

Full documentation: see `cross_refs/RECIPE.md` for the complete workflow.
All Python dependencies are managed through the root-level `requirements.txt` file. This ensures consistent package versions across all scripts and the GitHub Actions workflow.
When adding new Python scripts that require external packages:

1. Add the package to `requirements.txt` at the project root, pinning a minimum version (e.g. `>=1.0.0`).
2. Install it with `pip install -r requirements.txt`.

The current dependencies include:

- `jupyterlab-quarto`, `jupyter`
- `nltk` (with stopwords and punkt data)
- `openai`, `gradio`
- `pybtex`, `pypandoc`, `pyyaml`
- `Pillow`
- `jsonschema`
- `absl-py`

Some subdirectories have their own `requirements.txt` files for specific workflows:

- `scripts/publish/requirements.txt` - Publishing dependencies

These are kept for reference, but the main workflow uses the root `requirements.txt`.
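After editing `requirements.txt`, a quick sanity check can confirm that each pinned line parses as expected. The helper below is a hedged sketch, not part of the repository's tooling; the regex accepts only simple `name OP version` pins:

```python
import re

# Illustrative pattern: handles simple pins like "Pillow>=9.0.0",
# not full PEP 508 syntax (extras, markers, URLs).
REQ_PATTERN = re.compile(r"^([A-Za-z0-9_.-]+)\s*(>=|==|<=|~=)\s*([\w.]+)$")


def parse_requirement(line):
    """Split a pinned requirement line into (name, operator, version)."""
    match = REQ_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unparsable requirement: {line!r}")
    return match.groups()


def iter_requirements(path="requirements.txt"):
    """Yield parsed pins from a requirements file, skipping comments/blanks."""
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()
            if line:
                yield parse_requirement(line)
```

A malformed line raises `ValueError` rather than being silently skipped, which makes typos in version pins visible early.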
The GitHub Actions workflow automatically installs dependencies from the root `requirements.txt` and caches them between runs. The cache is invalidated when `requirements.txt` changes, ensuring dependencies stay up-to-date.
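This kind of cache keying is typically achieved with `actions/setup-python`'s built-in pip caching, which hashes the requirements file. The fragment below is an illustrative sketch, not the repository's actual workflow file:

```yaml
# Illustrative workflow fragment; the repository's real workflow may differ.
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@v5
    with:
      python-version: "3.11"
      cache: "pip"  # cache key includes a hash of requirements.txt
  - run: pip install -r requirements.txt
```

With `cache: "pip"`, editing `requirements.txt` changes the cache key, so the next run rebuilds the environment instead of restoring stale packages.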
The project uses pre-commit hooks for code quality checks. The hooks run automatically on commit and include validation of the Quarto configuration files `_quarto-html.yml` and `_quarto-pdf.yml`.

1. Install pre-commit (included in `requirements.txt`):

   ```bash
   pip install -r requirements.txt
   ```

2. Install the git hooks:

   ```bash
   pre-commit install
   ```

3. Run manually (optional):

   ```bash
   # Run on all files
   pre-commit run --all-files

   # Run on specific files
   pre-commit run --files path/to/file.qmd
   ```
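For orientation, a hook that validates the Quarto YAML configs could be declared in `.pre-commit-config.yaml` roughly as follows. This is a hypothetical sketch of one such hook, not the repository's actual configuration:

```yaml
# Illustrative .pre-commit-config.yaml fragment; hook id, entry, and
# file pattern are assumptions, not the project's real hook list.
repos:
  - repo: local
    hooks:
      - id: check-quarto-configs
        name: Validate Quarto YAML configs
        entry: python -c "import sys, yaml; [yaml.safe_load(open(p)) for p in sys.argv[1:]]"
        language: python
        additional_dependencies: [pyyaml]
        files: ^_quarto-(html|pdf)\.yml$
```

A `local` repo with `language: python` lets pre-commit build an isolated environment with `pyyaml` installed, so the check does not depend on the committer's own Python setup.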
NLTK data issues: the hooks automatically download the required NLTK data, but if you encounter issues, you can run the downloads manually:

```python
import nltk
nltk.download('stopwords')
nltk.download('punkt')
```
Python environment: The hooks use isolated Python environments with the specified dependencies, so they should work regardless of your local Python setup.