tools/distro-scraper/README.md
A Python package for scraping cloud distribution image information for Multipass.
This tool fetches metadata about the latest cloud images from various Linux distributions and outputs a JSON file for use by Multipass.
The scraper will:
Create a virtual environment:
python -m venv .venv
source .venv/bin/activate
Install the package in development mode:
pip install -e .
After installation, run the scraper with an output file path, using either the distro-scraper command or the module directly:
distro-scraper <output_file>
python -m scraper <output_file>
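How the command turns scraper results into the output file is not shown in this README. The following is a minimal sketch of one plausible flow, assuming each scraper exposes a `name` and an async `fetch()` as in the example later in this document. `StaticScraper`, `collect`, and `write_output` are hypothetical names used only for illustration, and the real output schema may differ.

```python
import asyncio
import json
from pathlib import Path


class StaticScraper:
    """Hypothetical stand-in for a real distribution scraper."""

    def __init__(self, name: str, release: str) -> None:
        self.name = name
        self._release = release

    async def fetch(self) -> dict:
        # A real scraper would perform network requests here.
        await asyncio.sleep(0)
        return {"os": self.name, "release": self._release}


async def collect(scrapers) -> dict:
    """Run all scrapers concurrently and key the results by scraper name."""
    results = await asyncio.gather(*(s.fetch() for s in scrapers))
    return {s.name: r for s, r in zip(scrapers, results)}


def write_output(scrapers, output_file: str) -> dict:
    """Collect scraper results and write them to output_file as JSON."""
    data = asyncio.run(collect(scrapers))
    Path(output_file).write_text(json.dumps(data, indent=2, sort_keys=True))
    return data
```

Running the scrapers with `asyncio.gather` lets slow network fetches overlap, which is presumably why `fetch()` is declared `async` in the scraper interface.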
The scraper uses Python entry points for extensibility. Each distribution scraper is registered as a plugin in pyproject.toml:
[project.entry-points."dist_scraper.scrapers"]
debian = "scraper.scrapers.debian:DebianScraper"
fedora = "scraper.scrapers.fedora:FedoraScraper"
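Entry points registered this way can be discovered at runtime with the standard library's `importlib.metadata`. The sketch below shows one way to do that; the helper name `load_scrapers` is an assumption for illustration, and the package's actual loading code may differ.

```python
from importlib.metadata import entry_points

# Group name taken from the pyproject.toml snippet above.
GROUP = "dist_scraper.scrapers"


def load_scrapers(group: str = GROUP) -> dict:
    """Return a mapping of entry-point name to the loaded scraper class."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry_points() returns a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    # ep.load() imports the module and returns the referenced object.
    return {ep.name: ep.load() for ep in selected}
```

With the package installed, `load_scrapers()` would be expected to contain keys such as `debian` and `fedora`; without it, the result is an empty dict.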
Create a new file in scraper/scrapers/ (e.g., scraper/scrapers/ubuntu.py) and implement the BaseScraper abstract class:
from ..base import BaseScraper
class UbuntuScraper(BaseScraper):
    def __init__(self):
        super().__init__()

    @property
    def name(self) -> str:
        return "Ubuntu"

    async def fetch(self) -> dict:
        # Fetch and return distribution data
        return {
            "aliases": "ubuntu",
            "os": "Ubuntu",
            "release": "24.04",
            "release_codename": "Noble Numbat",
            "release_title": "24.04",
            "items": {
                "x86_64": {...},
                "arm64": {...}
            }
        }
Register the scraper in pyproject.toml:
[project.entry-points."dist_scraper.scrapers"]
ubuntu = "scraper.scrapers.ubuntu:UbuntuScraper"
Reinstall the package so the new entry point is registered:
pip install -e .
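The `BaseScraper` class itself is not shown in this README. As a rough sketch of the contract implied by the Ubuntu example (a `name` property and an async `fetch()` method), an abstract base could look like the following. `BaseScraperSketch`, `DemoScraper`, and `IncompleteScraper` are hypothetical names, and the real base class may define more or different members.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseScraperSketch(ABC):
    """Illustrative stand-in for the interface a scraper implements."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Human-readable distribution name."""

    @abstractmethod
    async def fetch(self) -> dict:
        """Fetch and return the distribution's image metadata."""


class DemoScraper(BaseScraperSketch):
    """Implements every abstract member, so it can be instantiated."""

    @property
    def name(self) -> str:
        return "Demo"

    async def fetch(self) -> dict:
        return {"os": "Demo", "release": "1.0", "items": {}}


class IncompleteScraper(BaseScraperSketch):
    """Omits fetch(), so instantiating it raises TypeError."""

    @property
    def name(self) -> str:
        return "Incomplete"
```

Declaring the members abstract means a scraper that forgets to implement `fetch()` fails at instantiation time rather than partway through a run.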