scripts/README.md
Converts Hugo-generated HTML files to fully-rendered Markdown with evaluated shortcodes, dereferenced shared content, and removed comments.
This script generates production-ready Markdown output for LLM consumption and user downloads. The generated Markdown:
- Has shortcodes evaluated (for example, {{% product-name %}} → "InfluxDB 3 Core")

# Generate all markdown files (run after Hugo build)
yarn build:md
# Generate with verbose logging
yarn build:md:verbose
# Generate for specific path
node scripts/html-to-markdown.js --path influxdb3/core
# Generate limited number for testing
node scripts/html-to-markdown.js --limit 10
# Combine options
node scripts/html-to-markdown.js --path telegraf/v1 --verbose
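The flags used in the commands above (`--path`, `--limit`, `--verbose`) could be parsed roughly as follows. This is a minimal sketch using Node's built-in `util.parseArgs`, not the script's actual implementation; the helper name `parseCliOptions` and the validation of `--limit` are assumptions made for illustration.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical helper: parses the three flags shown in the commands above.
// The real script may parse its arguments differently.
function parseCliOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      path: { type: 'string' },     // sub-path within public/
      limit: { type: 'string' },    // parsed to an integer below
      verbose: { type: 'boolean' }, // detailed logging
    },
  });

  // Assumption: --limit must be a positive integer when present.
  const limit =
    values.limit === undefined ? null : Number.parseInt(values.limit, 10);
  if (limit !== null && (!Number.isInteger(limit) || limit < 1)) {
    throw new Error(`--limit expects a positive integer, got "${values.limit}"`);
  }

  return {
    path: values.path ?? null, // null means "process everything"
    limit,
    verbose: Boolean(values.verbose),
  };
}

console.log(parseCliOptions(['--path', 'telegraf/v1', '--verbose']));
```

Because `parseArgs` is strict by default, an unknown flag raises an error instead of being silently ignored, which helps catch typos during test runs.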
- --path <path>: Process specific path within public/ (default: process all)
- --limit <n>: Limit number of files to process (useful for testing)
- --verbose: Enable detailed logging of conversion progress

Hugo generates HTML (with all shortcodes evaluated):
npx hugo --quiet
Script converts HTML to Markdown:
yarn build:md
Generated files:
- public/**/index.md (alongside index.html)
- The public/ directory is gitignored

Automatically detects and adds product information to frontmatter:
---
title: Set up InfluxDB 3 Core
description: Install, configure, and set up authorization...
url: /influxdb3/core/get-started/setup/
product: InfluxDB 3 Core
product_version: core
date: 2025-11-13
lastmod: 2025-11-13
---
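One way the product fields in the frontmatter above might be derived from the page URL is a prefix lookup. The sketch below is an assumption about the approach, not the script's code: `PRODUCT_MAP_SKETCH`, `detectProduct`, and `buildFrontmatter` are hypothetical names, and only the single entry confirmed by the example above is filled in.

```javascript
// Hypothetical lookup table keyed by URL path prefix. Only the entry
// confirmed by the frontmatter example above is included.
const PRODUCT_MAP_SKETCH = {
  'influxdb3/core': { product: 'InfluxDB 3 Core', product_version: 'core' },
};

// Returns the product fields for a URL path, or null when no prefix matches.
function detectProduct(urlPath) {
  const trimmed = urlPath.replace(/^\/+/, '');
  const key = Object.keys(PRODUCT_MAP_SKETCH).find(
    (prefix) => trimmed === prefix || trimmed.startsWith(`${prefix}/`)
  );
  return key ? PRODUCT_MAP_SKETCH[key] : null;
}

// Builds a frontmatter block. Simplification: values are written as-is,
// so values containing ": " would need YAML quoting in real use.
function buildFrontmatter(page) {
  const fields = {
    title: page.title,
    url: page.url,
    ...(detectProduct(page.url) ?? {}),
  };
  const lines = Object.entries(fields).map(([key, value]) => `${key}: ${value}`);
  return ['---', ...lines, '---'].join('\n');
}

console.log(
  buildFrontmatter({
    title: 'Set up InfluxDB 3 Core',
    url: '/influxdb3/core/get-started/setup/',
  })
);
```

When no prefix matches, the sketch simply omits the product fields rather than guessing a product.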
Supported products:
Custom Turndown rules for InfluxData documentation:
- > [!Note] format

Extracts only article content (removes navigation, footer, etc.):
- article.article--content

Local Development:
# After making content changes
npx hugo --quiet && yarn build:md
CircleCI Build Pipeline:
The script runs automatically in the CircleCI build pipeline after Hugo generates HTML:
# .circleci/config.yml
- run:
    name: Hugo Build
    command: yarn hugo --environment production --logLevel info --gc --destination workspace/public
- run:
    name: Generate LLM-friendly Markdown
    command: node scripts/html-to-markdown.js
Build order:
1. Hugo build writes workspace/public/**/*.html
2. html-to-markdown.js converts HTML → workspace/public/**/*.md

Production Build (Manual):
npx hugo --quiet
yarn build:md
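Each generated Markdown file sits next to the HTML file it was converted from. The path mapping can be sketched as below; `markdownPathFor` is a hypothetical helper name, and `path.posix` is used only so the example behaves the same on every platform.

```javascript
const path = require('node:path');

// Hypothetical helper: maps an HTML file to the Markdown file written
// alongside it (index.html -> index.md in the same directory).
function markdownPathFor(htmlPath) {
  const { dir, name, ext } = path.posix.parse(htmlPath);
  if (ext !== '.html') {
    throw new Error(`Expected an .html file, got "${htmlPath}"`);
  }
  return path.posix.join(dir, `${name}.md`);
}

console.log(markdownPathFor('public/influxdb3/core/index.html'));
```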
Watch Mode: For development with auto-regeneration, run the Hugo server and regenerate the Markdown after content changes:
# Terminal 1: Hugo server
npx hugo server
# Terminal 2: After making changes
yarn build:md
No article content found:
⚠️ No article content found in /path/to/file.html
- article.article--content selector

Shortcodes still present:
Missing product context:
PRODUCT_MAP
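For the "Shortcodes still present" case, a quick scan of the generated Markdown can confirm whether any unevaluated shortcodes remain. The check below is a rough sketch and not part of the script; `SHORTCODE_PATTERN` and `findLeftoverShortcodes` are hypothetical names, and the pattern only covers the common `{{% ... %}}` and `{{< ... >}}` forms.

```javascript
// Matches Hugo shortcode tags such as {{% product-name %}} or {{< /tabs >}}.
// Assumption: tags do not contain nested braces.
const SHORTCODE_PATTERN = /\{\{[%<]\s*\/?\s*[\w-]+[^{}]*?[%>]\}\}/g;

// Returns every shortcode tag still present in the given Markdown text.
function findLeftoverShortcodes(markdown) {
  return markdown.match(SHORTCODE_PATTERN) ?? [];
}

const leftovers = findLeftoverShortcodes('Install {{% product-name %}} now.');
if (leftovers.length > 0) {
  console.warn(`Unevaluated shortcodes found: ${leftovers.join(', ')}`);
}
```

Pages that document shortcode syntax inside code samples would also match, so a hit is a prompt to inspect the file rather than proof of a failed build.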