devel-common/src/sphinx_exts/pagefind_search/README.md
A Sphinx extension providing fast, self-hosted search for Apache Airflow documentation using Pagefind.
Keyboard shortcut: Cmd+K (Mac) or Ctrl+K (Windows/Linux)
In Sphinx's conf.py:
# Enable/disable search (default: True)
pagefind_enabled = True
# Verbose logging (default: False)
pagefind_verbose = False
# Content selector (default: "main")
pagefind_root_selector = "main"
# Exclude selectors (default: see below)
# These elements won't be included in the search index
pagefind_exclude_selectors = [
    ".headerlink",  # Permalink icons
    ".toctree-wrapper",  # Table of contents navigation
    "nav",  # All navigation elements
    "footer",  # Footer content
    ".td-sidebar",  # Left sidebar
    ".breadcrumb",  # Breadcrumb navigation
    ".navbar",  # Top navigation bar
    ".dropdown-menu",  # Dropdown menus (version selector, etc.)
    ".docs-version-selector",  # Version selector widget
    "[role='navigation']",  # ARIA navigation landmarks
    ".d-print-none",  # Print-hidden elements (usually UI controls)
    ".pagefind-search-button",  # Search button itself
]
# File pattern (default: "**/*.html")
pagefind_glob = "**/*.html"
# Exclude patterns (default: [])
# Path patterns to exclude from indexing (e.g., auto-generated API docs)
# Note: when patterns are specified, files are indexed one by one (slightly slower but precise).
# Pagefind does NOT automatically exclude underscore-prefixed directories.
pagefind_exclude_patterns = [
    "_api/**",  # Exclude API documentation
    "_modules/**",  # Exclude source code modules
    "release_notes.html",  # Exclude specific files
    "genindex.html",  # Exclude generated index
]
# Content weighting (default: True)
# Uses lightweight regex to add data-pagefind-weight attributes to titles and headings
pagefind_content_weighting = True
# Enable playground (default: False)
# Creates a playground at /_pagefind/playground/ for debugging search
pagefind_enable_playground = False
# Custom records for non-HTML content (default: [])
pagefind_custom_records = [
    {
        "url": "/downloads/guide.pdf",
        "content": "PDF content...",
        "language": "en",
        "meta": {"title": "Guide PDF"},
    }
]
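To illustrate how `pagefind_exclude_patterns` might narrow the set of indexed files, here is a minimal sketch. It is not the extension's actual implementation: the helper names `is_excluded` and `files_to_index` are hypothetical, and it assumes `fnmatch`-style matching in which `*` also crosses `/`, so `_api/**` covers everything under `_api/`.

```python
from fnmatch import fnmatchcase


def is_excluded(rel_path: str, patterns: list[str]) -> bool:
    """Return True if rel_path matches any exclude pattern.

    Assumption: fnmatch semantics, where '*' also matches '/'.
    """
    return any(fnmatchcase(rel_path, pattern) for pattern in patterns)


def files_to_index(rel_paths: list[str], patterns: list[str]) -> list[str]:
    """Keep only the paths that no exclude pattern matches (hypothetical helper)."""
    return [path for path in rel_paths if not is_excluded(path, patterns)]


exclude_patterns = [
    "_api/**",
    "_modules/**",
    "release_notes.html",
    "genindex.html",
]

candidate_files = [
    "index.html",
    "_api/airflow/models.html",
    "_modules/airflow/dag.html",
    "release_notes.html",
    "genindex.html",
    "howto/index.html",
]

kept = files_to_index(candidate_files, exclude_patterns)
```

With the patterns from the configuration above, only `index.html` and `howto/index.html` remain in `kept`. Paths are relative to the build directory and use forward slashes.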
The extension uses optimized ranking parameters in search.js:
Combined with content weighting (10x for titles, 9x for h1, 7x for h2), these settings ensure exact title matches rank highly even for very long pages.
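The heading part of content weighting can be approximated with a single regular expression. The sketch below is an assumption about the approach, not the extension's code: `add_heading_weights` is a hypothetical name, and only the `h1` and `h2` weights quoted above are covered (title handling is left out). `data-pagefind-weight` is the attribute Pagefind reads for per-element ranking weight.

```python
import re

# Weights taken from the README text: 9x for h1, 7x for h2.
HEADING_WEIGHTS = {"h1": "9.0", "h2": "7.0"}

# Match opening <h1>/<h2> tags that do not already carry a weight attribute.
_HEADING_RE = re.compile(
    r"<(h1|h2)\b(?![^>]*\bdata-pagefind-weight)([^>]*)>",
    re.IGNORECASE,
)


def add_heading_weights(html: str) -> str:
    """Add data-pagefind-weight to h1/h2 opening tags (hypothetical helper)."""

    def _add_weight(match: re.Match) -> str:
        tag, attrs = match.group(1), match.group(2)
        weight = HEADING_WEIGHTS[tag.lower()]
        return f'<{tag}{attrs} data-pagefind-weight="{weight}">'

    return _HEADING_RE.sub(_add_weight, html)


sample = '<h1 id="a">Title</h1><h2>Section</h2><h3>Detail</h3>'
weighted = add_heading_weights(sample)
```

The negative lookahead makes the function idempotent: running it again on already-weighted HTML leaves the markup unchanged.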
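_static/
Check that pagefind is installed: python -c "import pagefind; print('OK')"
Check that the index was generated: ls generated/_build/docs/apache-airflow/stable/_pagefind/
Enable verbose logging: pagefind_verbose = True in conf.py
Confirm that search is enabled: pagefind_enabled = True in conf.py
pagefind[bin] unavailable, use: npx pagefind --site <build-dir>
Enable the playground: pagefind_enable_playground = True in conf.py
Open the playground at /_pagefind/playground/
Keep content weighting on: pagefind_content_weighting = True (default)
The playground provides detailed insights: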
For example, access it at: http://localhost:8000/docs/apache-airflow/stable/_pagefind/playground/
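Two of the troubleshooting checks (whether the `pagefind` package is importable and whether a `_pagefind/` directory was generated) can be scripted. The following is a minimal sketch with hypothetical helper names, and it assumes the index lives in a `_pagefind` directory directly under the docs build directory, as in the paths shown in this README.

```python
from importlib.util import find_spec
from pathlib import Path


def pagefind_installed() -> bool:
    """Return True if the 'pagefind' Python package can be imported."""
    return find_spec("pagefind") is not None


def index_exists(build_dir: str) -> bool:
    """Return True if <build_dir>/_pagefind exists and is not empty.

    Assumption: the index is written to a '_pagefind' directory
    directly under the docs build directory.
    """
    index_dir = Path(build_dir) / "_pagefind"
    return index_dir.is_dir() and any(index_dir.iterdir())


def report(build_dir: str) -> str:
    """Summarize both checks in one line (hypothetical helper)."""
    installed = "yes" if pagefind_installed() else "no"
    indexed = "yes" if index_exists(build_dir) else "no"
    return f"pagefind importable: {installed}; index present: {indexed}"
```

Calling `report("generated/_build/docs/apache-airflow/stable")` from the repository root would print a one-line summary for the stable docs build, if that directory exists locally.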