Implementation Notes: Metadata Features

Overview

This document captures lessons learned from implementing auto-generated metadata features for Redis documentation pages, including:

Table of Contents (TOC) metadata
Per-language identifiers and code examples
Per-codetabs metadata with language/client mappings
Metadata deduplication with location tracking

These insights should help guide future metadata feature implementations.

Key Lessons

1. Start with Hugo's Built-in Functions

Lesson: Always check what Hugo provides before building custom solutions.

Context: Initial attempts tried to manually extract headers from page content using custom partials. This was complex, error-prone, and required parsing HTML/Markdown.

Solution: Hugo's .TableOfContents method already generates HTML TOC from page headings. Using this as the source was much simpler and more reliable.

Takeaway: For future metadata features, audit Hugo's built-in methods first. They often solve 80% of the problem with minimal code.

2. Regex Substitution for Format Conversion

Lesson: Simple regex transformations can convert between formats more reliably than complex parsing.

Context: Converting HTML to JSON seemed like it would require a full HTML parser or complex state machine.

Solution: Breaking the conversion into small, sequential regex steps:

Remove wrapper elements (<nav>, </nav>)
Replace structural tags (<ul> → [, </ul> → ])
Replace content tags (<li><a href="#ID">TITLE</a> → {"id":"ID","title":"TITLE")
Add structural elements (commas, nested arrays)

Takeaway: For format conversions, think in terms of sequential substitution patterns rather than parsing. This is often simpler and more maintainable.

3. Hugo Template Whitespace Matters

Lesson: Hugo template whitespace and comments generate output that affects final formatting.

Context: Generated JSON had many blank lines, making it less readable.

Solution: Use Hugo's whitespace trimming markers ({{- and -}}) to prevent unwanted newlines.

Takeaway: When generating structured output (JSON, YAML), always consider whitespace. Test the final output, not just the template logic.

4. Markdown Templates Have Different Processing Rules

Lesson: Hugo's markdown template processor (.md files) behaves differently from HTML templates.

Context: Initial attempts to include metadata in markdown output failed because the template processor treated code blocks as boundaries.

Solution: Place metadata generation in the template itself, not in content blocks. Use safeHTML filter to prevent HTML entity escaping.

Takeaway: When targeting multiple output formats, test each format separately. Markdown templates have unique constraints that HTML templates don't have.

5. Validate Against Schema Early

Lesson: Create the schema before or immediately after implementation, not after.

Context: Schema was created last, after implementation was complete.

Better approach: Define the schema first, then implement to match it. This:

Clarifies the target structure
Enables validation during development
Provides documentation for implementers
Helps catch structural issues early

Takeaway: For future metadata features, write the schema first as a specification.

6. Centralize Configuration in `config.toml`

Lesson: Language and client identifiers should be centralized in configuration, not hardcoded in templates.

Context: When implementing per-language metadata, we initially considered hardcoding language/client mappings in templates. This would have been error-prone and difficult to maintain.

Solution: Created a centralized clientsConfig in config.toml with:

toml

[params.clientsConfig.Python]
langId = "python"
clientId = "redis-py"
clientName = "redis-py"

Then referenced this in templates via index $.Site.Params.clientsConfig $tabTitle.

Takeaway: For metadata that maps display names to stable identifiers, use config.toml as the single source of truth. This enables:

Easy updates without template changes
Consistency across all pages
Clear documentation of all supported languages/clients
Reusability across multiple templates

7. Use Data Attributes for Per-Element Metadata

Lesson: For metadata that applies to individual DOM elements (not page-level), use data attributes instead of separate metadata blocks.

Context: When implementing per-codetabs metadata, we could have created separate metadata blocks for each codetabs container. Instead, we used a data-codetabs-meta attribute on the container itself.

Solution: Store JSON metadata directly in data attributes:

html

<div class="codetabs" data-codetabs-meta='{"Python": {"language": "python", "client": "redis-py"}, ...}'>

Benefits:

Single source of truth per element
No duplication across panels
Easy runtime access via element.getAttribute()
Scales well with multiple instances on same page
Reduces overall page size vs. separate metadata blocks

Takeaway: For element-level metadata, prefer data attributes over separate metadata blocks. This is more efficient and easier to access at runtime.

8. Clarify Duplicate Metadata with Location Fields

Lesson: When metadata is duplicated in multiple locations, explicitly mark which is primary and which is fallback.

Context: We embed page metadata in both <head> (script tag) and <body> (hidden div) for redundancy. Without clear marking, downstream tools couldn't determine which to use.

Solution: Added two fields to every metadata instance:

location: "head" or "body" - indicates where this copy is located
duplicateOf: "head:data-ai-metadata" - references the primary copy (only in duplicates)

Benefits:

Eliminates confusion for downstream tooling
Enables smart caching (use head, skip body)
Supports fallback logic (if head unavailable, use body)
Documents precedence clearly
Minimal overhead (just 2 small fields)

Takeaway: When duplicating metadata for redundancy, always include location markers. This enables intelligent handling by tools and AI agents.

9. Document Metadata Precedence Explicitly

Lesson: When multiple metadata sources exist, document which takes precedence and why.

Context: With head and body metadata, per-codetabs metadata, and per-panel attributes, tools need to know which to use.

Solution: Added a "Metadata Precedence" section to documentation that clearly states:

Prefer head metadata (primary, efficient)
Use body as fallback (if head unavailable)
Check duplicateOf field (indicates duplicate)

Takeaway: Always document metadata precedence explicitly. This prevents tools from making incorrect assumptions and enables consistent behavior across different implementations.

10. Test Multiple Page Types

Lesson: Metadata features must work across different page types with different content.

Context: Implementation was tested on data types pages and command pages, which have different metadata fields.

Takeaway: Always test on at least 2-3 different page types to ensure the feature is robust and handles optional fields correctly.

11. Document Optional Metadata Fields Thoroughly

Lesson: Optional metadata fields require clear documentation about when to use them and what they mean.

Context: The buildsUpon field was added to code examples to indicate learning progression, but without clear guidance, content authors didn't know when to use it.

Solution: Created comprehensive documentation including:

Field definition in PAGE_METADATA_FORMAT.md
Usage guidance in tcedocs/README.md with patterns and best practices
AI agent guide explaining how to consume the metadata
Validation rules specifying constraints and error handling

Takeaway: For optional metadata fields, provide:

What it is: Clear definition and purpose
When to use it: Specific guidance on when the field should be present
How to use it: Examples and patterns
How to consume it: Guidance for downstream tools and AI agents
Validation rules: Constraints and error handling

12. Provide Multiple Documentation Layers

Lesson: Different audiences need different documentation.

Context: The buildsUpon feature needed documentation for:

Content authors (when/how to use it)
AI agents (how to consume and use the metadata)
Build system (validation rules and constraints)

Solution: Created separate documentation files:

PAGE_METADATA_FORMAT.md - Metadata structure and examples
tcedocs/README.md - Content author guide with patterns
BUILDSUPON_AI_AGENT_GUIDE.md - AI agent consumption patterns
BUILDSUPON_VALIDATION_RULES.md - Validation rules and constraints

Takeaway: For complex features, create multiple documentation files targeting different audiences:

Specification docs for implementers
User guides for content authors
Integration guides for downstream tools
Validation docs for build systems

Implementation Checklist for Future Metadata Features

When implementing new metadata features, follow this order:

Phase 1: Planning & Configuration

Identify the metadata scope
- Is this page-level or element-level metadata?
- Will it be duplicated across multiple locations?
- Does it need to map display names to stable identifiers?
Centralize configuration (if needed)
- Add mappings to config.toml under params
- Use consistent naming conventions
- Document all supported values
Define the schema (static/schemas/feature-name.json)
- Specify required and optional fields
- Use JSON Schema Draft 7
- Include examples
- If duplicating metadata, include location and duplicateOf fields

Phase 2: Documentation

Create documentation (for-ais-only/metadata_docs/FEATURE_NAME_FORMAT.md)
- Explain the purpose and structure
- Show examples for different scenarios
- Document embedding locations (HTML, Markdown, data attributes)
- If multiple metadata sources exist, document precedence clearly
- Include usage examples for downstream tools

Phase 3: Implementation

Implement the feature
- For page-level metadata: Create/modify Hugo partials
- For element-level metadata: Use data attributes on the element
- Test on multiple page types
- Verify output in both HTML and Markdown formats
Handle optional fields gracefully
- Use Hugo's if statements to only include fields when present
- Test on pages with and without optional metadata

Phase 4: Validation & Documentation

Validate the output
- Write validation scripts
- Test against the schema
- Check whitespace and formatting
- Verify on multiple page types
Document implementation notes
- Capture lessons learned
- Note any workarounds or gotchas
- Provide guidance for future similar features
- Update this file with new insights

Common Gotchas

Template & Output Issues

HTML entity escaping: Use safeHTML filter when outputting HTML/JSON in markdown templates
Whitespace in templates: Use {{- and -}} to trim whitespace
Nested structures: Test deeply nested content to ensure regex patterns handle all cases
Optional fields: Remember that not all pages have all metadata fields
Markdown vs HTML: Always test both output formats

Metadata Design Issues

Hardcoded identifiers: Don't hardcode language/client mappings in templates - use config.toml
Duplicate metadata confusion: Always include location and duplicateOf fields when duplicating metadata
Missing precedence documentation: Tools won't know which metadata to use without explicit precedence guidance
Element-level metadata in separate blocks: Use data attributes instead of separate metadata blocks for element-level metadata
Inconsistent naming: Use stable identifiers (langId, clientId) separate from display names (id, clientName)

Testing Issues

Single page type testing: Test on at least 2-3 different page types (command pages, guide pages, etc.)
Missing optional fields: Test pages that don't have all optional metadata fields
Large nested structures: Test with deeply nested content (e.g., multi-level TOC, many code examples)
Multiple instances: Test pages with multiple instances of the same metadata type

Complete Metadata Architecture

The Redis documentation now has a comprehensive, multi-layered metadata system:

Layer 1: Page-Level Metadata (Primary)

Location: <script type="application/json" data-ai-metadata> in <head>
Purpose: Static analysis, AI agent processing, schema validation
Content: Title, description, categories, TOC, code examples, command info
Fields: location: "head"

Layer 2: Page-Level Metadata (Fallback)

Location: <div hidden data-redis-metadata="page"> in <body>
Purpose: DOM-based extraction, fallback access, runtime scripts
Content: Identical to Layer 1
Fields: location: "body", duplicateOf: "head:data-ai-metadata"

Layer 3: Per-Codetabs Metadata

Location: data-codetabs-meta attribute on codetabs container
Purpose: Runtime language/client mapping, AI agent interaction
Content: Tab name → language/client/mode mapping
Benefits: Single source of truth, zero duplication, efficient access

Layer 4: Per-Panel Attributes

Location: Individual panel <div> elements
Purpose: Panel-specific configuration (BinderHub, display language)
Content: data-lang, data-binder-id, data-codetabs-id

Design Principles

Single Source of Truth: Each piece of information exists in exactly one authoritative location
Minimal Duplication: When duplication is necessary (head/body), mark it explicitly
Efficient Access: Use data attributes for element-level metadata, script tags for page-level
Clear Precedence: Document which metadata to use when multiple sources exist
Stable Identifiers: Separate display names from stable identifiers (langId vs. id)
Centralized Configuration: Use config.toml for mappings that apply across pages

Tools and Techniques

Hugo filters: replaceRE, jsonify, safeHTML
Configuration: config.toml for centralized mappings
Data attributes: HTML5 data-* attributes for element-level metadata
Validation: Python's jsonschema library for schema validation
Testing: Extract metadata from generated files and validate against schema
Debugging: Use grep and head to inspect generated output

Implementation Notes: Metadata Features

Implementation Notes: Metadata Features

Overview

Key Lessons

1. Start with Hugo's Built-in Functions

2. Regex Substitution for Format Conversion

3. Hugo Template Whitespace Matters

4. Markdown Templates Have Different Processing Rules

5. Validate Against Schema Early

6. Centralize Configuration in config.toml

7. Use Data Attributes for Per-Element Metadata

8. Clarify Duplicate Metadata with Location Fields

9. Document Metadata Precedence Explicitly

10. Test Multiple Page Types

11. Document Optional Metadata Fields Thoroughly

12. Provide Multiple Documentation Layers

Implementation Checklist for Future Metadata Features

Phase 1: Planning & Configuration

Phase 2: Documentation

Phase 3: Implementation

Phase 4: Validation & Documentation

Common Gotchas

Template & Output Issues

Metadata Design Issues

Testing Issues

Complete Metadata Architecture

Layer 1: Page-Level Metadata (Primary)

Layer 2: Page-Level Metadata (Fallback)

Layer 3: Per-Codetabs Metadata

Layer 4: Per-Panel Attributes

Design Principles

Tools and Techniques

6. Centralize Configuration in `config.toml`