SPELL-CHECK.md
This document explains the spell-checking rules and tools used in the InfluxData documentation repository.
The docs-v2 repository uses two complementary spell-checking tools:
| Feature | Vale | Codespell |
|---|---|---|
| Purpose | Document spell checking | Code comment spell checking |
| Integration | Pre-commit hooks (Docker) | CI/CD pipeline |
| False Positives | Low (comprehensive filters) | Low (clear dictionary only) |
| Customization | YAML rules | INI config + dictionary lists |
| Performance | Moderate | Fast |
| True Positive Detection | Document-level | Code-level |
Configuration file: `.ci/vale/styles/InfluxDataDocs/Spelling.yml`

Unlike other documentation style checkers, this configuration intentionally includes code blocks (`~code` is NOT excluded). This is critical because:
1. Comments in examples - Users copy code blocks with comments:

   ```bash
   # Download and verify the GPG key
   curl https://repos.influxdata.com/influxdata-archive.key
   ```

   Typos in such comments become part of user documentation/scripts.

2. Documentation strings - Code examples may include documentation:

   ```python
   def create_database(name):
       """This funtion creates a new database."""  # ← typo caught
       pass
   ```

3. Inline comments - Shell script comments are checked:

   ```bash
   #!/bin/bash
   # Retrive configuration from server
   influxctl config get
   ```
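Pattern: `(?:_*[a-z]+(?:[A-Z][a-z0-9]*)+(?:[A-Z][a-zA-Z0-9]*)*|[a-z_][a-z0-9]*_[a-z0-9_]*)`

Why: Prevents false positives on variable/method names while NOT matching normal prose

Breakdown:

- camelCase: `_*[a-z]+(?:[A-Z][a-z0-9]*)+(?:[A-Z][a-zA-Z0-9]*)*`
  - Requires at least one uppercase letter (distinguishes `myVariable` from `provide`)
  - Allows leading underscores (`_privateVar`)
- snake_case: `[a-z_][a-z0-9]*_[a-z0-9_]*`
  - Requires at least one underscore (distinguishes `my_variable` from normal words)
  - Also covers all-lowercase names wrapped in underscores (`__dunder__`)

Examples Ignored: `myVariable`, `targetField`, `getCwd`, `_privateVar`, `my_variable`, `terminationGracePeriodSeconds`

Examples NOT Ignored (caught by spell-checker): `provide`, `database`, `variable` (normal prose)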
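To sanity-check the identifier filter outside of Vale, the pattern can be exercised with Python's `re` module. This is a minimal sketch, not part of the repository tooling: the helper name `is_ignored_identifier` is hypothetical, whole-token matching with `fullmatch` is an assumption about how tokens reach the filter, and Vale's own regex engine may differ in edge cases.

```python
import re

# Pattern copied from the filter above; Python's `re` stands in for Vale's engine.
IDENTIFIER = re.compile(
    r"(?:_*[a-z]+(?:[A-Z][a-z0-9]*)+(?:[A-Z][a-zA-Z0-9]*)*"
    r"|[a-z_][a-z0-9]*_[a-z0-9_]*)"
)


def is_ignored_identifier(token: str) -> bool:
    """Return True when the whole token matches the identifier filter."""
    return IDENTIFIER.fullmatch(token) is not None


# Tokens taken from the examples listed in this section.
ignored = ["myVariable", "targetField", "getCwd", "_privateVar",
           "my_variable", "terminationGracePeriodSeconds"]
checked = ["provide", "database", "variable"]

for token in ignored + checked:
    status = "ignored" if is_ignored_identifier(token) else "spell-checked"
    print(f"{token}: {status}")
```

Running the loop is a quick way to confirm that a new identifier style is covered before editing the filter itself.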
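Pattern: `[A-Z_][A-Z0-9_]+`

Why: Prevents false positives on environment variables and constants

Examples Ignored: `API_KEY`, `AWS_REGION`, `INFLUXDB_TOKEN`

Note: Also matches standalone uppercase acronyms such as `AWS` and `API`, so all-caps words of two or more characters are not spell-checked. This is an acceptable trade-off in docs.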
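The trade-off described in the note can be seen by testing the constant filter on a few tokens. The sketch below assumes whole-token matching, and `is_ignored_constant` is a hypothetical helper name. The misspelled token `DATABSE` is an invented example that shows an all-caps typo slipping through.

```python
import re

# Pattern copied from the filter above.
CONSTANT = re.compile(r"[A-Z_][A-Z0-9_]+")


def is_ignored_constant(token: str) -> bool:
    """Return True when the whole token looks like an UPPER_CASE constant."""
    return CONSTANT.fullmatch(token) is not None


for token in ["API_KEY", "AWS_REGION", "INFLUXDB_TOKEN", "AWS", "DATABSE", "Database", "A"]:
    status = "ignored" if is_ignored_constant(token) else "spell-checked"
    print(f"{token}: {status}")
```

Because the pattern requires at least two characters, a single capital letter such as `A` is still passed to the spell-checker.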
Pattern: `\d+\.\d+(?:\.\d+)*`

Why: Version numbers aren't words

Examples Ignored: `1.0`, `2.3.1`, `0.101.0`, `1.2.3.4`, `v1.2.3` (only the numeric part, `1.2.3`, matches the pattern)

Note: Handles any number of version parts (2-part, 3-part, 4-part, etc.)
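A short sketch of which substrings the version pattern picks out of running text. The helper `version_spans` and the sample sentences are hypothetical, and Python's `re` is used here only as a stand-in for Vale's matching.

```python
import re

# Pattern copied from the filter above.
VERSION = re.compile(r"\d+\.\d+(?:\.\d+)*")


def version_spans(text: str) -> list[str]:
    """Return every substring of `text` that the version filter matches."""
    return [match.group(0) for match in VERSION.finditer(text)]


print(version_spans("Upgrade from 1.0 to 2.3.1, then 0.101.0 or 1.2.3.4"))
print(version_spans("Tagged as v1.2.3"))
print(version_spans("InfluxDB 3"))
```

A bare major version such as `3` has no dot, so the pattern leaves it alone.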
Pattern: `0[xX][0-9a-fA-F]+`

Why: Hex values appear in code and aren't dictionary words

Examples Ignored: `0xFF`, `0xDEADBEEF`, `0x1A`
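The hex filter can be checked the same way. This is a small sketch that assumes whole-token matching; `is_hex_literal` is a hypothetical helper name.

```python
import re

# Pattern copied from the filter above.
HEX_LITERAL = re.compile(r"0[xX][0-9a-fA-F]+")


def is_hex_literal(token: str) -> bool:
    """Return True when the whole token is a 0x-prefixed hexadecimal literal."""
    return HEX_LITERAL.fullmatch(token) is not None


for token in ["0xFF", "0xDEADBEEF", "0x1A", "0x", "0xZZ", "FF"]:
    status = "ignored" if is_hex_literal(token) else "spell-checked"
    print(f"{token}: {status}")
```

The `0x` prefix is required, so a bare `FF` is not treated as a hex value by this pattern.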
Patterns:

- Paths: `/[a-zA-Z0-9/_\-\.\{\}]+` (for example, `/api/v2/write`)
- Full URLs: `https?://[^\s\)\]>"]+` (for example, `https://docs.example.com`)

Why: URLs and paths contain hyphens, slashes, and special chars

Examples Ignored: `/api/v2/write`, `/kapacitor/v1/`, `https://docs.influxdata.com`
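The two patterns can be tried on sample prose to see where each match ends. This is a sketch under the assumption that Python's `re` behaves like Vale's engine for these expressions; `path_spans`, `url_spans`, and the sample sentences are hypothetical.

```python
import re

# Patterns copied from the filters above.
PATH = re.compile(r"/[a-zA-Z0-9/_\-\.\{\}]+")
URL = re.compile(r'https?://[^\s\)\]>"]+')


def path_spans(text: str) -> list[str]:
    """Return substrings that the path filter matches."""
    return [match.group(0) for match in PATH.finditer(text)]


def url_spans(text: str) -> list[str]:
    """Return substrings that the URL filter matches."""
    return [match.group(0) for match in URL.finditer(text)]


print(path_spans("POST to /api/v2/write or /kapacitor/v1/ today"))
print(url_spans("See (https://docs.example.com) for details"))
```

The URL pattern stops at whitespace, `)`, `]`, `>`, and `"`, so closing Markdown or HTML punctuation is not swallowed into the match.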
Pattern: `(?:endpoint|method|url|href|src|path)="[^"]+"`

Why: Hugo shortcode attribute values often contain hyphens and special chars

Examples Ignored: `endpoint="https://..."`, `method="POST"`

Future Enhancement: Add more attributes as needed (`name`, `value`, `data`, etc.)
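The attribute filter only covers the attribute names listed in its alternation, which the following sketch makes visible. The shortcode strings are hypothetical examples, and `attribute_spans` is a hypothetical helper name.

```python
import re

# Pattern copied from the filter above.
ATTRIBUTE = re.compile(r'(?:endpoint|method|url|href|src|path)="[^"]+"')


def attribute_spans(text: str) -> list[str]:
    """Return the attribute="value" pairs that the filter matches."""
    return [match.group(0) for match in ATTRIBUTE.finditer(text)]


# Hypothetical shortcode calls used only for illustration.
covered = '{{< api-endpoint endpoint="https://example.com/api/v2/write" method="POST" >}}'
not_covered = '{{< example name="my-bucket" >}}'

print(attribute_spans(covered))
print(attribute_spans(not_covered))
```

An attribute such as `name` produces no match, so its value would still be spell-checked until that name is added to the pattern.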
Pattern: `[@#$%^&*()_+=\[\]{};:,.<>?/\\|-]+`

Why: Symbols and special characters aren't words

Examples Ignored: `()`, `{}`, `[]`, `->`, `=>`, `|`, etc.
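A quick check of the symbol filter against the operators listed above. This sketch assumes whole-token matching, and `is_symbol_run` is a hypothetical helper name.

```python
import re

# Pattern copied from the filter above.
SYMBOLS = re.compile(r"[@#$%^&*()_+=\[\]{};:,.<>?/\\|-]+")


def is_symbol_run(token: str) -> bool:
    """Return True when the token consists only of characters in the symbol class."""
    return SYMBOLS.fullmatch(token) is not None


for token in ["()", "{}", "[]", "->", "=>", "|", "!=", "abc"]:
    status = "ignored" if is_symbol_run(token) else "spell-checked"
    print(f"{token!r}: {status}")
```

Characters that are absent from the class, such as `!`, are not covered, so a token like `!=` does not fully match.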
The configuration references two word lists:

- `InfluxDataDocs/Terms/ignore.txt` - Product and technical terms (non-English)
- `InfluxDataDocs/Terms/query-functions.txt` - InfluxQL/Flux function names

To add a word that should be ignored, edit the appropriate file.
Configuration file: `.codespellrc`

Why `clear` (not `rare` or `code`):

- `clear` - Unambiguous spelling errors only
- `rare` - Also flags uncommon but valid English words
- `code` - Also flags code-specific words
```ini
skip = public,node_modules,dist,.git,.vale,api-docs
```

- `public` - Generated HTML (not source)
- `node_modules` - npm dependencies (not our code)
- `dist` - Compiled TypeScript output (not source)
- `.git` - Repository metadata
- `.vale` - Vale configuration and cache
- `api-docs` - Generated OpenAPI specifications (many false positives)

```ini
ignore-words-list = aks,invokable
```

- `aks` - Azure Kubernetes Service (acronym)
- `invokable` - InfluxData product branding term (scriptable tasks/queries)

To add more:

1. Edit `.codespellrc`.
2. Add the word to `ignore-words-list` (comma-separated).

Vale automatically runs on files you commit via Lefthook.
Manual check:

```bash
# Check all content
docker compose run -T vale content/**/*.md

# Check specific file
docker compose run -T vale content/influxdb/cloud/reference/cli.md
```

Run Codespell manually:

```bash
# Check entire content directory
codespell content/ --builtin clear

# Check specific directory
codespell content/influxdb3/core/

# Interactive mode (prompts for fixes)
codespell content/ --builtin clear -i 3

# Auto-fix (USE WITH CAUTION)
codespell content/ --builtin clear -w
```
The spell-checking rules are designed to:

- ✅ Catch real spelling errors (true positives)
- ✅ Ignore code patterns, identifiers, and paths (false positive prevention)
- ✅ Respect product branding terms (invokable, Flux, InfluxQL)
- ✅ Work seamlessly in existing workflows
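Pipe sample text to Codespell to test various patterns. The trailing `-` tells Codespell to read from standard input; without it, Codespell scans the current directory instead:

```bash
# Test camelCase handling
echo "variable myVariable is defined" | codespell -

# Test version numbers
echo "InfluxDB version 2.3.1 is released" | codespell -

# Test real typos (should be caught)
echo "recieve the data" | codespell -
```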
Problem: Vale flags a word that should be valid

Solutions:

1. Add it to `InfluxDataDocs/Terms/ignore.txt` if it's a technical term
2. Add a filter to `.ci/vale/styles/InfluxDataDocs/Spelling.yml` if it's a pattern

Problem: Codespell flags a legitimate term

Solutions:

1. Add it to `ignore-words-list` in `.codespellrc`
2. Use `-i 3` (interactive mode) to review before accepting

Problem: A real typo isn't caught
Solutions:
When adding content:
Related files:

- `.ci/vale/styles/InfluxDataDocs/` - Vale rule configuration
- `.codespellrc` - Codespell configuration
- `.codespellignore` - Codespell ignore word list
- `DOCS-CONTRIBUTING.md` - General contribution guidelines
- `DOCS-TESTING.md` - Testing and validation guide