integration/tmgrammar/README.md
TextMate grammar for the Vespa schema language (.sd files), plus tooling to generate, test, and compare it against the Java LSP's semantic tokens.
The grammar file is grammar/vespa-schema.tmLanguage.json. You can use it directly in any tool that supports TextMate grammars:
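import fs from 'node:fs';
import { createHighlighter } from 'shiki';

// Load the generated grammar (path is relative to the current working directory).
const vespaGrammar = JSON.parse(fs.readFileSync('vespa-schema.tmLanguage.json', 'utf-8'));
const highlighter = await createHighlighter({
  langs: [vespaGrammar],
  themes: ['github-dark'],
});
// sdCode is a string holding the .sd source to highlight.
const html = highlighter.codeToHtml(sdCode, { lang: 'vespa-schema', theme: 'github-dark' });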
- Shiki: works with Shiki-powered frameworks such as VitePress, Astro, and Nuxt Content.
- Vespa extension: the grammar ships with the extension for static highlighting when the Java LSP is not running.
- GitHub: native .sd highlighting in code blocks, diffs, and file views. This requires acceptance into github-linguist (contribution in progress).
- Other editors: Sublime Text, BBEdit, and any editor that consumes .tmLanguage.json grammars.
The grammar is auto-generated from the same source-of-truth as the Java LSP (CongoCC grammars + SchemaSemanticTokenConfig.java + BuiltInFunctions.java), ensuring scope assignments match the LSP's semantic token colors in VS Code's default Dark+ theme.
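To make the idea of generated scope assignments concrete, here is a minimal Python sketch of how a keyword list and a token-type-to-scope table could be turned into a single TextMate `match` rule. It is not the code in tools/generate_tmgrammar.py. The function name `keyword_rule`, the table `SCOPE_FOR_TOKEN_TYPE`, and every scope other than `keyword.control.vespa` are assumptions made for illustration.

```python
import json
import re

# Hypothetical token-type -> TextMate scope table. Only the Keyword entry
# appears in this README; the other scope names are assumed placeholders.
SCOPE_FOR_TOKEN_TYPE = {
    "Keyword": "keyword.control.vespa",
    "Type": "storage.type.vespa",          # assumed scope name
    "Function": "support.function.vespa",  # assumed scope name
}


def keyword_rule(token_type, words):
    """Build one TextMate `match` rule covering all given words."""
    # Longest first, so that `rank-profile` is tried before `rank`.
    ordered = sorted(set(words), key=lambda w: (-len(w), w))
    alternation = "|".join(re.escape(w) for w in ordered)
    return {
        "name": SCOPE_FOR_TOKEN_TYPE[token_type],
        # Lookarounds instead of \b, because `-` is not a word character
        # and \b would let `rank` match inside `rank-profile`.
        "match": rf"(?<![\w-])(?:{alternation})(?![\w-])",
    }


rule = keyword_rule("Keyword", ["rank", "rank-profile", "schema", "field"])
print(json.dumps(rule, indent=2))
```

The resulting dictionary has the shape of an entry in a grammar's `patterns` or `repository` section. The real generator derives its word lists from the CongoCC grammars and Java config files rather than from a hard-coded list.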
The generator (tools/generate_tmgrammar.py) reads three CongoCC grammar files and two Java config files directly from the vespa repo. Its inputs and processing cover:

- .ccc grammars (schema, indexing, ranking)
- SchemaSemanticTokenConfig.java -- which tokens are keywords, types, operators, functions, etc.
- Token-type-to-scope mapping (Keyword -> keyword.control.vespa)
- BuiltInFunctions.java (bm25, fieldMatch, onnx, etc.)
- Keywords that share a prefix (rank-profile vs rank)

Source files are read directly from the vespa repo tree:

- integration/schema-language-server/language-server/src/main/ccc/ -- CongoCC grammars
- integration/schema-language-server/language-server/src/main/java/ai/vespa/schemals/ -- Java config files
- integration/schema-language-server/language-server/src/test/sdfiles/ -- test .sd files

All commands below assume you are in the integration/tmgrammar/ directory.
uv run tools/generate_tmgrammar.py   # generate grammar/vespa-schema.tmLanguage.json
uv run tools/test_tmgrammar.py       # validate the generated grammar
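The file layout in this README describes test_tmgrammar.py as validating keyword completeness, structure, scopes, and .sd parsing. As a rough illustration of what one structural check can look like (not necessarily a check the script performs), the Python sketch below finds `#name` includes that have no matching `repository` entry. The function name `unresolved_includes`, the sample grammar, and its `scopeName` are hypothetical. Rule-local repositories are ignored for simplicity.

```python
def unresolved_includes(grammar):
    """Return every `#name` include that has no entry in the top-level `repository`."""
    repository = grammar.get("repository", {})
    missing = []

    def walk(node):
        if isinstance(node, dict):
            ref = node.get("include")
            # Only local references start with `#`; `$self` and external
            # scope names are skipped.
            if isinstance(ref, str) and ref.startswith("#") and ref[1:] not in repository:
                missing.append(ref)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(grammar)
    return sorted(set(missing))


# Tiny hand-written grammar for illustration, not the generated one.
sample = {
    "scopeName": "source.vespa-schema",  # hypothetical value
    "patterns": [{"include": "#keywords"}, {"include": "#comments"}],
    "repository": {
        "keywords": {"match": "\\b(schema|field)\\b", "name": "keyword.control.vespa"},
    },
}
print(unresolved_includes(sample))  # ['#comments']
```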
The comparison needs tools/java_tokens.json produced by SemanticTokenDumper.
This is a utility (not a test) that lives in test sources because it needs test scaffolding.
From the vespa repo root:
cd integration/schema-language-server/language-server
mvn test-compile exec:java \
    -Dexec.mainClass=ai.vespa.schemals.SemanticTokenDumper \
    -Dexec.classpathScope=test
This writes java_tokens.json directly to integration/tmgrammar/tools/.
Then, back in this directory:
uv run tools/compare_tokens.py # summary
uv run tools/compare_tokens.py --fixable-only # only actionable items
uv run tools/compare_tokens.py --file embed.sd # per-file detail
Or use the convenience wrapper (tokenizes .sd files with vscode-textmate, then compares):
./tools/run_comparison.sh # tokenize + compare
./tools/run_comparison.sh --regenerate # regenerate grammar first
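The comparison step pairs Java LSP token types with TextMate scope assignments. The Python sketch below shows one way such a structural diff could work on plain data. It makes no claim about the actual format of java_tokens.json or the logic in compare_tokens.py. The token shapes (`line`, `start`, `length`, `type`, `scope`), the `EXPECTED_PREFIX` table, and the `compare` function are all assumptions.

```python
from collections import Counter

# Hypothetical expectation table: LSP token type -> acceptable scope prefix.
EXPECTED_PREFIX = {
    "Keyword": "keyword.",
    "Type": "storage.type.",          # assumed
    "Function": "support.function.",  # assumed
}


def compare(lsp_tokens, tm_tokens):
    """Classify each LSP token as match, mismatch, or missing in the TextMate output."""
    tm_by_span = {(t["line"], t["start"], t["length"]): t["scope"] for t in tm_tokens}
    summary = Counter()
    problems = []
    for tok in lsp_tokens:
        span = (tok["line"], tok["start"], tok["length"])
        scope = tm_by_span.get(span)
        prefix = EXPECTED_PREFIX.get(tok["type"])
        if scope is None:
            status = "missing"      # TextMate produced no token at this span
        elif prefix is not None and scope.startswith(prefix):
            status = "match"
        else:
            status = "mismatch"     # token exists but carries an unexpected scope
        summary[status] += 1
        if status != "match":
            problems.append((span, tok["type"], scope))
    return summary, problems


# Made-up sample data in the assumed shape.
lsp = [
    {"line": 0, "start": 0, "length": 6, "type": "Keyword"},
    {"line": 1, "start": 4, "length": 5, "type": "Keyword"},
    {"line": 1, "start": 20, "length": 6, "type": "Type"},
]
tm = [
    {"line": 0, "start": 0, "length": 6, "scope": "keyword.control.vespa"},
    {"line": 1, "start": 20, "length": 6, "scope": "variable.other.vespa"},
]
summary, problems = compare(lsp, tm)
print(dict(summary))  # {'match': 1, 'missing': 1, 'mismatch': 1}
```

Matching on exact spans keeps the sketch short. A real comparison would likely also need to handle tokens whose boundaries differ between the two tokenizers.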
A browser-based viewer for inspecting .sd file colorization across VS Code themes (Shiki v3 + Vite):
cd playground && npm install && npm run dev
# Open http://localhost:5173
Audit uncolored tokens (finds tokens that fall through to the theme's default foreground):
cd playground && node audit-colors.mjs # all files, github-dark
node audit-colors.mjs --theme github-light # specific theme
node audit-colors.mjs --file spotcheck.sd # specific file
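The audit idea, finding tokens that fall through to the theme's default foreground, can be sketched in a few lines of Python. audit-colors.mjs itself is a Node.js script, and this is not a port of it. The token shape (`content`, `color`), the function name `uncolored_tokens`, and the example foreground value are assumptions.

```python
def uncolored_tokens(lines, default_fg):
    """Yield (line_no, text) for tokens rendered in the theme's default foreground."""
    default = default_fg.lower()
    for line_no, tokens in enumerate(lines, start=1):
        for tok in tokens:
            text = tok["content"]
            if not text.strip():
                continue  # whitespace has no visible color
            # A token with no color at all also ends up in the default foreground.
            if (tok.get("color") or default).lower() == default:
                yield line_no, text


# Made-up tokenizer output: one list of tokens per source line.
lines = [
    [
        {"content": "schema", "color": "#F97583"},
        {"content": " ", "color": "#E1E4E8"},
        {"content": "music", "color": "#E1E4E8"},
    ],
    [{"content": "  "}, {"content": "weird-token"}],
]
# "#e1e4e8" is an example value; take the real one from the theme in use.
print(list(uncolored_tokens(lines, "#e1e4e8")))  # [(1, 'music'), (2, 'weird-token')]
```

Not every hit is a grammar gap: identifiers and punctuation may legitimately use the default foreground, so the output is a list of candidates to review.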
integration/tmgrammar/
  grammar/
    vespa-schema.tmLanguage.json   # THE OUTPUT -- generated TextMate grammar
  tools/                           # Build and validation pipeline
    generate_tmgrammar.py          # Generator: reads .ccc grammars + Java config, writes grammar/
    test_tmgrammar.py              # Validates keyword completeness, structure, scopes, .sd parsing
    compare_tokens.py              # Structural diff: Java LSP token types vs TM scope assignments
    tm_tokenize.mjs                # Tokenizes .sd files with the real vscode-textmate engine
    run_comparison.sh              # Convenience wrapper: tokenize + compare in one step
    package.json                   # Node.js deps (vscode-textmate, vscode-oniguruma)
  playground/                      # Visual inspection during development
    main.js + index.html + css     # Shiki-powered browser viewer, 10 VS Code themes
    vite.config.js                 # Dev server serving grammar + .sd files
    audit-colors.mjs               # CLI: find tokens that are visually uncolored in a given theme
    package.json                   # Node.js deps (shiki, vite)

integration/schema-language-server/language-server/src/test/java/ai/vespa/schemals/
  SemanticTokenDumper.java         # Java utility: dumps LSP semantic tokens to JSON
tools/ is the build pipeline -- generate, validate, and structurally compare the grammar against the Java LSP. playground/ is for visual inspection -- see how .sd files actually render across themes and find tokens that aren't getting colored.
To update the grammar after Vespa parser changes:
1. uv run tools/generate_tmgrammar.py to regenerate
2. uv run tools/test_tmgrammar.py to validate

Licensed under Apache 2.0 -- see the top-level LICENSE file.