docs/proposals/compact-ast-format.md
Draft
The current message-parser AST is verbose: every node carries its full text content as a self-contained string. For a message like `**hello** world`, the AST stores "hello" inside the Bold node and " world" in a separate Plain node, even though both substrings already exist in the original message. This redundancy inflates payload size, especially for messages with deeply nested formatting (bold inside italic inside strikethrough, etc.).
In high-traffic environments, parsed messages are stored and transmitted frequently. A smaller AST directly reduces storage costs and wire transfer size, and lets more parsed messages fit in cache.
Introduce a compact AST format that replaces self-contained string values with span references ([start, end]) into the original message text.
Instead of:

    {
      "type": "PARAGRAPH",
      "value": [
        { "type": "BOLD", "value": [{ "type": "PLAIN_TEXT", "value": "hello" }] },
        { "type": "PLAIN_TEXT", "value": " world" }
      ]
    }

The compact format stores:

    { "t": "p", "c": [{ "t": "b", "c": [[2, 7]] }, [9, 15]] }

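Plain text nodes become simple `[start, end]` tuples, where `end` is exclusive (so `[2, 7]` selects "hello" in the example message). Structural nodes use short type keys (`b`, `i`, `s`, `p`, `h`, etc.) in `t` and hold their children in `c`, where each child is either a span or another compact node.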
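To make the span mechanism concrete, here is a minimal TypeScript sketch of how a consumer might resolve spans back into text. The type names (`Span`, `CompactNode`, `CompactChild`) and the helper `collectText` are hypothetical and not part of the proposal. The sketch assumes end-exclusive offsets that index the message the same way `String.prototype.slice` does (UTF-16 code units).

```typescript
// Hypothetical types for illustration; the proposal does not define them.
type Span = [number, number]; // [start, end), end exclusive
interface CompactNode {
  t: string;          // short type key, e.g. "p" or "b"
  c?: CompactChild[]; // children: spans or nested compact nodes
}
type CompactChild = Span | CompactNode;

// Collect the text behind every span under `child`, in document order.
function collectText(child: CompactChild, msg: string): string[] {
  if (Array.isArray(child)) {
    // A bare tuple is a plain-text span into the original message.
    return [msg.slice(child[0], child[1])];
  }
  const out: string[] = [];
  for (const grandchild of child.c ?? []) {
    for (const text of collectText(grandchild, msg)) {
      out.push(text);
    }
  }
  return out;
}

const msg = "**hello** world";
const compact: CompactNode = { t: "p", c: [{ t: "b", c: [[2, 7]] }, [9, 15]] };
const texts = collectText(compact, msg);
// texts is ["hello", " world"]
```

The compact node never repeats message text, so it is only meaningful when paired with the exact message it was derived from.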
| Function | Description |
|---|---|
| `compactify(ast, msg)` | Converts a verbose AST + original message into a compact AST |
| `expand(compactAst, msg)` | Reconstructs the full verbose AST from a compact AST + original message |
| `validateRoundtrip(ast, msg)` | Verifies `expand(compactify(ast, msg), msg)` equals the original AST |
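
The three functions above could look roughly like the following TypeScript sketch, restricted to `PLAIN_TEXT`, `BOLD`, `ITALIC`, `STRIKE`, and `PARAGRAPH`. This is not the proof-of-concept implementation. The types, the `sameNode` helper, and in particular the way `compactify` locates text (a forward `indexOf` scan from a cursor) are assumptions made for illustration. The scan only works when each plain-text value appears verbatim and in order in the message.

```typescript
// Hypothetical verbose and compact shapes, limited to a few node types.
type VerboseNode =
  | { type: "PLAIN_TEXT"; value: string }
  | { type: "BOLD" | "ITALIC" | "STRIKE" | "PARAGRAPH"; value: VerboseNode[] };
type Span = [number, number]; // [start, end), end exclusive
type CompactNode = { t: string; c: CompactChild[] };
type CompactChild = Span | CompactNode;

const KEYS = { BOLD: "b", ITALIC: "i", STRIKE: "s", PARAGRAPH: "p" } as const;
type ContainerType = keyof typeof KEYS;

// Inverse lookup (compact key -> verbose type), built once.
const TYPES: Record<string, ContainerType | undefined> = {};
for (const type of Object.keys(KEYS) as ContainerType[]) {
  TYPES[KEYS[type]] = type;
}

function compactify(ast: VerboseNode, msg: string): CompactChild {
  let cursor = 0; // assumption: text nodes occur in message order
  const walk = (node: VerboseNode): CompactChild => {
    if (node.type === "PLAIN_TEXT") {
      const start = msg.indexOf(node.value, cursor);
      if (start < 0) throw new Error("text not found in message: " + node.value);
      cursor = start + node.value.length;
      return [start, cursor];
    }
    return { t: KEYS[node.type], c: node.value.map(walk) };
  };
  return walk(ast);
}

function expand(compact: CompactChild, msg: string): VerboseNode {
  if (Array.isArray(compact)) {
    return { type: "PLAIN_TEXT", value: msg.slice(compact[0], compact[1]) };
  }
  const type = TYPES[compact.t];
  if (type === undefined) throw new Error("unknown compact key: " + compact.t);
  return { type, value: compact.c.map((child) => expand(child, msg)) };
}

// Structural equality, independent of object key order.
function sameNode(a: VerboseNode, b: VerboseNode): boolean {
  if (a.type === "PLAIN_TEXT") return b.type === "PLAIN_TEXT" && a.value === b.value;
  if (b.type === "PLAIN_TEXT") return false;
  return (
    a.type === b.type &&
    a.value.length === b.value.length &&
    a.value.every((child, i) => sameNode(child, b.value[i]))
  );
}

function validateRoundtrip(ast: VerboseNode, msg: string): boolean {
  return sameNode(expand(compactify(ast, msg), msg), ast);
}

const message = "**hello** world";
const ast: VerboseNode = {
  type: "PARAGRAPH",
  value: [
    { type: "BOLD", value: [{ type: "PLAIN_TEXT", value: "hello" }] },
    { type: "PLAIN_TEXT", value: " world" },
  ],
};
const compactAst = compactify(ast, message);
const roundtripOk = validateRoundtrip(ast, message);
```

A real implementation would more likely take offsets from the parser itself rather than searching the message, since a search cannot handle text that the parser unescapes or normalizes.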
| Verbose Type | Compact Key | Notes |
|---|---|---|
| `PLAIN_TEXT` | `[start, end]` | Span tuple, no wrapper object |
| `BOLD` | `b` | |
| `ITALIC` | `i` | |
| `STRIKE` | `s` | |
| `SPOILER` | `\|\|` | |
| `INLINE_CODE` | `` ` `` | |
| `MENTION_USER` | `@` | |
| `MENTION_CHANNEL` | `#` | |
| `INLINE_KATEX` | `$` | |
| `LINK` | `a` | |
| `IMAGE` | `img` | |
| `EMOJI` | `:` | |
| `TIMESTAMP` | `ts` | |
| `COLOR` | `c` | Stores RGBA as `[r, g, b, a]` |
| `PARAGRAPH` | `p` | |
| `HEADING` | `h` | Includes level `l` (1 to 4) |
| `CODE` | ```` ``` ```` | |
| `BLOCKQUOTE` | `>` | |
| `QUOTE` | `q` | |
| `SPOILER_BLOCK` | `\|\|\|` | |
| `ORDERED_LIST` | `ol` | |
| `UNORDERED_LIST` | `ul` | |
| `TASKS` | `tl` | |
| `KATEX` | `$$` | |
| `LINE_BREAK` | `br` | |
| `BIG_EMOJI` | `E` | |
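
For `expand` to be unambiguous, every compact key has to map back to exactly one verbose type. The TypeScript sketch below encodes the table as a constant and builds the inverse lookup, failing fast on a duplicate key. The names `COMPACT_KEYS` and `VERBOSE_TYPES` are hypothetical; only the key values come from the table. Backtick keys are written as `\u0060` escapes purely to keep the listing readable.

```typescript
// Verbose type -> compact key, copied from the table above.
// PLAIN_TEXT is omitted because it is encoded as a bare [start, end] tuple.
const COMPACT_KEYS = {
  BOLD: "b",
  ITALIC: "i",
  STRIKE: "s",
  SPOILER: "||",
  INLINE_CODE: "\u0060", // a single backtick
  MENTION_USER: "@",
  MENTION_CHANNEL: "#",
  INLINE_KATEX: "$",
  LINK: "a",
  IMAGE: "img",
  EMOJI: ":",
  TIMESTAMP: "ts",
  COLOR: "c",
  PARAGRAPH: "p",
  HEADING: "h",
  CODE: "\u0060\u0060\u0060", // three backticks
  BLOCKQUOTE: ">",
  QUOTE: "q",
  SPOILER_BLOCK: "|||",
  ORDERED_LIST: "ol",
  UNORDERED_LIST: "ul",
  TASKS: "tl",
  KATEX: "$$",
  LINE_BREAK: "br",
  BIG_EMOJI: "E",
} as const;

type VerboseType = keyof typeof COMPACT_KEYS;

// Compact key -> verbose type. Throws if two types ever share a key.
const VERBOSE_TYPES: Record<string, VerboseType | undefined> = {};
for (const type of Object.keys(COMPACT_KEYS) as VerboseType[]) {
  const key = COMPACT_KEYS[type];
  if (VERBOSE_TYPES[key] !== undefined) {
    throw new Error("duplicate compact key: " + key);
  }
  VERBOSE_TYPES[key] = type;
}
```

Running the duplicate check at module load means a newly added node type with a clashing key is caught before any message is converted.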
Bugs in `compactify`/`expand` could corrupt message rendering.

A working proof-of-concept implementation exists with full bidirectional conversion and roundtrip validation tests. It covers all current AST node types, including BigEmoji, lists, tasks, code blocks, KaTeX, and color nodes.