TOON syntax reference with concrete examples. See Getting Started for an introduction.
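TOON models data the same way as JSON:
- Primitives: strings, numbers, booleans, and null
- Objects
- Arrays

A TOON document can represent different root forms:
- Root object: fields appear at depth 0
- Root array: begins with [N]: or [N]{fields}: at depth 0
- Root primitive: a single primitive value

Most examples in these docs use root objects, but the format supports all three forms equally (spec §5).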
Objects with primitive values use key: value syntax, with one field per line:
id: 123
name: Ada
active: true
Indentation replaces braces. One space follows the colon.
Nested objects add one indentation level (default: 2 spaces):
user:
  id: 123
  name: Ada
When a key ends with : and has no value on the same line, it opens a nested object. All lines at the next indentation level belong to that object.
An empty object at the root yields an empty document (no lines). A nested empty object is key: alone, with no children.
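The object rules above can be sketched in a few lines. This is an illustration only, not the library's encoder: `encodeObjectLines` is a hypothetical helper, and it assumes that keys and string values never need quoting.

```typescript
type Primitive = string | number | boolean | null;
interface PlainObject { [key: string]: Primitive | PlainObject }

// Hypothetical helper: renders an object as "key: value" lines,
// adding one indentation level (2 spaces by default) per nesting level.
function encodeObjectLines(obj: PlainObject, depth: number = 0, indentSize: number = 2): string[] {
  const pad = new Array(depth * indentSize + 1).join(" ");
  let lines: string[] = [];
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    if (value !== null && typeof value === "object") {
      // A key with no value on its line opens a nested object.
      lines.push(pad + key + ":");
      lines = lines.concat(encodeObjectLines(value, depth + 1, indentSize));
    } else {
      // Assumption: the value needs no quoting.
      lines.push(pad + key + ": " + String(value));
    }
  }
  return lines;
}
```

An empty root object produces no lines, and a nested empty object produces only its `key:` line, which matches the two empty-object cases described above.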
TOON detects array structure and chooses the most efficient representation. Arrays always declare their length in brackets: [N].
Arrays of primitives (strings, numbers, booleans, null) are rendered inline:
tags[3]: admin,ops,dev
The delimiter (comma by default) separates values. Strings containing the active delimiter must be quoted.
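As a rough sketch of the inline form, the hypothetical helper below (not part of `@toon-format/toon`) builds the header and joins the values. It applies only the delimiter rule for quoting and assumes values contain no quotes or backslashes.

```typescript
type Primitive = string | number | boolean | null;

// Hypothetical helper: renders a primitive array on one line.
function encodeInlineArray(key: string, values: Primitive[], delimiter: string = ","): string {
  // The comma is the default, so only other delimiters appear inside the brackets.
  const marker = delimiter === "," ? "" : delimiter;
  if (values.length === 0) return key + "[0" + marker + "]:";
  const cells: string[] = [];
  for (const value of values) {
    if (typeof value === "string" && value.indexOf(delimiter) !== -1) {
      // Strings containing the active delimiter must be quoted.
      cells.push('"' + value + '"');
    } else {
      cells.push(String(value));
    }
  }
  return key + "[" + values.length + marker + "]: " + cells.join(delimiter);
}
```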
When all objects in an array share the same set of primitive-valued keys, TOON uses tabular format:
::: code-group
items[2]{sku,qty,price}:
  A1,2,9.99
  B2,1,14.5

users[2]{id,name,role}:
  1,Alice Admin,admin
  2,"Bob Smith",user
:::
The header items[2]{sku,qty,price}: declares:
- [2] means 2 rows
- {sku,qty,price} defines the columns

Each row contains values in the same order as the field list. Values are encoded as primitives (strings, numbers, booleans, null) and separated by the delimiter.
[!NOTE] Tabular format requires identical field sets across all objects (same keys, order per object may vary) and primitive values only (no nested arrays/objects).
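The eligibility check in the note can be sketched as follows. `tabularFields` and `encodeTable` are hypothetical names used for illustration; the sketch takes the column order from the first object and assumes no value needs quoting.

```typescript
type Row = { [key: string]: unknown };

function isPrimitive(value: unknown): boolean {
  return value === null || typeof value === "string" ||
    typeof value === "number" || typeof value === "boolean";
}

// Returns the shared field list if every row has the same keys
// (in any order) and only primitive values; otherwise null.
function tabularFields(rows: Row[]): string[] | null {
  if (rows.length === 0) return null;
  const fields = Object.keys(rows[0]);
  for (const row of rows) {
    if (Object.keys(row).length !== fields.length) return null;
    for (const field of fields) {
      if (!Object.prototype.hasOwnProperty.call(row, field)) return null;
      if (!isPrimitive(row[field])) return null;
    }
  }
  return fields;
}

// Renders the tabular form, or returns null when list format is needed instead.
function encodeTable(key: string, rows: Row[]): string | null {
  const fields = tabularFields(rows);
  if (fields === null) return null;
  const lines: string[] = [key + "[" + rows.length + "]{" + fields.join(",") + "}:"];
  for (const row of rows) {
    const cells: string[] = [];
    for (const field of fields) cells.push(String(row[field]));
    lines.push("  " + cells.join(","));
  }
  return lines.join("\n");
}
```

Because the field list is fixed by the header, the second object may list its keys in a different order and still land in the right columns.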
Arrays that don't meet the tabular requirements use list format with hyphen markers:
items[3]:
  - 1
  - a: 1
  - text
Each element starts with - at one indentation level deeper than the parent array header.
When an array element is an object, it appears as a list item:
items[2]:
  - id: 1
    name: First
  - id: 2
    name: Second
    extra: true
When a tabular array is the first field of a list-item object, the tabular header appears on the hyphen line, with rows indented two levels deeper and other fields indented one level deeper:
items[1]:
  - users[2]{id,name}:
      1,Ada
      2,Bob
    status: active
When the object has only a single tabular field, the same pattern applies:
items[1]:
  - users[2]{id,name}:
      1,Ada
      2,Bob
This is the canonical encoding for list-item objects whose first field is a tabular array.
When you have arrays containing primitive inner arrays:
pairs[2]:
  - [2]: 1,2
  - [2]: 3,4
Each inner array gets its own header on the list-item line.
Empty arrays have special representations:
items[0]:
The header declares length zero, with no elements following.
Array headers follow this pattern:
key[N<delimiter?>]<{fields}>:
Where:
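- N is the array length
- delimiter (optional) declares the active delimiter:
  - absent → comma (,)
  - \t (tab character) → tab delimiter
  - | → pipe delimiter
- fields (optional) lists the columns of a tabular array: {field1,field2,field3}

[!TIP] The array length [N] helps LLMs validate structure. If you ask a model to generate TOON output, explicit lengths let you detect truncation or malformed data.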
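To make the truncation check concrete, here is a minimal sketch that parses a header and compares the declared length with the rows actually received. `parseArrayHeader` and `rowCountMatches` are hypothetical helpers, and the key pattern (unquoted identifiers, optionally dotted) is an assumption of this sketch rather than the full grammar.

```typescript
interface ArrayHeader {
  key: string;             // "" for a root array
  length: number;          // declared N
  delimiter: string;       // ",", "\t", or "|"
  fields: string[] | null; // null when the header has no {fields}
}

// Hypothetical helper: parses headers such as "items[2]{sku,qty}:" or "items[2|]{sku|qty}:".
function parseArrayHeader(line: string): ArrayHeader | null {
  const match = /^([A-Za-z_][A-Za-z0-9_.]*)?\[(\d+)([\t|]?)\](?:\{([^}]*)\})?:/.exec(line);
  if (match === null) return null;
  // No marker inside the brackets means the default comma delimiter.
  const delimiter = match[3] === "" ? "," : match[3];
  const fields = match[4] === undefined ? null : match[4].split(delimiter);
  return {
    key: match[1] === undefined ? "" : match[1],
    length: parseInt(match[2], 10),
    delimiter: delimiter,
    fields: fields,
  };
}

// True when the number of row lines equals the declared length.
function rowCountMatches(headerLine: string, rowLines: string[]): boolean {
  const header = parseArrayHeader(headerLine);
  return header !== null && rowLines.length === header.length;
}
```

A model response that stops after two rows of an `items[3]` table fails `rowCountMatches`, which is the signal for truncated output.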
TOON supports three delimiters: comma (default), tab, and pipe. The delimiter is scoped to the array header that declares it.
::: code-group
items[2]{sku,name,qty,price}:
  A1,Widget,2,9.99
  B2,Gadget,1,14.5

items[2	]{sku	name	qty	price}:
  A1	Widget	2	9.99
  B2	Gadget	1	14.5

items[2|]{sku|name|qty|price}:
  A1|Widget|2|9.99
  B2|Gadget|1|14.5
:::
Tab and pipe delimiters are explicitly encoded in the header brackets and field braces. Commas don't require quoting when tab or pipe is active, and vice versa.
[!TIP] Tab delimiters often tokenize more efficiently than commas, especially for data with few quoted strings. Use encode(data, { delimiter: '\t' }) for additional token savings.
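The way the delimiter shows up in both the brackets and the field braces can be illustrated with a small header builder. `formatArrayHeader` is a hypothetical helper written for this page, not a library function.

```typescript
// Hypothetical helper: builds an array header for the given delimiter.
function formatArrayHeader(
  key: string,
  length: number,
  fields: string[] | null,
  delimiter: string = ","
): string {
  // Comma is the default and is not written inside the brackets.
  const marker = delimiter === "," ? "" : delimiter;
  // Field names are separated by the same active delimiter.
  const fieldPart = fields === null ? "" : "{" + fields.join(delimiter) + "}";
  return key + "[" + length + marker + "]" + fieldPart + ":";
}
```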
Key folding is an optional encoder feature (since spec v1.5) that collapses chains of single-key objects into dotted paths, reducing tokens for deeply nested data.
Standard nesting:
data:
  metadata:
    items[2]: a,b
With key folding (keyFolding: 'safe'):
data.metadata.items[2]: a,b
The three nested objects collapse into a single dotted key data.metadata.items.
A chain of objects is foldable when every object in the chain has exactly one key and every key in the chain is a valid segment under the rules below.
::: details Advanced Folding Rules
Segment Requirements (safe mode):
- Each folded segment must match ^[A-Za-z_][A-Za-z0-9_]*$ (no dots, hyphens, or other special characters)

Depth Limit:
- The flattenDepth option (default: Infinity) controls how many segments to fold
- flattenDepth: 2 folds only two-segment chains: {a: {b: val}} → a.b: val

Round-Trip with Path Expansion:
To reconstruct the original structure when decoding, use expandPaths: 'safe'. This splits dotted keys back into nested objects using the same safety rules (spec §13.4).
:::
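The folding rules can be approximated with a short recursive function. This is a sketch under stated assumptions, not the encoder's actual algorithm: `foldKeys` is a hypothetical helper that works on plain data instead of TOON text, it implements only the single-key, segment-pattern, and depth rules given above, and its handling of whatever remains beyond the depth limit (folding it again as a fresh chain) is a choice made for this example.

```typescript
const SAFE_SEGMENT = /^[A-Za-z_][A-Za-z0-9_]*$/;

function isPlainObject(value: unknown): value is { [key: string]: unknown } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: collapses chains of single-key objects into dotted keys.
function foldKeys(
  obj: { [key: string]: unknown },
  flattenDepth: number = Infinity
): { [key: string]: unknown } {
  const out: { [key: string]: unknown } = {};
  for (const key of Object.keys(obj)) {
    const segments: string[] = [key];
    let value: unknown = obj[key];
    if (SAFE_SEGMENT.test(key)) {
      // Walk down while the value is an object with exactly one safe key.
      while (segments.length < flattenDepth && isPlainObject(value)) {
        const childKeys = Object.keys(value);
        if (childKeys.length !== 1 || !SAFE_SEGMENT.test(childKeys[0])) break;
        segments.push(childKeys[0]);
        value = value[childKeys[0]];
      }
    }
    // Anything left over is folded independently at its own level.
    out[segments.join(".")] = isPlainObject(value) ? foldKeys(value, flattenDepth) : value;
  }
  return out;
}
```

With the document's example, `foldKeys({ data: { metadata: { items: ['a', 'b'] } } })` yields a single key `data.metadata.items` holding the array, while a key such as `user-info` is left unfolded because it fails the segment pattern.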
When decoding TOON that used key folding, enable path expansion to restore the nested structure:
import { decode, encode } from '@toon-format/toon'
const original = { data: { metadata: { items: ['a', 'b'] } } }
// Encode with folding
const toon = encode(original, { keyFolding: 'safe' })
// → "data.metadata.items[2]: a,b"
// Decode with expansion
const restored = decode(toon, { expandPaths: 'safe' })
// → { data: { metadata: { items: ['a', 'b'] } } }
Path expansion is off by default, so dotted keys are treated as literal keys unless explicitly enabled.
TOON quotes strings only when necessary to maximize token efficiency. A string must be quoted if:
- It is empty ("")
- It has leading or trailing whitespace
- It equals true, false, or null (case-sensitive)
- It looks like a number (e.g., "42", "-3.14", "1e-6", or "05" with leading zeros)
- It contains a colon (:), quote ("), backslash (\), brackets, braces, or control characters (newline, tab, carriage return)
- It contains the active delimiter
- It equals "-" or starts with "-" followed by any character

Otherwise, strings can be unquoted. Unicode, emoji, and strings with internal (non-leading/trailing) spaces are safe unquoted:
message: Hello 世界 👋
note: This has inner spaces
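The quoting conditions translate fairly directly into a predicate. The function below is a hypothetical sketch (the name `needsQuotes` and the number-like pattern are assumptions of this example), so treat the spec as the authority for edge cases.

```typescript
// Hypothetical helper: decides whether a string value must be quoted.
function needsQuotes(value: string, delimiter: string = ","): boolean {
  if (value === "") return true;                         // empty string
  if (value !== value.trim()) return true;               // leading or trailing whitespace
  if (value === "true" || value === "false" || value === "null") return true;
  // Number-like tokens, including ones with leading zeros such as "05".
  if (/^-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?$/.test(value)) return true;
  if (/[:"\\\[\]{}]/.test(value)) return true;           // colon, quote, backslash, brackets, braces
  if (/[\n\r\t]/.test(value)) return true;               // control characters
  if (value.indexOf(delimiter) !== -1) return true;      // active delimiter
  if (value.charAt(0) === "-") return true;              // "-" or "-" followed by anything
  return false;
}
```

Passing the active delimiter explicitly reflects the scoping rule: `needsQuotes('a,b')` is true under the default comma, but `needsQuotes('a,b', '|')` is false.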
In quoted strings and keys, only five escape sequences are valid:
| Character | Escape |
|---|---|
| Backslash (\) | \\ |
| Double quote (") | \" |
| Newline (U+000A) | \n |
| Carriage return (U+000D) | \r |
| Tab (U+0009) | \t |
All other escape sequences (e.g., \x, \u) are invalid and will cause an error in strict mode.
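A minimal sketch of the five escapes, with a strict unescape that rejects everything else, might look like this. Both function names are hypothetical; they operate on the text between the quotes, not on the quoted token itself.

```typescript
// Escapes the five characters listed in the table. Backslash goes first
// so that backslashes added by later replacements are not doubled.
function escapeString(raw: string): string {
  return raw
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r")
    .replace(/\t/g, "\\t");
}

// Reverses escapeString and throws on any other escape sequence,
// mirroring the strict-mode behavior described above.
function unescapeString(body: string): string {
  let out = "";
  for (let i = 0; i < body.length; i++) {
    const ch = body.charAt(i);
    if (ch !== "\\") {
      out += ch;
      continue;
    }
    i++;
    const next = body.charAt(i); // "" when the text ends with a lone backslash
    if (next === "\\") out += "\\";
    else if (next === '"') out += '"';
    else if (next === "n") out += "\n";
    else if (next === "r") out += "\r";
    else if (next === "t") out += "\t";
    else throw new Error("Invalid escape sequence: \\" + next);
  }
  return out;
}
```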
Numbers are emitted in canonical decimal form (no exponent notation, no trailing zeros). Non-JSON types are normalized before encoding:
| Input | Output |
|---|---|
| Finite number | Canonical decimal (e.g., 1e6 → 1000000, 1.5000 → 1.5, -0 → 0) |
| NaN, Infinity, -Infinity | null |
| BigInt (within safe range) | Number |
| BigInt (out of range) | Quoted decimal string (e.g., "9007199254740993") |
| Date | ISO string in quotes (e.g., "2025-01-01T00:00:00.000Z") |
| undefined, function, symbol | null |
Decoders accept both decimal and exponent forms on input (e.g., 42, -3.14, 1e-6), and treat tokens with forbidden leading zeros (e.g., "05") as strings, not numbers.
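Parts of the normalization table and the decoder's number rule can be sketched as two small functions. Both are hypothetical helpers for illustration: `normalizeForEncoding` leaves out BigInt handling and canonical decimal formatting, and `decodePrimitiveToken` handles only unquoted tokens.

```typescript
// Encoder side: maps some non-JSON values to the forms in the table above.
function normalizeForEncoding(value: unknown): unknown {
  if (typeof value === "number") {
    if (!isFinite(value)) return null; // NaN, Infinity, -Infinity
    return value === 0 ? 0 : value;    // collapses -0 to 0
  }
  if (value instanceof Date) return value.toISOString();
  if (value === undefined || typeof value === "function" || typeof value === "symbol") {
    return null;
  }
  return value;
}

// Decoder side: interprets an unquoted token as a primitive.
function decodePrimitiveToken(token: string): string | number | boolean | null {
  if (token === "true") return true;
  if (token === "false") return false;
  if (token === "null") return null;
  const numeric = /^-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?$/.test(token);
  const leadingZero = /^-?0\d/.test(token); // e.g. "05" stays a string
  if (numeric && !leadingZero) return Number(token);
  return token;
}
```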
Objects with a toJSON() method are serialized by calling the method and normalizing its result before encoding, similar to JSON.stringify:
import { encode } from '@toon-format/toon'

const obj = {
  data: 'example',
  toJSON() {
    return { info: this.data }
  }
}

encode(obj)
// info: example
The toJSON() method is called for any object that has toJSON in its prototype chain.

For complete rules on quoting, escaping, type conversions, and strict-mode decoding, see spec §2–4 (data model), §7 (strings and keys), and §14 (strict mode).