docs/guide/format-overview.md
TOON syntax reference with concrete examples. See Getting Started for an introduction.
TOON models data the same way as JSON:
nullA TOON document can represent different root forms:
[N]: or [N]{fields}: at depth 0Most examples in these docs use root objects, but the format supports all three forms equally (spec §5).
Objects with primitive values use key: value syntax, with one field per line:
id: 123
name: Ada
active: true
Indentation replaces braces. One space follows the colon.
Nested objects add one indentation level (default: 2 spaces):
user:
id: 123
name: Ada
When a key ends with : and has no value on the same line, it opens a nested object. All lines at the next indentation level belong to that object.
An empty object at the root yields an empty document (no lines). A nested empty object is key: alone, with no children.
TOON detects array structure and chooses the most efficient representation. Arrays always declare their length in brackets: [N].
Arrays of primitives (strings, numbers, booleans, null) are rendered inline:
tags[3]: admin,ops,dev
The delimiter (comma by default) separates values. Strings containing the active delimiter must be quoted.
When all objects in an array share the same set of primitive-valued keys, TOON uses tabular format:
::: code-group
items[2]{sku,qty,price}:
A1,2,9.99
B2,1,14.5
users[2]{id,name,role}:
1,Alice Admin,admin
2,"Bob Smith",user
:::
The header items[2]{sku,qty,price}: declares:
[2] means 2 rows{sku,qty,price} defines the columnsEach row contains values in the same order as the field list. Values are encoded as primitives (strings, numbers, booleans, null) and separated by the delimiter.
[!NOTE] Tabular format requires identical field sets across all objects (same keys, order per object may vary), primitive values only (no nested arrays/objects), and at least one key per object – arrays that contain an empty
{}element fall back to the expanded list form below.
Arrays that don't meet the tabular requirements use list format with hyphen markers:
items[3]:
- 1
- a: 1
- text
Each element starts with - at one indentation level deeper than the parent array header.
When an array element is an object, it appears as a list item:
items[2]:
- id: 1
name: First
- id: 2
name: Second
extra: true
When a tabular array is the first field of a list-item object, the tabular header appears on the hyphen line, with rows indented two levels deeper and other fields indented one level deeper:
items[1]:
- users[2]{id,name}:
1,Ada
2,Bob
status: active
When the object has only a single tabular field, the same pattern applies:
items[1]:
- users[2]{id,name}:
1,Ada
2,Bob
This is the canonical encoding for list-item objects whose first field is a tabular array.
When you have arrays containing primitive inner arrays:
pairs[2]:
- [2]: 1,2
- [2]: 3,4
Each inner array gets its own header on the list-item line.
When the inner arrays are themselves arrays of objects or non-uniform arrays, the same - [N]: header appears on the hyphen line and the nested items follow one indent deeper:
items[3]:
- summary
- id: 1
name: Ada
- [2]:
- id: 2
- status: draft
Empty arrays render as key: [] for fields and [] at the root:
items: []
The legacy items[0]: form is still decoded for backward compatibility.
Array headers follow this pattern:
key[N<delimiter?>]<{fields}>:
Where:
,)\t (tab character) → tab delimiter| → pipe delimiter{field1,field2,field3}[!NOTE] The array length
[N]helps LLMs validate structure. If you ask a model to generate TOON output, explicit lengths let you detect truncation or malformed data.
TOON supports three delimiters: comma (default), tab, and pipe. The delimiter is scoped to the array header that declares it.
::: code-group
items[2]{sku,name,qty,price}:
A1,Widget,2,9.99
B2,Gadget,1,14.5
items[2 ]{sku name qty price}:
A1 Widget 2 9.99
B2 Gadget 1 14.5
items[2|]{sku|name|qty|price}:
A1|Widget|2|9.99
B2|Gadget|1|14.5
:::
Tab and pipe delimiters are explicitly encoded in the header brackets and field braces. Inside an array scope, only the active delimiter triggers quoting – the others are literal data. Object field values (key: value) follow the document delimiter (§11.1) regardless of any surrounding array's active delimiter.
[!TIP] Tab delimiters often tokenize more efficiently than commas, especially for data with few quoted strings. Use
encode(data, { delimiter: '\t' })for additional token savings.
Key folding is an optional encoder feature (since spec v1.5) that collapses chains of single-key objects into dotted paths, reducing tokens for deeply nested data.
Standard nesting:
data:
metadata:
items[2]: a,b
With key folding (keyFolding: 'safe'):
data.metadata.items[2]: a,b
The three nested objects collapse into a single dotted key data.metadata.items.
A chain of objects is foldable when:
::: details Advanced Folding Rules Segment Requirements (safe mode):
^[A-Za-z_][A-Za-z0-9_]*$ (no dots, hyphens, or other special characters)Depth Limit:
flattenDepth option (default: Infinity) controls how many segments to foldflattenDepth: 2 folds only two-segment chains: {a: {b: val}} → a.b: valRound-Trip with Path Expansion:
To reconstruct the original structure when decoding, use expandPaths: 'safe'. This splits dotted keys back into nested objects using the same safety rules (spec §13.4).
:::
When decoding TOON that used key folding, enable path expansion to restore the nested structure:
import { decode, encode } from '@toon-format/toon'
const original = { data: { metadata: { items: ['a', 'b'] } } }
// Encode with folding
const toon = encode(original, { keyFolding: 'safe' })
// → "data.metadata.items[2]: a,b"
// Decode with expansion
const restored = decode(toon, { expandPaths: 'safe' })
// → { data: { metadata: { items: ['a', 'b'] } } }
Path expansion is off by default, so dotted keys are treated as literal keys unless explicitly enabled.
TOON quotes strings only when necessary to maximize token efficiency. A string must be quoted if:
"")true, false, or null (case-sensitive)"42", "-3.14", "1e-6", "05"):), quote ("), backslash (\), brackets, braces, or any control character in U+0000–U+001F"-" or starts with "-" followed by any characterOtherwise, strings can be unquoted. Unicode, emoji, and strings with internal (non-leading/trailing) spaces are safe unquoted:
message: Hello 世界 👋
note: This has inner spaces
In quoted strings and keys, six escape sequences are valid:
| Character | Escape |
|---|---|
Backslash (\) | \\ |
Double quote (") | \" |
| Newline (U+000A) | \n |
| Carriage return (U+000D) | \r |
| Tab (U+0009) | \t |
| Any other U+0000–U+001F control character | \uXXXX |
Other escapes (e.g., \x, \0, \b) are always rejected, as are lone-surrogate \uXXXX values (U+D800–U+DFFF).
Numbers are emitted in canonical decimal form (no exponent notation, no trailing zeros). Non-JSON types are normalized before encoding:
| Input | Output |
|---|---|
| Finite number | Canonical decimal (e.g., 1e6 → 1000000, 1.5000 → 1.5, -0 → 0) |
NaN, Infinity, -Infinity | null |
BigInt (within safe range) | Number |
BigInt (out of range) | Quoted decimal string (e.g., "9007199254740993") |
Date | ISO string in quotes (e.g., "2025-01-01T00:00:00.000Z") |
Set | Array of normalized values |
Map | Object with String(key) keys |
undefined, function, symbol | null |
Decoders accept both decimal and exponent forms on input (e.g., 42, -3.14, 1e-6), and treat tokens with forbidden leading zeros (e.g., "05") as strings, not numbers.
Objects with a toJSON() method are serialized by calling the method and normalizing its result before encoding, similar to JSON.stringify:
const obj = {
data: 'example',
toJSON() {
return { info: this.data }
}
}
encode(obj)
// info: example
The toJSON() method:
toJSON in their prototype chainFor complete rules on quoting, escaping, type conversions, and strict-mode decoding, see spec §2–4 (data model), §7 (strings and keys), and §14 (strict mode).