Token-Oriented Object Notation (TOON) is a compact, human-readable encoding of the JSON data model that minimizes tokens and makes structure easy for models to follow. It's intended for LLM input as a drop-in, lossless representation of your existing JSON.
TOON combines YAML's indentation-based structure for nested objects with a CSV-style tabular layout for uniform arrays. TOON's sweet spot is uniform arrays of objects (multiple fields per row, same structure across items), achieving CSV-like compactness while adding explicit structure that helps LLMs parse and validate data reliably.
Think of it as a translation layer: use JSON programmatically, and encode it as TOON for LLM input.
Standard JSON is verbose and token-expensive. For uniform arrays of objects, JSON repeats every field name for every record:
{
"users": [
{ "id": 1, "name": "Alice", "role": "admin" },
{ "id": 2, "name": "Bob", "role": "user" }
]
}
YAML already reduces some redundancy with indentation instead of braces:
users:
  - id: 1
    name: Alice
    role: admin
  - id: 2
    name: Bob
    role: user
TOON goes further by declaring fields once and streaming data as rows:
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
The [2] declares the array length, enabling LLMs to answer dataset size questions and detect truncation. The {id,name,role} declares the field names. Each row is then a compact, comma-separated list of values. This is the core pattern: declare structure once, stream data compactly. The format approaches CSV's efficiency while adding explicit structure.
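The declare-once, stream-rows pattern can be sketched in a few lines. The `encodeTable` helper below is hypothetical and is not the library's implementation: it assumes a non-empty array of flat objects that share the same keys, values that need no quoting, and rows indented by two spaces.

```ts
type Primitive = string | number | boolean | null

// Hypothetical helper for illustration only. A real encoder also handles
// quoting, nested values, and arrays that are not uniform.
function encodeTable(key: string, rows: Record<string, Primitive>[]): string {
  if (rows.length === 0) {
    throw new Error('this sketch assumes at least one row')
  }
  const fields = Object.keys(rows[0])
  // Header: array length in brackets, field names in braces, declared once.
  const header = `${key}[${rows.length}]{${fields.join(',')}}:`
  // Body: one indented, comma-separated line per object.
  const lines = rows.map(row => '  ' + fields.map(f => String(row[f])).join(','))
  return [header, ...lines].join('\n')
}

const table = encodeTable('users', [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' },
])
console.log(table)
```

The field names appear once in the header, so adding more rows costs only the values themselves.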
For a more realistic example, here's how TOON handles a dataset with both nested objects and tabular arrays:
::: code-group
{
"context": {
"task": "Our favorite hikes together",
"location": "Boulder",
"season": "spring_2025"
},
"friends": ["ana", "luis", "sam"],
"hikes": [
{
"id": 1,
"name": "Blue Lake Trail",
"distanceKm": 7.5,
"elevationGain": 320,
"companion": "ana",
"wasSunny": true
},
{
"id": 2,
"name": "Ridge Overlook",
"distanceKm": 9.2,
"elevationGain": 540,
"companion": "luis",
"wasSunny": false
},
{
"id": 3,
"name": "Wildflower Loop",
"distanceKm": 5.1,
"elevationGain": 180,
"companion": "sam",
"wasSunny": true
}
]
}
context:
  task: Our favorite hikes together
  location: Boulder
  season: spring_2025
friends[3]: ana,luis,sam
hikes[3]{id,name,distanceKm,elevationGain,companion,wasSunny}:
  1,Blue Lake Trail,7.5,320,ana,true
  2,Ridge Overlook,9.2,540,luis,false
  3,Wildflower Loop,5.1,180,sam,true
:::
Notice how TOON combines YAML's indentation for the context object with inline format for the primitive friends array and tabular format for the structured hikes array. Each format is chosen automatically based on the data structure.
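To make the "chosen automatically" idea concrete, here is a rough sketch of how a value's shape could map to a layout. The `chooseLayout` function and its `Layout` labels are hypothetical names invented for this illustration; the library's actual selection logic may differ in its details.

```ts
type Layout = 'indented' | 'inline' | 'tabular' | 'other'

function isPrimitive(v: unknown): boolean {
  return v === null || typeof v === 'string' || typeof v === 'number' || typeof v === 'boolean'
}

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === 'object' && v !== null && !Array.isArray(v)
}

// Hypothetical classifier: mirrors the idea in the text, not the library's code.
function chooseLayout(value: unknown): Layout {
  if (isPlainObject(value)) return 'indented' // nested object
  if (!Array.isArray(value) || value.length === 0) return 'other'
  if (value.every(isPrimitive)) return 'inline' // e.g. friends
  if (!value.every(isPlainObject)) return 'other'
  // Tabular only when every item has the same keys and primitive values.
  const fields = Object.keys(value[0])
  const uniform = value.every(item =>
    Object.keys(item).length === fields.length &&
    fields.every(f => f in item && isPrimitive(item[f])))
  return uniform ? 'tabular' : 'other'
}

const sample = {
  context: { task: 'Our favorite hikes together', location: 'Boulder' },
  friends: ['ana', 'luis', 'sam'],
  hikes: [
    { id: 1, name: 'Blue Lake Trail', wasSunny: true },
    { id: 2, name: 'Ridge Overlook', wasSunny: false },
  ],
}
console.log(chooseLayout(sample.context)) // indented
console.log(chooseLayout(sample.friends)) // inline
console.log(chooseLayout(sample.hikes))   // tabular
```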
TOON is optimized for specific use cases. It aims to:
TOON excels with uniform arrays of objects – data with the same structure across items. For LLM prompts, the format produces deterministic, minimally quoted text with built-in validation. Explicit array lengths ([N]) and field headers ({fields}) help detect truncation and malformed data, while the tabular structure declares fields once rather than repeating them in every row.
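The validation benefit can be illustrated with a small check on a single tabular block. The `checkTable` function below is a hypothetical sketch, not part of the library: it assumes unquoted values without embedded commas and only compares the declared length and field count against what is actually present.

```ts
interface TableCheck {
  ok: boolean
  problems: string[]
}

// Hypothetical validator for one tabular block. A real decoder also
// handles quoting, nesting, and other delimiters.
function checkTable(block: string): TableCheck {
  const lines = block.split('\n').filter(line => line.trim() !== '')
  if (lines.length === 0) return { ok: false, problems: ['empty input'] }

  const [header, ...rows] = lines
  const match = /^\w+\[(\d+)\]\{([^}]*)\}:$/.exec(header.trim())
  if (!match) return { ok: false, problems: ['malformed header'] }

  const declaredLength = Number(match[1])
  const fieldCount = match[2].split(',').length
  const problems: string[] = []

  // A mismatch with [N] suggests the data was truncated or padded.
  if (rows.length !== declaredLength) {
    problems.push(`expected ${declaredLength} rows, found ${rows.length}`)
  }
  // Each row should carry one value per declared field.
  rows.forEach((row, i) => {
    const cells = row.trim().split(',').length
    if (cells !== fieldCount) {
      problems.push(`row ${i + 1}: expected ${fieldCount} values, found ${cells}`)
    }
  })
  return { ok: problems.length === 0, problems }
}

const truncated = 'users[3]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user'
console.log(checkTable(truncated))
// { ok: false, problems: [ 'expected 3 rows, found 2' ] }
```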
::: tip
The TOON format is stable, but also an idea in progress. Nothing's set in stone – help shape where it goes by contributing to the spec or sharing feedback.
:::
TOON is not always the best choice. Consider alternatives when:
> [!NOTE]
> For data-driven comparisons across different structures, see benchmarks. When optimizing for latency, measure TTFT, tokens/sec, and total time for both TOON and JSON-compact and use whichever performs better in your specific environment.
Install the library via your preferred package manager:
::: code-group
npm install @toon-format/toon
pnpm add @toon-format/toon
yarn add @toon-format/toon
:::
The CLI can be used without installation via npx, or installed globally:
::: code-group
npx @toon-format/cli input.json -o output.toon
npm install -g @toon-format/cli
pnpm add -g @toon-format/cli
yarn global add @toon-format/cli
:::
For full CLI documentation, see the CLI reference.
TOON files conventionally use the .toon extension. For HTTP transmission, the provisional media type is text/toon, always with UTF-8 encoding. While you may specify charset=utf-8 explicitly, it's optional – UTF-8 is the default assumption. This follows the registration process outlined in spec §18.2.
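As a small illustration of the media type in practice, the sketch below builds request options for sending a TOON body over HTTP. The `toonRequestInit` helper is a hypothetical name; the returned object has the `method`, `headers`, and `body` shape that `fetch` accepts as its options argument.

```ts
// Hypothetical helper. The media type string is taken from the text above;
// the charset parameter is optional because UTF-8 is the assumed default.
function toonRequestInit(body: string, explicitCharset = false) {
  const contentType = explicitCharset ? 'text/toon; charset=utf-8' : 'text/toon'
  return {
    method: 'POST',
    headers: { 'Content-Type': contentType },
    body,
  }
}

const init = toonRequestInit('users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user', true)
console.log(init.headers['Content-Type']) // text/toon; charset=utf-8
```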
The examples below use the TypeScript library for demonstration, but the same operations work in any language with a TOON implementation.
Start by encoding a simple dataset:
import { encode } from '@toon-format/toon'
const data = {
users: [
{ id: 1, name: 'Alice', role: 'admin' },
{ id: 2, name: 'Bob', role: 'user' }
]
}
console.log(encode(data))
Output:
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
Decoding is just as simple:
import { decode } from '@toon-format/toon'
const toon = `
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
`
const data = decode(toon)
console.log(JSON.stringify(data, null, 2))
Output:
{
"users": [
{ "id": 1, "name": "Alice", "role": "admin" },
{ "id": 2, "name": "Bob", "role": "user" }
]
}
Round-tripping is lossless: decode(encode(x)) always equals x (after normalization of non-JSON types like Date, NaN, etc.).
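The normalization caveat exists because some JavaScript values have no JSON representation. As an analogy only (an assumption, not a statement of TOON's own rules), the sketch below runs values through the built-in `JSON.stringify` and `JSON.parse`, which turn a `Date` into an ISO string and `NaN` into `null`. The `jsonNormalize` name is hypothetical; consult the specification for the exact normalization TOON applies.

```ts
// Analogy only: normalize through JSON to show which values change.
function jsonNormalize(value: unknown): unknown {
  return JSON.parse(JSON.stringify(value))
}

const input = { when: new Date(0), ratio: NaN, name: 'Alice' }
const normalized = jsonNormalize(input)
console.log(normalized)
// { when: '1970-01-01T00:00:00.000Z', ratio: null, name: 'Alice' }

// A value that is already JSON-compatible comes back structurally identical.
const plain = { users: [{ id: 1, name: 'Alice', role: 'admin' }] }
const unchanged = JSON.stringify(jsonNormalize(plain)) === JSON.stringify(plain)
console.log(unchanged) // true
```

When comparing a round-tripped value with the original, compare against the normalized form rather than the raw input.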
Now that you've seen your first TOON document, read the Format Overview for complete syntax details (objects, arrays, quoting rules, key folding), then explore Using TOON with LLMs to see how to use it effectively in prompts. For implementation details, check the API Reference (TypeScript) or the Specification (language-agnostic normative rules).