When you use the Miller agent skill or the Miller MCP server,
your agent runs the mlr subcommands described here on your behalf. (See also Miller
and AI for an introduction.)
The new Miller subcommands as of version 6.20 allow agents to discover information about how to use Miller, constrain attempted solutions to match the data, validate Miller commands before running them, run them, and robustly recover from errors.
If you like, you can run these subcommands yourself, although you don't need to. These AI-support subcommands are documented here for transparency.
The discover step rests on a machine-readable catalog of verbs, DSL functions, flags, and keywords, plus intent-to-capability routing.
These are implemented by mlr help --as-json and mlr which.
mlr help --as-json emits Miller's entire help catalog as one JSON document.
The --index form is the cheap first call -- every capability with a
one-line summary (here trimmed, and then counted, using Miller itself):
From the index, an agent drills into full entries one at a time: mlr help verb sort --as-json,
mlr help function splitax --as-json, mlr help flag --ifs --as-json, mlr help keyword ENV --as-json -- each accepting one or more names. A verb entry carries a structured option list --
flag, argument placeholder, type -- alongside the familiar usage text:
Note that usage_text -- what mlr decimate --help prints -- is rendered from the same
structured options, so the human help and the machine help cannot drift apart. Function entries
carry name, class, arity, help, and examples; the examples across the whole catalog are exercised by
Miller's test suite, so they never rot.
Three properties make the catalog cheap to use:
- The catalog carries mlr_version and catalog_schema_version. Miller is a static binary, so the catalog changes only when the binary does: fetch once, cache forever, re-fetch on a version bump. No TTLs.
- JSON output is requested per call with --as-json, or set-once via a truthy MLR_HELP_JSON environment variable.

For routing an intent to a capability -- the reverse of browsing -- mlr which returns ranked candidates:
Its exit code signals confidence -- 0 when a query word matched a capability's name, 2 when it didn't -- so a harness can branch on status without parsing anything.
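
As a rough illustration of the discovery step from a harness's side, the Python sketch below caches the catalog per Miller version and maps the exit status of mlr which to a confidence label. Only the mlr help --as-json command and the exit codes 0 and 2 come from the text above. The names CatalogCache, classify_which_status, and fetch_catalog, and the labels "confident", "tentative", and "failed", are hypothetical. The sketch also assumes the caller supplies the current version string, for example taken from the catalog's mlr_version field.

```python
import json
import subprocess
from typing import Callable, Optional

# Exit statuses of `mlr which`, as described in the text above.
NAME_MATCHED = 0
NO_NAME_MATCH = 2


def classify_which_status(returncode: int) -> str:
    """Map an `mlr which` exit status to a coarse confidence label."""
    if returncode == NAME_MATCHED:
        return "confident"
    if returncode == NO_NAME_MATCH:
        return "tentative"
    return "failed"  # assumption: any other status is an outright failure


class CatalogCache:
    """Hold one catalog in memory; refetch only when the version changes."""

    def __init__(self, fetch: Callable[[], dict]):
        self._fetch = fetch
        self._version: Optional[str] = None
        self._catalog: Optional[dict] = None

    def get(self, current_version: str) -> dict:
        # No TTL: the only invalidation signal is a different version string.
        if self._catalog is None or self._version != current_version:
            self._catalog = self._fetch()
            self._version = current_version
        return self._catalog


def fetch_catalog() -> dict:
    """Fetch the full catalog with the command named in the text."""
    out = subprocess.run(
        ["mlr", "help", "--as-json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)
```

A harness would typically build one CatalogCache(fetch_catalog) at startup and call get() with whatever version string it trusts. Because the fetch function is injected, the cache can be exercised without Miller installed.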
The constrain step shows the agent field names, types, cardinality, and value domains for your actual input data.
It's implemented by mlr describe.
Agents don't just hallucinate flags; they hallucinate values. Miller attacks that from both sides.
Where an option's domain is fixed by the binary, the catalog says so:
type is enum and values is the complete list. Here's one option of the
summary verb, extracted from the catalog --
using Miller to query Miller:
Where the domain depends on your data -- which fields exist, what values
filter could compare against, what to pass to -g -- the
describe verb profiles the input in one pass:
per field, the types seen, counts, cardinality, min/max, and (for
low-cardinality fields) every distinct value:
The catalog is the tool's shape; describe is the data's shape. An
agent that consults both has nothing left to guess.
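
To make "consults both" concrete, here is a minimal Python sketch of the checks a harness might run before building a command: one against an enum option from the catalog, one against the field names reported by describe. The "type" and "values" keys follow the description above. Everything else is an assumption made for illustration: the function names, the use of difflib for a suggestion, and the sample option and field names, none of which come from Miller.

```python
import difflib
from typing import Iterable, Optional, Tuple


def _check(candidate: str, allowed: Iterable[str]) -> Tuple[bool, Optional[str]]:
    """Return (ok, suggestion); suggestion is the closest allowed value, if any."""
    allowed = list(allowed)
    if candidate in allowed:
        return True, None
    close = difflib.get_close_matches(candidate, allowed, n=1)
    return False, (close[0] if close else None)


def check_enum_value(option: dict, value: str) -> Tuple[bool, Optional[str]]:
    """Check a proposed value against a catalog option entry."""
    if option.get("type") != "enum":
        # The catalog does not pin down this option's domain, so accept it.
        return True, None
    return _check(value, option.get("values", []))


def check_field_name(field_names: Iterable[str], name: str) -> Tuple[bool, Optional[str]]:
    """Check a proposed field name against names taken from describe output."""
    return _check(name, field_names)


# Hypothetical inputs, invented for illustration only.
example_option = {"flag": "--mode", "type": "enum", "values": ["fast", "exact"]}
example_fields = ["color", "shape", "quantity"]

print(check_enum_value(example_option, "exact"))
print(check_field_name(example_fields, "colour"))
```

How the field names are pulled out of the describe output is left to the caller, since that output's layout is not shown here.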
The validate step lets the agent parse and type-check a DSL expression before reading any input files.
It's implemented by mlr put --explain and mlr filter --explain.
mlr put --explain (likewise mlr filter --explain) parses and type-checks
an expression, then exits -- without opening any input at all:
Agents are instructed to run Miller commands using mlr with the --errors-json flag so that a
failure comes back as a structured document instead of prose.
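
One way a harness might chain these two steps is sketched below in Python: check the expression with mlr put --explain, and run the real command only if that check passes. Several details are assumptions rather than documented behavior: that a failed check exits with a nonzero status, that --errors-json is given as a main flag before the verb, and that no I/O format flags are needed. The names validate_then_run and default_runner are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def default_runner(argv: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command, capturing stdout and stderr as text."""
    return subprocess.run(list(argv), capture_output=True, text=True)


def validate_then_run(
    expression: str,
    files: List[str],
    run: Runner = default_runner,
) -> subprocess.CompletedProcess:
    """Type-check a put expression first; touch the input only if it passes."""
    check = run(["mlr", "put", "--explain", expression])
    if check.returncode != 0:
        # Assumption: a nonzero status means the expression failed the check.
        return check
    # Assumption: --errors-json is accepted as a main flag before the verb.
    return run(["mlr", "--errors-json", "put", expression, *files])
```

The runner is passed in so that the control flow can be tested with a stub. In real use the default runner shells out to mlr.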
With --errors-json (or a truthy MLR_ERRORS_JSON environment variable), errors arrive as a structured
document. The kind field gives an agent something to branch on; hint is a runnable next step,
not a sentence; and did_you_mean is computed against the same catalog the agent discovered from,
closing the self-correction loop:
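
On the consuming side, a harness might dispatch on those three fields roughly as in the Python sketch below. It assumes kind, hint, and did_you_mean sit at the top level of the error document, which the text does not confirm. The action labels, the function name next_step, and the sample error in the usage lines are all invented for illustration.

```python
import json


def next_step(error_text: str) -> dict:
    """Turn one structured error into a suggested next action for an agent."""
    try:
        err = json.loads(error_text)
    except json.JSONDecodeError:
        return {"action": "give_up", "reason": "error was not JSON"}
    if not isinstance(err, dict):
        return {"action": "give_up", "reason": "error was not a JSON object"}

    kind = err.get("kind")
    suggestion = err.get("did_you_mean")
    if suggestion:
        # Prefer a concrete correction when Miller offers one.
        return {"action": "retry_with", "value": suggestion, "kind": kind}
    hint = err.get("hint")
    if hint:
        # Otherwise fall back to the runnable next step.
        return {"action": "run_hint", "value": hint, "kind": kind}
    return {"action": "report", "kind": kind}


# Hypothetical error document; the field values are made up.
sample = json.dumps({
    "kind": "example_kind",
    "hint": "mlr help verb sort --as-json",
    "did_you_mean": ["sort"],
})
print(next_step(sample))
```

Preferring did_you_mean over hint is a design choice of this sketch, not something Miller prescribes. A harness could equally branch on kind first.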
And since Miller's DSL includes system and exec, there's a sandbox:
--no-shell (or a truthy MLR_NO_SHELL environment variable) disables all external-command
execution -- the DSL system and exec functions, piped redirects, and --prepipe fail cleanly:
A typical agent profile sets all three environment variables once:
<pre class="pre-non-highlight-non-pair">
export MLR_HELP_JSON=1    # help/catalog output as JSON
export MLR_ERRORS_JSON=1  # errors as structured JSON
export MLR_NO_SHELL=1     # no external-command execution
</pre>

Put together, the sections above are a loop -- discover, constrain, validate, run -- where each step feeds the next, and failures route back with structure instead of prose.
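
For a harness that launches mlr as a child process, the three-variable profile can also be applied per process instead of in a shell profile. The Python sketch below is one possible way to do that. The variable names and the value 1 come from the profile above, while agent_env and AGENT_DEFAULTS are hypothetical names.

```python
import os
from typing import Mapping, Optional

# The three settings from the agent profile above.
AGENT_DEFAULTS = {
    "MLR_HELP_JSON": "1",    # help/catalog output as JSON
    "MLR_ERRORS_JSON": "1",  # errors as structured JSON
    "MLR_NO_SHELL": "1",     # no external-command execution
}


def agent_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with the agent settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(AGENT_DEFAULTS)
    return env
```

The result can be passed as the env argument of subprocess.run, which leaves the parent process's own environment untouched.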