docs/autoparser.md
The auto-parser automatically analyzes chat templates to determine how to parse model outputs, including content, reasoning, and tool calls.
The unified auto-parser uses a pure differential, compositional approach (inspired by the git diff algorithm) to analyze chat templates:
Core Philosophy: everything is derived differentially, by rendering the template with controlled input variations and diffing the outputs; format classification is structural (e.g. separating JSON_NATIVE from tag-based formats).

Analysis + Parser Building in Two Steps:
1. autoparser::autoparser tmpl_analysis(tmpl) — runs all differential comparisons and populates the analysis structs
2. autoparser::peg_generator::generate_parser(tmpl, generation_params, tmpl_analysis) — uses the analysis to build a PEG parser and optional GBNF grammar

All structs are defined in common/chat-auto-parser.h.
Key structs:

- autoparser (main analyzer and generator), common/chat-auto-parser.h:367-388 — top-level analysis result aggregating jinja_caps, reasoning, content, and tools sub-analyses, plus preserved_tokens (union of all non-empty markers).
- analyze_reasoning, common/chat-auto-parser.h:254-274 — reasoning analysis result: mode enum, start marker (e.g. `<think>`), and end marker (e.g. `</think>`).
- analyze_content, common/chat-auto-parser.h:280-295 — content analysis result: mode enum, start/end markers, and requires_nonnull_content flag.
- analyze_tools and its sub-structs:
  - tool_format_analysis — mode enum, section_start/end, per_call_start/end, JSON field names (function_field, name_field, args_field, id_field, gen_id_field), and format flags (fun_name_is_key, tools_array_wrapped)
  - tool_function_analysis — name_prefix, name_suffix, close markers around function names
  - tool_arguments_analysis — start/end container markers, name_prefix/suffix, value_prefix/suffix, separator
  - tool_id_analysis — pos enum, prefix/suffix markers around call ID values
  - analyze_tools — aggregates the four sub-structs above

reasoning_mode: How the template handles reasoning/thinking blocks.
| Value | Description |
|---|---|
| NONE | No reasoning markers detected |
| TAG_BASED | Tag-based: `<think>`...`</think>` (start can be empty for delimiter-style formats) |
| TOOLS_ONLY | Reasoning only appears in tool call responses, not plain content |
Generation Prompt & Reasoning Prefill: Computed in common_chat_templates_apply_jinja before invoking either the specialized handlers or the auto-parser, by rendering the template twice — once with add_generation_prompt=false and once with add_generation_prompt=true — and storing the diff suffix as generation_params::generation_prompt. This string is propagated into common_chat_params::generation_prompt and common_chat_parser_params::generation_prompt.
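The double-render diff described above reduces to taking the suffix that the add_generation_prompt=true render appends. A minimal sketch (illustrative only, not the llama.cpp code; the function name is assumed):

```cpp
#include <cassert>
#include <string>

// Illustrative: the generation prompt is the suffix that the
// add_generation_prompt=true render adds on top of the =false render.
// If the template ignores the flag, the diff is empty.
static std::string generation_prompt_diff(const std::string & without_gen,
                                          const std::string & with_gen) {
    if (with_gen.size() > without_gen.size() &&
        with_gen.compare(0, without_gen.size(), without_gen) == 0) {
        return with_gen.substr(without_gen.size());
    }
    return "";
}
```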
The generation prompt is prepended to model output before PEG parsing via wrap_for_generation_prompt(). The portion before the reasoning start marker (if any) is prepended as a literal to ensure any boilerplate added by the template is consumed. The full string is also fed to the grammar sampler via llama_sampler_accept (stored in common_params_sampling::grammar_prefill), advancing the grammar past tokens already in the prompt. It is used to determine the reasoning budget sampler's initial state — COUNTING if the prefill tokens begin with the reasoning start sequence (but don't also contain the end sequence), IDLE otherwise.
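The budget-sampler initial-state rule can be sketched as follows (illustrative: the enum and function names are assumptions, and the real check operates on token sequences rather than strings):

```cpp
#include <cassert>
#include <string>

enum class budget_state { IDLE, COUNTING }; // assumed names, for illustration

// COUNTING when the prefill opens a reasoning block that it does not also
// close; IDLE otherwise.
static budget_state initial_budget_state(const std::string & prefill,
                                         const std::string & start,
                                         const std::string & end) {
    if (start.empty()) {
        return budget_state::IDLE;
    }
    const size_t pos = prefill.find(start);
    if (pos == std::string::npos) {
        return budget_state::IDLE;
    }
    // An end marker after the start means reasoning already closed in prefill.
    if (prefill.find(end, pos + start.size()) != std::string::npos) {
        return budget_state::IDLE;
    }
    return budget_state::COUNTING;
}
```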
grammar_prefill (common_params_sampling): The generation prompt string tokenized and accepted by the grammar sampler at init time. Only applied when grammar_external is false (i.e., the grammar was not set explicitly by the user).
Three outcomes for reasoning-prefill handling (in generate_parser()):
- Empty reasoning prefill (e.g. <think></think>\n): the parser sees reasoning as opened and immediately closed; whitespace-only reasoning content is discarded.
- Open reasoning prefill (e.g. <think>\n): the parser sees reasoning as already open.
- Non-reasoning marker (e.g. <|begin_assistant|> followed by boilerplate): the marker is a template artifact; the start literal is cleared so reasoning uses delimiter-style (end-only).

For templates that ignore add_generation_prompt (empty diff), the rendered data.prompt is used as fallback — but only for non-TOOLS_ONLY modes, since in TOOLS_ONLY the start tag is model-generated and may appear in prior conversation turns.

content_mode: How the template wraps assistant content.
| Value | Description |
|---|---|
| PLAIN | No content markers |
| ALWAYS_WRAPPED | Content always wrapped: `<response>`...`</response>` |
| WRAPPED_WITH_REASONING | Content wrapped only when reasoning is present |
tool_format: Classification of tool call structure.
| Value | Description |
|---|---|
| NONE | No tool support detected |
| JSON_NATIVE | Pure JSON: `{"name": "X", "arguments": {...}}` |
| TAG_WITH_JSON | Tag-based with JSON args: `<function=X>{...}</function>` |
| TAG_WITH_TAGGED | Tag-based with tagged args: `<param=key>value</param>` |
call_id_position: Where call IDs appear in tag-based formats.
| Value | Description |
|---|---|
| NONE | No call ID support detected |
| PRE_FUNC_NAME | Before function name |
| BETWEEN_FUNC_AND_ARGS | Between function name and arguments |
| POST_ARGS | After arguments |
JSON_NATIVE

Structure: The entire tool call (function name, arguments, values) is in JSON format. Optional enclosing tags around the section.
Detection: Function name appears inside a JSON structure (quotes preceded by { or :).
Examples:

Standard OpenAI-style:

```
<tool_call>
{"name": "get_weather", "arguments": {"location": "Paris", "unit": "celsius"}}
</tool_call>
```

Mistral Nemo with array wrapper:

```
[TOOL_CALLS]
[{"name": "calculate", "arguments": {"expr": "2+2"}}]
```

Function name as JSON key (Apertus style):

```
{"get_weather": {"location": "Paris"}}
```
TAG_WITH_JSON

Structure: Function name is outside JSON, in tag attributes or XML-style tags. Arguments are a JSON object.
Detection: Function name not in JSON, but argument names appear in JSON context.
Examples:

Functionary v3.1:

```
<function=get_weather>{"location": "Paris", "unit": "celsius"}</function>
```

MiniMax:

```
<minimax:tool_call>
<tool_name>calculate</tool_name>
<arguments>{"expr": "2+2"}</arguments>
</minimax:tool_call>
```
TAG_WITH_TAGGED

Structure: Both function name and argument names are in XML-style tags. String values are unquoted; non-string values are JSON-formatted.
Detection: Neither function name nor argument names appear in a JSON context.
Examples:

Qwen/Hermes XML format:

```
<function=get_weather>
<param=location>Paris</param>
<param=unit>celsius</param>
</function>
```

Mixed types:

```
<function=calculate>
<param=expr>2+2</param>
<param=precision>2</param>
<param=options>{"round": true}</param>
</function>
```
String values (Paris, celsius, 2+2) are unquoted; options (object type) is JSON-formatted.
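The JSON-context test that separates these three formats can be sketched as follows (illustrative only; the real in_json_haystack() uses a PEG parser rather than string scanning):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Illustrative check: a needle counts as "in a JSON context" when it appears
// quoted and the opening quote is preceded by '{' or ':' (skipping whitespace).
static bool in_json_context(const std::string & haystack, const std::string & needle) {
    const std::string quoted = "\"" + needle + "\"";
    for (size_t pos = haystack.find(quoted); pos != std::string::npos;
         pos = haystack.find(quoted, pos + 1)) {
        size_t i = pos;
        while (i > 0 && std::isspace((unsigned char) haystack[i - 1])) {
            i--;
        }
        if (i > 0 && (haystack[i - 1] == '{' || haystack[i - 1] == ':')) {
            return true;
        }
    }
    return false;
}
```

A function name passing this check suggests JSON_NATIVE; argument names passing while the function name fails suggests TAG_WITH_JSON; neither passing suggests TAG_WITH_TAGGED.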
```
autoparser::autoparser(tmpl)
|
|-- Phase 1: analyze_reasoning(tmpl, jinja_caps.supports_tool_calls)
| |-- R1: compare_reasoning_presence() — with/without reasoning_content field
| |-- R2: compare_thinking_enabled() — enable_thinking=false vs true
| '-- R3: compare_reasoning_scope() — reasoning+content vs reasoning+tools
| (only if supports_tool_calls)
|
|-- Phase 2: analyze_content(tmpl, reasoning)
| '-- C1: compares content-only vs tools output and content-only vs reasoning output
|
|-- Phase 3: analyze_tools(tmpl, jinja_caps, reasoning)
| (skipped entirely if !jinja_caps.supports_tool_calls)
| |
| |-- T1: analyze_tool_calls() — no tools vs with tools; classifies format
| | |-- JSON path → analyze_tool_call_format_json_native()
| | '-- tag path → analyze_tool_call_format_non_json()
| |
| (if format != NONE and format != JSON_NATIVE:)
| |
| |-- T2: check_per_call_markers() — 1 call vs 2 calls; moves section→per-call if needed
| | (only if supports_parallel_tool_calls)
| |
| |-- T3: extract_function_markers() — func_alpha vs func_beta; extracts name prefix/suffix/close
| |
| |-- T4: analyze_arguments() — (TAG_WITH_TAGGED only)
| | |-- A1: extract_argument_name_markers() — arg_name_A vs arg_name_B
| | '-- A2: extract_argument_value_markers() — value "XXXX" vs "YYYY"
| |
| |-- T5: extract_argument_separator() — 1 arg vs 2 args; finds separator between args
| |
| |-- T6: extract_args_markers() — 0 args vs 1 arg; finds args container markers
| |
| '-- T7: extract_call_id_markers() — call_id "call00001" vs "call99999"
|
'-- collect_preserved_tokens() — union of all non-empty markers
|
'-- apply workarounds() — post-hoc patches for edge-case templates
|
v
autoparser (analysis result)
|
v
autoparser::peg_generator::generate_parser(tmpl, inputs, analysis)
|-- analysis.build_parser(inputs) — builds PEG parser arena
| |-- reasoning.build_parser(ctx) — reasoning parser (mode-dependent)
| |-- content.build_parser(ctx) — content parser (mode-dependent)
| '-- tools.build_parser(ctx) — tool parser (dispatches by tool_format)
| |-- build_tool_parser_json_native()
| |-- build_tool_parser_tag_json()
| '-- build_tool_parser_tag_tagged()
|
|-- Build GBNF grammar (if tools present and trigger_marker non-empty)
'-- Set grammar_triggers from section_start or per_call_start
|
v
common_chat_params (prompt, parser, grammar, triggers, preserved_tokens)
```
The auto-parser is invoked in common/chat.cpp:1280-1310 in common_chat_templates_apply_jinja. A few specialized templates are handled first (Ministral/Magistral Large 3, GPT-OSS with <|channel|>, Functionary v3.2 with >>>all), then the auto-parser handles everything else via autoparser::autoparser + peg_generator::generate_parser.
All analysis phases use the same factorized comparison function declared in common/chat-auto-parser-helpers.h:68:
```
compare_variants(tmpl, params_A, params_modifier)
```
This creates variant B by applying a modifier lambda to a copy of params_A, renders both through the template, and computes a diff_split (common/chat-auto-parser.h:28-37):
- prefix — common prefix between A and B
- suffix — common suffix between A and B
- left — unique to variant A
- right — unique to variant B

The diff is computed via calculate_diff_split(), which finds the longest-common-prefix and longest-common-suffix, then iteratively moves incomplete `<...>` or `[...]` markers from the prefix/suffix into left/right until stable (tag boundary fixing).
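A stripped-down sketch of the split (illustrative; the real calculate_diff_split() additionally performs the tag boundary fixing, and the names here are assumptions):

```cpp
#include <cassert>
#include <string>

// Illustrative four-way diff: common prefix, common suffix, and the parts
// unique to each variant.
struct diff_split {
    std::string prefix;
    std::string suffix;
    std::string left;   // unique to variant A
    std::string right;  // unique to variant B
};

static diff_split split_diff(const std::string & a, const std::string & b) {
    size_t p = 0;
    while (p < a.size() && p < b.size() && a[p] == b[p]) {
        p++;
    }
    size_t s = 0;
    // The suffix must not overlap the prefix in either string.
    while (s < a.size() - p && s < b.size() - p &&
           a[a.size() - 1 - s] == b[b.size() - 1 - s]) {
        s++;
    }
    return {
        a.substr(0, p),
        a.substr(a.size() - s),
        a.substr(p, a.size() - p - s),
        b.substr(p, b.size() - p - s),
    };
}
```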
Text is segmentized into markers and non-marker fragments using segmentize_markers(), which splits on <...> and [...] boundaries.
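The segmentation can be sketched as follows (illustrative; the real segmentize_markers() may differ in details such as handling of unterminated markers):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative: split text into <...> / [...] marker tokens and the plain
// fragments between them.
static std::vector<std::string> segmentize(const std::string & text) {
    std::vector<std::string> out;
    std::string cur;
    for (size_t i = 0; i < text.size(); i++) {
        const char c = text[i];
        if (c == '<' || c == '[') {
            const char close = (c == '<') ? '>' : ']';
            const size_t end = text.find(close, i);
            if (end != std::string::npos) {
                if (!cur.empty()) { out.push_back(cur); cur.clear(); }
                out.push_back(text.substr(i, end - i + 1)); // whole marker
                i = end;
                continue;
            }
        }
        cur += c; // plain fragment
    }
    if (!cur.empty()) {
        out.push_back(cur);
    }
    return out;
}
```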
R1 — compare_reasoning_presence(): Compares assistant message with vs without a reasoning_content field.
- Searches diff.right (output with reasoning) for the reasoning content needle
- Markers found around the needle in diff.right → mode = TAG_BASED
- Markers emitted even without the needle → TAG_BASED (template forces markers; handled via prefill)
- End marker only → TAG_BASED (delimiter-style, empty start)
- Extracts reasoning.start and reasoning.end

R2 — compare_thinking_enabled(): Compares enable_thinking=false vs true with a generation prompt.
- If enable_thinking=true appends a non-empty marker → sets reasoning.start, mode = TAG_BASED
- If enable_thinking=false appends the marker instead: extracts both start (from the preceding segment) and end markers; mode = TAG_BASED
- The generation prompt is computed in common_chat_templates_apply_jinja and prepended to model output before parsing

R3 — compare_reasoning_scope(): Compares assistant message with reasoning+text-content vs reasoning+tool-calls.
- Only runs if jinja_caps.supports_tool_calls
- Sets mode = TOOLS_ONLY when reasoning content is present in B (with tools) but not in A (with text content)

C1: Two comparisons in the analyze_content constructor:
- content-only vs tools output → diff_tools
- content-only vs reasoning output → diff_reasoning

Classification logic:
- PLAIN: diff_tools.left equals the response string (content is the entire diff, no wrapper)
- ALWAYS_WRAPPED: markers found surrounding the content text in pure_content → extracts start/end

T1 — analyze_tool_calls(): Compares no-tools vs with-tools output.
- The tool call output is taken from diff.right
- Classification happens in analyze_tool_call_format(), which first strips reasoning markers from the haystack, then:
- Calls in_json_haystack() for both function name and argument name needles
- in_json_haystack() uses a PEG parser to check whether the needle appears in a JSON context (preceded by { or : with surrounding quotes)
- Function name in a JSON context → JSON_NATIVE → analyze_tool_call_format_json_native()
- Function name not in JSON, but argument names in JSON → TAG_WITH_JSON
- Neither in a JSON context → TAG_WITH_TAGGED
- analyze_tool_call_format_json_native(): parses the JSON object, matches field values to needles to populate name_field, args_field, id_field, gen_id_field; detects tools_array_wrapped; extracts section_start/section_end
- analyze_tool_call_format_non_json(): uses PEG parsers on the haystack to find up to two opening markers (section + per-call) then up to two closing markers

T2 — check_per_call_markers(): Compares 1 call vs 2 calls.
- If the second call starts with section_start → the section marker is actually per-call → moves section_start/end to per_call_start/end and clears the section markers

T3 — extract_function_markers(): Compares function name FUN_FIRST vs FUN_SECOND (two different named functions).
- The function name difference is isolated in diff.left
- Extracts function.name_prefix from the common prefix up to the function marker, and function.name_suffix from after the name up to the next marker
- Extends name_suffix into diff.suffix (to the first marker for TAG_WITH_TAGGED; to the first { or [ for TAG_WITH_JSON)
- Extracts function.close from after the last argument value up to the per-call/section end marker

T4 — analyze_arguments() (TAG_WITH_TAGGED only):
extract_argument_name_markers(): Compares arg_name_A vs arg_name_B (two different argument names).
- Extracts arguments.name_prefix and arguments.name_suffix

extract_argument_value_markers(): Compares argument value "XXXX" vs "YYYY" (same arg, different value).
- Extracts arguments.value_prefix and arguments.value_suffix

T5 — extract_argument_separator(): Compares 1 argument vs 2 arguments (same function).
- Uses until_common_prefix(diff.right, ARG_FIRST, ARG_SECOND) to find what separates the two argument blocks

T6 — extract_args_markers(): Compares 0 arguments vs 1 argument.
- Uses until_common_prefix() and after_common_suffix() with the empty and single-arg JSON strings as anchors to find container markers (arguments.start, arguments.end)

T7 — extract_call_id_markers(): Compares call IDs "call00001" vs "call99999".
Checks whether the function name lands in diff.prefix or diff.suffix to classify position:

- Function name in diff.prefix (ID follows the name) → BETWEEN_FUNC_AND_ARGS or POST_ARGS (further distinguished by where { appears)
- Function name in diff.suffix (ID precedes the name) → PRE_FUNC_NAME
- Extracts call_id.prefix and call_id.suffix markers around the call ID value
- Clears per_call_end if it incorrectly incorporated the call ID suffix

A workaround array in common/chat-diff-analyzer.cpp applies post-hoc patches after analysis. Each workaround is a lambda that inspects the template source and overrides analysis results. Current workarounds:
- Templates containing content.split('</think>') but not <SPECIAL_12>: sets reasoning.mode = TAG_BASED with <think>/</think> markers if no reasoning was detected
- Another workaround sets TAG_BASED reasoning with <think>/</think> and WRAPPED_WITH_REASONING content with <response>/</response>
- Templates containing <|CHATBOT_TOKEN|>: sets ALWAYS_WRAPPED content mode if no content start is already set
- Templates containing set has_code_interpreter: forces PLAIN content, specific per_call_start/end, clears preserved tokens to only keep Functionary-specific markers
- Templates with tool▁calls▁begin markers: overrides tool section/per-call markers with the correct Unicode block characters

Each analyzer struct (analyze_reasoning, analyze_content, analyze_tools) implements build_parser(parser_build_context&). They share a parser_build_context that carries the PEG builder, inference inputs, the pre-built reasoning parser, and a pointer to the content analyzer.
Reasoning parser (analyze_reasoning::build_parser):

| Mode | Parser |
|---|---|
| Not extracting reasoning | eps() |
| TAG_BASED or TOOLS_ONLY (non-empty start) | optional(start + reasoning(until(end)) + end + space()) |
| TAG_BASED or TOOLS_ONLY (empty start) | optional(reasoning(until(end)) + end + space()) — delimiter-style |
Note: The start marker may be empty either because the analyzer detected delimiter-style reasoning, or because generate_parser() cleared a template artifact start marker (see Generation Prompt & Reasoning Prefill above). Whitespace-only reasoning content (e.g. from a <think></think> prefill) is discarded by the mapper.
Content parser (analyze_content::build_parser):

| Condition | Parser |
|---|---|
| json_schema present | reasoning + space() + content(schema(json(), "response-format", ...)) + end() |
| Tools present | Dispatches to analyze_tools::build_parser() |
| ALWAYS_WRAPPED with reasoning | reasoning + start + content(until(end)) + end + end() |
| ALWAYS_WRAPPED without reasoning | content(until(start)) + start + content(until(end)) + end + end() |
| Default (PLAIN) | reasoning + content(rest()) + end() |
Tool parser (analyze_tools::build_parser): dispatches by format.mode.
build_tool_parser_json_native(): Calls p.standard_json_tools() which internally dispatches to:
- build_json_tools_function_is_key() — function name is the JSON key: {"get_weather": {...}}
- build_json_tools_nested_keys() — nested: {"function": {"name": "X", "arguments": {...}}}
- build_json_tools_flat_keys() — flat: {"name": "X", "arguments": {...}}

Handles content wrappers, array wrapping (tools_array_wrapped), parallel calls, and parameter_order.
build_tool_parser_tag_json(): For each tool function:

```
tool_open(name_prefix + tool_name(literal(name)) + name_suffix) +
call_id_section +
tool_args(schema(json(), tool_schema))
[+ function.close if non-empty]
```
Wrapped in per-call markers (with optional parallel call repetition) then optionally in section markers.
build_tool_parser_tag_tagged(): For each tool function, builds one parser per argument:
- String-typed arguments → tool_arg_string_value(schema(until(value_suffix), ...))
- Other types → tool_arg_json_value(schema(json(), ...))
- Each argument parser is wrapped in optional()
- space() between consecutive parsers

For closing: uses function.close if present; otherwise uses peek(per_call_end) to avoid premature close during partial streaming; falls back to tool_close(space()) to trigger mapper callbacks.
All three tool parsers return:

```
reasoning + optional(content(until(trigger_marker))) + tool_calls + end()
```
Each returned parser is wrapped by wrap_for_generation_prompt(), which prepends a literal for any boilerplate prefix of the generation prompt (the portion before the reasoning start marker).
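The literal-prefix computation can be sketched as follows (illustrative; the function name is an assumption, not the real helper):

```cpp
#include <cassert>
#include <string>

// Illustrative: the boilerplate literal prepended by wrap_for_generation_prompt()
// is the part of the generation prompt before the reasoning start marker, or
// the whole prompt when there is no marker.
static std::string boilerplate_prefix(const std::string & gen_prompt,
                                      const std::string & reasoning_start) {
    if (reasoning_start.empty()) {
        return gen_prompt;
    }
    const size_t pos = gen_prompt.find(reasoning_start);
    return pos == std::string::npos ? gen_prompt : gen_prompt.substr(0, pos);
}
```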
common_chat_peg_mapper maps PEG parse results (AST nodes) into common_chat_msg structures. Key design:
- Until the tool_name is known, argument text goes to args_buffer; once the name is set, the buffer is flushed to current_tool->arguments
- args_target(): returns a reference to whichever destination is currently active (buffer or tool args), eliminating branching
- closing_quote_pending: tracks whether a closing " needs to be appended when a string argument value is finalized (for schema-declared string types in tagged format)
- Whitespace-only reasoning (e.g. from a <think></think> prefill) is cleared so the message shows no reasoning
- Unclosed { braces are closed automatically

| File | Purpose |
|---|---|
| common/chat-auto-parser.h | All analysis structs, enums, autoparser, peg_generator, generation_params |
| common/chat-auto-parser-generator.cpp | Parser generator: generate_parser() and build_parser() methods |
| common/chat-diff-analyzer.cpp | Differential analysis implementation and workarounds |
| common/chat-auto-parser-helpers.h/cpp | calculate_diff_split(), segmentize_markers(), compare_variants(), wrap_for_generation_prompt(), string helpers |
| common/chat-peg-parser.h/cpp | common_chat_peg_builder, common_chat_peg_mapper, and helpers |
| common/chat.cpp | Entry point: common_chat_templates_apply_jinja() |
| tools/parser/debug-template-parser.cpp | Debug tool for template analysis |
| tools/parser/template-analysis.cpp | Template analysis tool |
Template Debugger: tools/parser/debug-template-parser.cpp

```
./bin/llama-debug-template-parser path/to/template.jinja
```

Template Analysis: tools/parser/template-analysis.cpp

```
./bin/llama-template-analysis path/to/template.jinja
```

Debug Logging: Enable with LLAMA_LOG_VERBOSITY=2
PEG Test Builder: Fluent API for creating test cases — see tests/test-chat.cpp:947-1043. Example usage:
```cpp
auto tst = peg_tester("models/templates/Template.jinja");
tst.test("input text")
    .reasoning_format(COMMON_REASONING_FORMAT_AUTO)
    .tools({tool_json})
    .parallel_tool_calls(true)
    .enable_thinking(true)
    .expect(expected_message)
    .run();
```
The following templates have active tests in tests/test-chat.cpp:
| Template | Format | Notes |
|---|---|---|
| Ministral-3-14B-Reasoning | Reasoning | [THINK]...[/THINK] tags (specialized handler) |
| NVIDIA-Nemotron-3-Nano-30B | TAG_WITH_TAGGED | Reasoning + tools |
| CohereForAI Command-R7B | JSON_NATIVE | <|START_THINKING|>/<|START_RESPONSE|> markers |
| Google Gemma 2 2B | Content only | No tool support |
| Qwen-QwQ-32B | Reasoning | Forced-open thinking |
| NousResearch Hermes 2 Pro | JSON_NATIVE | <tool_call> wrapper |
| IBM Granite 3.3 | JSON_NATIVE | <think></think> + <response></response> |
| ByteDance Seed-OSS | TAG_WITH_TAGGED | Custom <seed:think> and <seed:tool_call> tags |
| Qwen3-Coder | TAG_WITH_TAGGED | XML-style tool format |
| DeepSeek V3.1 | JSON_NATIVE | Forced thinking mode |
| GLM-4.6 | TAG_WITH_TAGGED | <tool_call>name\n<arg_key>...<arg_value>... format |
| GLM-4.7-Flash | TAG_WITH_TAGGED | Updated GLM format |
| Kimi-K2-Thinking | JSON_NATIVE | Reasoning + JSON tools |
| Apertus-8B-Instruct | JSON_NATIVE | Function name as JSON key |
| MiniMax-M2 | TAG_WITH_JSON | XML invoke with JSON args |
| NVIDIA-Nemotron-Nano-v2 | JSON_NATIVE | <TOOLCALL> wrapper (nested) |
| CohereForAI Command-R Plus | JSON_NATIVE | Markdown code block format |
| Mistral-Nemo-Instruct-2407 | JSON_NATIVE | [TOOL_CALLS] wrapper with ID field |
| Functionary v3.1 | TAG_WITH_JSON | <function=X> format |
| Functionary v3.2 | Specialized | >>> recipient delimiter (dedicated handler) |
| Fireworks Firefunction v2 | TAG_WITH_JSON | Fireworks tool format |
| DeepSeek R1 Distill (Llama/Qwen) | Reasoning | Forced-open thinking |
| llama-cpp-deepseek-r1 | Reasoning | Forced-open thinking |
| Kimi-K2 / Kimi-K2-Instruct | JSON_NATIVE | JSON tools with special markers |
| Llama 3.1/3.2/3.3 | JSON_NATIVE | Standard Llama tool format |
| OpenAI GPT-OSS | Specialized | Channel-based (dedicated handler) |
| Apriel 1.5 | JSON_NATIVE | <tool_calls> wrapper with JSON array |
| Apriel 1.6 Thinker | Reasoning | Implicit reasoning start |
| Mistral Small 3.2 | JSON_NATIVE | [TOOL_CALLS]func[ARGS]{...} with call ID |
| Devstral | JSON_NATIVE | [TOOL_CALLS]func[ARGS]{...} without call ID |
| StepFun 3.5 Flash | TAG_WITH_TAGGED | <function=X><parameter=Y> format |
To support a new template format:
1. Run llama-debug-template-parser to verify markers are correctly extracted.
2. If a marker is mis-detected, add a workaround to the workarounds vector in common/chat-diff-analyzer.cpp. Inspect the template source for a unique identifying substring.
3. If the format is fundamentally incompatible with the auto-parser, add a specialized handler in chat.cpp before the auto-parser block (as done for GPT-OSS, Functionary v3.2, and Ministral).

Design notes:

- The generation prompt is computed by diffing add_generation_prompt=false vs true in common_chat_templates_apply_jinja, so it contains exactly what the template appends — avoiding false positives from prior conversation turns.
- Some templates wrap each tool call individually (per_call_start/end); others wrap the entire section (section_start/end). T2 (check_per_call_markers()) disambiguates by checking if the second call in a two-call output starts with the section marker.
- calculate_diff_split() iteratively adjusts prefix/suffix boundaries to avoid splitting <tag> or [marker] tokens, ensuring clean extraction.
- per_call_end may have been incorrectly set to include the call ID suffix. T7 clears per_call_end in this case.
- Tool analysis gating: analyze_tools is only constructed (and all tool analysis phases run) when jinja_caps.supports_tool_calls is true. Within tool analysis, check_per_call_markers() (T2) only runs if jinja_caps.supports_parallel_tool_calls.
- analyze_arguments() gating: within tool analysis, A1 and A2 (argument name/value marker extraction) only run for TAG_WITH_TAGGED format. extract_argument_separator() and extract_args_markers() run for all non-JSON_NATIVE formats.
- If analyze_tools concludes tool calling is supported but cannot determine the format, build_parser() logs an error and returns eps() (graceful degradation) rather than aborting.