docs/autoparser.md
The auto-parser automatically analyzes chat templates to determine how to parse model outputs, including content, reasoning, and tool calls.
The unified auto-parser uses a pure differential, compositional approach (inspired by the git diff algorithm) to analyze chat templates:
Core Philosophy: everything is derived differentially, by rendering the template with controlled input variations and diffing the outputs; format classification is structural (e.g. separating JSON_NATIVE from tag-based formats).

Analysis + Parser Building in Two Steps:
1. autoparser::autoparser tmpl_analysis(tmpl) — runs all differential comparisons and populates the analysis structs
2. autoparser::peg_generator::generate_parser(tmpl, generation_params, tmpl_analysis) — uses the analysis to build a PEG parser and optional GBNF grammar

All structs are defined in common/chat-auto-parser.h.
Key structs:

- autoparser (main analyzer and generator), common/chat-auto-parser.h:367-388 — top-level analysis result aggregating jinja_caps, reasoning, content, and tools sub-analyses, plus preserved_tokens (union of all non-empty markers).
- analyze_reasoning, common/chat-auto-parser.h:254-274 — reasoning analysis result: mode enum, start marker (e.g. `<think>`), and end marker (e.g. `</think>`).
- analyze_content, common/chat-auto-parser.h:280-295 — content analysis result: mode enum, start/end markers, and requires_nonnull_content flag.
- analyze_tools and its sub-structs:
  - tool_format_analysis — mode enum, section_start/end, per_call_start/end, JSON field names (function_field, name_field, args_field, id_field, gen_id_field), and format flags (fun_name_is_key, tools_array_wrapped)
  - tool_function_analysis — name_prefix, name_suffix, close markers around function names
  - tool_arguments_analysis — start/end container markers, name_prefix/suffix, value_prefix/suffix, separator
  - tool_id_analysis — pos enum, prefix/suffix markers around call ID values
  - analyze_tools — aggregates the four sub-structs above

reasoning_mode: How the template handles reasoning/thinking blocks.
| Value | Description |
|---|---|
| NONE | No reasoning markers detected |
| TAG_BASED | Tag-based: `<think>`...`</think>` (start can be empty for delimiter-style formats) |
| TOOLS_ONLY | Reasoning only appears in tool call responses, not plain content |
Generation Prompt & Reasoning Prefill: Computed in common_chat_templates_apply_jinja before invoking either the specialized handlers or the auto-parser, by rendering the template twice — once with add_generation_prompt=false and once with add_generation_prompt=true — and storing the diff suffix as generation_params::generation_prompt. This string is propagated into common_chat_params::generation_prompt and common_chat_parser_params::generation_prompt.
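The double-render diff described above reduces to taking the suffix that the add_generation_prompt=true render appends. A minimal sketch (illustrative only, not the llama.cpp code; the function name is assumed):

```cpp
#include <cassert>
#include <string>

// Illustrative: the generation prompt is the suffix that the
// add_generation_prompt=true render adds on top of the =false render.
// If the template ignores the flag, the diff is empty.
static std::string generation_prompt_diff(const std::string & without_gen,
                                          const std::string & with_gen) {
    if (with_gen.size() > without_gen.size() &&
        with_gen.compare(0, without_gen.size(), without_gen) == 0) {
        return with_gen.substr(without_gen.size());
    }
    return "";
}
```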
The generation prompt is prepended to model output before PEG parsing via wrap_for_generation_prompt(). The portion before the reasoning start marker (if any) is prepended as a literal to ensure any boilerplate added by the template is consumed. The full string is also fed to the grammar sampler via llama_sampler_accept (stored in common_params_sampling::grammar_prefill), advancing the grammar past tokens already in the prompt. It is used to determine the reasoning budget sampler's initial state — COUNTING if the prefill tokens begin with the reasoning start sequence (but don't also contain the end sequence), IDLE otherwise.
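The budget-sampler initial-state rule can be sketched as follows (illustrative: the enum and function names are assumptions, and the real check operates on token sequences rather than strings):

```cpp
#include <cassert>
#include <string>

enum class budget_state { IDLE, COUNTING }; // assumed names, for illustration

// COUNTING when the prefill opens a reasoning block that it does not also
// close; IDLE otherwise.
static budget_state initial_budget_state(const std::string & prefill,
                                         const std::string & start,
                                         const std::string & end) {
    if (start.empty()) {
        return budget_state::IDLE;
    }
    const size_t pos = prefill.find(start);
    if (pos == std::string::npos) {
        return budget_state::IDLE;
    }
    // An end marker after the start means reasoning already closed in prefill.
    if (prefill.find(end, pos + start.size()) != std::string::npos) {
        return budget_state::IDLE;
    }
    return budget_state::COUNTING;
}
```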
grammar_prefill (common_params_sampling): The generation prompt string tokenized and accepted by the grammar sampler at init time. Only applied when grammar_external is false (i.e., the grammar was not set explicitly by the user).
Three outcomes for reasoning-prefill handling (in generate_parser()):
- Empty reasoning prefill (e.g. <think></think>\n): the parser sees reasoning as opened and immediately closed; whitespace-only reasoning content is discarded.
- Open reasoning prefill (e.g. <think>\n): the parser sees reasoning as already open.
- Non-reasoning marker (e.g. <|begin_assistant|> followed by boilerplate): the marker is a template artifact; the start literal is cleared so reasoning uses delimiter-style (end-only).

For templates that ignore add_generation_prompt (empty diff), the rendered data.prompt is used as fallback — but only for non-TOOLS_ONLY modes, since in TOOLS_ONLY the start tag is model-generated and may appear in prior conversation turns.

content_mode: How the template wraps assistant content.
| Value | Description |
|---|---|
| PLAIN | No content markers |
| ALWAYS_WRAPPED | Content always wrapped: `<response>`...`</response>` |
| WRAPPED_WITH_REASONING | Content wrapped only when reasoning is present |
tool_format: Classification of tool call structure.
| Value | Description |
|---|---|
| NONE | No tool support detected |
| JSON_NATIVE | Pure JSON: `{"name": "X", "arguments": {...}}` |
| TAG_WITH_JSON | Tag-based with JSON args: `<function=X>{...}</function>` |
| TAG_WITH_TAGGED | Tag-based with tagged args: `<param=key>value</param>` |
call_id_position: Where call IDs appear in tag-based formats.
| Value | Description |
|---|---|
| NONE | No call ID support detected |
| PRE_FUNC_NAME | Before function name |
| BETWEEN_FUNC_AND_ARGS | Between function name and arguments |
| POST_ARGS | After arguments |
JSON_NATIVE

Structure: The entire tool call (function name, arguments, values) is in JSON format. Optional enclosing tags around the section.
Detection: Function name appears inside a JSON structure (quotes preceded by { or :).
Examples:

Standard OpenAI-style:

```
<tool_call>
{"name": "get_weather", "arguments": {"location": "Paris", "unit": "celsius"}}
</tool_call>
```

Mistral Nemo with array wrapper:

```
[TOOL_CALLS]
[{"name": "calculate", "arguments": {"expr": "2+2"}}]
```

Function name as JSON key (Apertus style):

```
{"get_weather": {"location": "Paris"}}
```
TAG_WITH_JSON

Structure: Function name is outside JSON, in tag attributes or XML-style tags. Arguments are a JSON object.
Detection: Function name not in JSON, but argument names appear in JSON context.
Examples:

Functionary v3.1:

```
<function=get_weather>{"location": "Paris", "unit": "celsius"}</function>
```

MiniMax:

```
<minimax:tool_call>
<tool_name>calculate</tool_name>
<arguments>{"expr": "2+2"}</arguments>
</minimax:tool_call>
```
TAG_WITH_TAGGED

Structure: Both function name and argument names are in XML-style tags. String values are unquoted; non-string values are JSON-formatted.
Detection: Neither function name nor argument names appear in a JSON context.
Examples:

Qwen/Hermes XML format:

```
<function=get_weather>
<param=location>Paris</param>
<param=unit>celsius</param>
</function>
```

Mixed types:

```
<function=calculate>
<param=expr>2+2</param>
<param=precision>2</param>
<param=options>{"round": true}</param>
</function>
```
String values (Paris, celsius, 2+2) are unquoted; options (object type) is JSON-formatted.
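The JSON-context test that separates these three formats can be sketched as follows (illustrative only; the real in_json_haystack() uses a PEG parser rather than string scanning):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Illustrative check: a needle counts as "in a JSON context" when it appears
// quoted and the opening quote is preceded by '{' or ':' (skipping whitespace).
static bool in_json_context(const std::string & haystack, const std::string & needle) {
    const std::string quoted = "\"" + needle + "\"";
    for (size_t pos = haystack.find(quoted); pos != std::string::npos;
         pos = haystack.find(quoted, pos + 1)) {
        size_t i = pos;
        while (i > 0 && std::isspace((unsigned char) haystack[i - 1])) {
            i--;
        }
        if (i > 0 && (haystack[i - 1] == '{' || haystack[i - 1] == ':')) {
            return true;
        }
    }
    return false;
}
```

A function name passing this check suggests JSON_NATIVE; argument names passing while the function name fails suggests TAG_WITH_JSON; neither passing suggests TAG_WITH_TAGGED.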
```
autoparser::autoparser(tmpl)
|
|-- Phase 1: analyze_reasoning(tmpl, jinja_caps.supports_tool_calls)
| |-- R1: compare_reasoning_presence() — with/without reasoning_content field
| |-- R2: compare_thinking_enabled() — enable_thinking=false vs true
| '-- R3: compare_reasoning_scope() — reasoning+content vs reasoning+tools
| (only if supports_tool_calls)
|
|-- Phase 2: analyze_content(tmpl, reasoning)
| '-- C1: compares content-only vs tools output and content-only vs reasoning output
|
|-- Phase 3: analyze_tools(tmpl, jinja_caps, reasoning)
| (skipped entirely if !jinja_caps.supports_tool_calls)
| |
| |-- T1: analyze_tool_calls() — no tools vs with tools; classifies format
| | |-- JSON path → analyze_tool_call_format_json_native()
| | '-- tag path → analyze_tool_call_format_non_json()
| |
| (if format != NONE and format != JSON_NATIVE:)
| |
| |-- T2: check_per_call_markers() — 1 call vs 2 calls; moves section→per-call if needed
| | (only if supports_parallel_tool_calls)
| |
| |-- T3: extract_function_markers() — func_alpha vs func_beta; extracts name prefix/suffix/close
| |
| |-- T4: analyze_arguments() — (TAG_WITH_TAGGED only)
| | |-- A1: extract_argument_name_markers() — arg_name_A vs arg_name_B
| | '-- A2: extract_argument_value_markers() — value "XXXX" vs "YYYY"
| |
| |-- T5: extract_argument_separator() — 1 arg vs 2 args; finds separator between args
| |
| |-- T6: extract_args_markers() — 0 args vs 1 arg; finds args container markers
| |
| '-- T7: extract_call_id_markers() — call_id "call00001" vs "call99999"
|
'-- collect_preserved_tokens() — union of all non-empty markers
|
'-- apply workarounds() — post-hoc patches for edge-case templates
|
v
autoparser (analysis result)
|
v
autoparser::peg_generator::generate_parser(tmpl, inputs, analysis)
|-- analysis.build_parser(inputs) — builds PEG parser arena
| |-- reasoning.build_parser(ctx) — reasoning parser (mode-dependent)
| |-- content.build_parser(ctx) — content parser (mode-dependent)
| '-- tools.build_parser(ctx) — tool parser (dispatches by tool_format)
| |-- build_tool_parser_json_native()
| |-- build_tool_parser_tag_json()
| '-- build_tool_parser_tag_tagged()
|
|-- Build GBNF grammar (if tools present and trigger_marker non-empty)
'-- Set grammar_triggers from section_start or per_call_start
|
v
common_chat_params (prompt, parser, grammar, triggers, preserved_tokens)
```
The auto-parser is invoked in common/chat.cpp:1280-1310 in common_chat_templates_apply_jinja. A few specialized templates are handled first (Ministral/Magistral Large 3, GPT-OSS with <|channel|>, Functionary v3.2 with >>>all), then the auto-parser handles everything else via autoparser::autoparser + peg_generator::generate_parser.
All analysis phases use the same factorized comparison function declared in common/chat-auto-parser-helpers.h:68:
```
compare_variants(tmpl, params_A, params_modifier)
```
This creates variant B by applying a modifier lambda to a copy of params_A, renders both through the template, and computes a diff_split (common/chat-auto-parser.h:28-37):
- prefix — common prefix between A and B
- suffix — common suffix between A and B
- left — unique to variant A
- right — unique to variant B

The diff is computed via calculate_diff_split(), which finds the longest-common-prefix and longest-common-suffix, then iteratively moves incomplete `<...>` or `[...]` markers from the prefix/suffix into left/right until stable (tag boundary fixing).
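A stripped-down sketch of the split (illustrative; the real calculate_diff_split() additionally performs the tag boundary fixing, and the names here are assumptions):

```cpp
#include <cassert>
#include <string>

// Illustrative four-way diff: common prefix, common suffix, and the parts
// unique to each variant.
struct diff_split {
    std::string prefix;
    std::string suffix;
    std::string left;   // unique to variant A
    std::string right;  // unique to variant B
};

static diff_split split_diff(const std::string & a, const std::string & b) {
    size_t p = 0;
    while (p < a.size() && p < b.size() && a[p] == b[p]) {
        p++;
    }
    size_t s = 0;
    // The suffix must not overlap the prefix in either string.
    while (s < a.size() - p && s < b.size() - p &&
           a[a.size() - 1 - s] == b[b.size() - 1 - s]) {
        s++;
    }
    return {
        a.substr(0, p),
        a.substr(a.size() - s),
        a.substr(p, a.size() - p - s),
        b.substr(p, b.size() - p - s),
    };
}
```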
Text is segmentized into markers and non-marker fragments using segmentize_markers(), which splits on <...> and [...] boundaries.
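The segmentation can be sketched as follows (illustrative; the real segmentize_markers() may differ in details such as handling of unterminated markers):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative: split text into <...> / [...] marker tokens and the plain
// fragments between them.
static std::vector<std::string> segmentize(const std::string & text) {
    std::vector<std::string> out;
    std::string cur;
    for (size_t i = 0; i < text.size(); i++) {
        const char c = text[i];
        if (c == '<' || c == '[') {
            const char close = (c == '<') ? '>' : ']';
            const size_t end = text.find(close, i);
            if (end != std::string::npos) {
                if (!cur.empty()) { out.push_back(cur); cur.clear(); }
                out.push_back(text.substr(i, end - i + 1)); // whole marker
                i = end;
                continue;
            }
        }
        cur += c; // plain fragment
    }
    if (!cur.empty()) {
        out.push_back(cur);
    }
    return out;
}
```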
R1 — compare_reasoning_presence(): Compares assistant message with vs without a reasoning_content field.
- Searches diff.right (output with reasoning) for the reasoning content needle
- Markers found around the needle in diff.right → mode = TAG_BASED
- Markers emitted even without the needle → TAG_BASED (template forces markers; handled via prefill)
- End marker only → TAG_BASED (delimiter-style, empty start)
- Extracts reasoning.start and reasoning.end

R2 — compare_thinking_enabled(): Compares enable_thinking=false vs true with a generation prompt.
- If enable_thinking=true appends a non-empty marker → sets reasoning.start, mode = TAG_BASED
- If enable_thinking=false appends the marker instead: extracts both start (from the preceding segment) and end markers; mode = TAG_BASED
- The generation prompt is computed in common_chat_templates_apply_jinja and prepended to model output before parsing

R3 — compare_reasoning_scope(): Compares assistant message with reasoning+text-content vs reasoning+tool-calls.
- Only runs if jinja_caps.supports_tool_calls
- Sets mode = TOOLS_ONLY when reasoning content is present in B (with tools) but not in A (with text content)

C1: Two comparisons in the analyze_content constructor:
- content-only vs tools output → diff_tools
- content-only vs reasoning output → diff_reasoning

Classification logic:
- PLAIN: diff_tools.left equals the response string (content is the entire diff, no wrapper)
- ALWAYS_WRAPPED: markers found surrounding the content text in pure_content → extracts start/end

T1 — analyze_tool_calls(): Compares no-tools vs with-tools output.
- The tool call output is taken from diff.right
- Classification happens in analyze_tool_call_format(), which first strips reasoning markers from the haystack, then:
- Calls in_json_haystack() for both function name and argument name needles
- in_json_haystack() uses a PEG parser to check whether the needle appears in a JSON context (preceded by { or : with surrounding quotes)
- Function name in a JSON context → JSON_NATIVE → analyze_tool_call_format_json_native()
- Function name not in JSON, but argument names in JSON → TAG_WITH_JSON
- Neither in a JSON context → TAG_WITH_TAGGED
- analyze_tool_call_format_json_native(): parses the JSON object, matches field values to needles to populate name_field, args_field, id_field, gen_id_field; detects tools_array_wrapped; extracts section_start/section_end
- analyze_tool_call_format_non_json(): uses PEG parsers on the haystack to find up to two opening markers (section + per-call) then up to two closing markers

T2 — check_per_call_markers(): Compares 1 call vs 2 calls.
- If the second call starts with section_start → the section marker is actually per-call → moves section_start/end to per_call_start/end and clears the section markers

T3 — extract_function_markers(): Compares function name FUN_FIRST vs FUN_SECOND (two different named functions).
- The function name difference is isolated in diff.left
- Extracts function.name_prefix from the common prefix up to the function marker, and function.name_suffix from after the name up to the next marker
- Extends name_suffix into diff.suffix (to the first marker for TAG_WITH_TAGGED; to the first { or [ for TAG_WITH_JSON)
- Extracts function.close from after the last argument value up to the per-call/section end marker

T4 — analyze_arguments() (TAG_WITH_TAGGED only):
extract_argument_name_markers(): Compares arg_name_A vs arg_name_B (two different argument names).
- Extracts arguments.name_prefix and arguments.name_suffix

extract_argument_value_markers(): Compares argument value "XXXX" vs "YYYY" (same arg, different value).
- Extracts arguments.value_prefix and arguments.value_suffix

T5 — extract_argument_separator(): Compares 1 argument vs 2 arguments (same function).
- Uses until_common_prefix(diff.right, ARG_FIRST, ARG_SECOND) to find what separates the two argument blocks

T6 — extract_args_markers(): Compares 0 arguments vs 1 argument.
- Uses until_common_prefix() and after_common_suffix() with the empty and single-arg JSON strings as anchors to find container markers (arguments.start, arguments.end)

T7 — extract_call_id_markers(): Compares call IDs "call00001" vs "call99999".
Checks whether the function name lands in diff.prefix or diff.suffix to classify position:

- Function name in diff.prefix (ID follows the name) → BETWEEN_FUNC_AND_ARGS or POST_ARGS (further distinguished by where { appears)
- Function name in diff.suffix (ID precedes the name) → PRE_FUNC_NAME
- Extracts call_id.prefix and call_id.suffix markers around the call ID value
- Clears per_call_end if it incorrectly incorporated the call ID suffix

A workaround array in common/chat-diff-analyzer.cpp applies post-hoc patches after analysis. Each workaround is a lambda that inspects the template source and overrides analysis results. Current workarounds:
- Templates containing content.split('</think>') but not <SPECIAL_12>: sets reasoning.mode = TAG_BASED with <think>/</think> markers if no reasoning was detected
- Another workaround sets TAG_BASED reasoning with <think>/</think> and WRAPPED_WITH_REASONING content with <response>/</response>
- Templates containing <|CHATBOT_TOKEN|>: sets ALWAYS_WRAPPED content mode if no content start is already set
- Templates containing set has_code_interpreter: forces PLAIN content, specific per_call_start/end, clears preserved tokens to only keep Functionary-specific markers
- Templates with tool▁calls▁begin markers: overrides tool section/per-call markers with the correct Unicode block characters

Each analyzer struct (analyze_reasoning, analyze_content, analyze_tools) implements build_parser(parser_build_context&). They share a parser_build_context that carries the PEG builder, inference inputs, the pre-built reasoning parser, and a pointer to the content analyzer.
Reasoning parser (analyze_reasoning::build_parser):

| Mode | Parser |
|---|---|
| Not extracting reasoning | eps() |
| TAG_BASED or TOOLS_ONLY (non-empty start) | optional(start + reasoning(until(end)) + end + space()) |
| TAG_BASED or TOOLS_ONLY (empty start) | optional(reasoning(until(end)) + end + space()) — delimiter-style |
Note: The start marker may be empty either because the analyzer detected delimiter-style reasoning, or because generate_parser() cleared a template artifact start marker (see Generation Prompt & Reasoning Prefill above). Whitespace-only reasoning content (e.g. from a <think></think> prefill) is discarded by the mapper.
Content parser (analyze_content::build_parser):

| Condition | Parser |
|---|---|
| json_schema present | reasoning + space() + content(schema(json(), "response-format", ...)) + end() |
| Tools present | Dispatches to analyze_tools::build_parser() |
| ALWAYS_WRAPPED with reasoning | reasoning + start + content(until(end)) + end + end() |
| ALWAYS_WRAPPED without reasoning | content(until(start)) + start + content(until(end)) + end + end() |
| Default (PLAIN) | reasoning + content(rest()) + end() |
Tool parser (analyze_tools::build_parser): dispatches by format.mode.
build_tool_parser_json_native(): Calls p.standard_json_tools() which internally dispatches to:
- build_json_tools_function_is_key() — function name is the JSON key: {"get_weather": {...}}
- build_json_tools_nested_keys() — nested: {"function": {"name": "X", "arguments": {...}}}
- build_json_tools_flat_keys() — flat: {"name": "X", "arguments": {...}}

Handles content wrappers, array wrapping (tools_array_wrapped), parallel calls, and parameter_order.
build_tool_parser_tag_json(): For each tool function:

```
tool_open(name_prefix + tool_name(literal(name)) + name_suffix) +
call_id_section +
tool_args(schema(json(), tool_schema))
[+ function.close if non-empty]
```
Wrapped in per-call markers (with optional parallel call repetition) then optionally in section markers.
build_tool_parser_tag_tagged(): For each tool function, builds one parser per argument:
- String-typed arguments → tool_arg_string_value(schema(until(value_suffix), ...))
- Other types → tool_arg_json_value(schema(json(), ...))
- Each argument parser is wrapped in optional()
- space() between consecutive parsers

For closing: uses function.close if present; otherwise uses peek(per_call_end) to avoid premature close during partial streaming; falls back to tool_close(space()) to trigger mapper callbacks.
All three tool parsers return:

```
reasoning + optional(content(until(trigger_marker))) + tool_calls + end()
```
Each returned parser is wrapped by wrap_for_generation_prompt(), which prepends a literal for any boilerplate prefix of the generation prompt (the portion before the reasoning start marker).
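The literal-prefix computation can be sketched as follows (illustrative; the function name is an assumption, not the real helper):

```cpp
#include <cassert>
#include <string>

// Illustrative: the boilerplate literal prepended by wrap_for_generation_prompt()
// is the part of the generation prompt before the reasoning start marker, or
// the whole prompt when there is no marker.
static std::string boilerplate_prefix(const std::string & gen_prompt,
                                      const std::string & reasoning_start) {
    if (reasoning_start.empty()) {
        return gen_prompt;
    }
    const size_t pos = gen_prompt.find(reasoning_start);
    return pos == std::string::npos ? gen_prompt : gen_prompt.substr(0, pos);
}
```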
common_chat_peg_mapper maps PEG parse results (AST nodes) into common_chat_msg structures. Key design:
- Until the tool_name is known, argument text goes to args_buffer; once the name is set, the buffer is flushed to current_tool->arguments
- args_target(): returns a reference to whichever destination is currently active (buffer or tool args), eliminating branching
- closing_quote_pending: tracks whether a closing " needs to be appended when a string argument value is finalized (for schema-declared string types in tagged format)
- Whitespace-only reasoning (e.g. from a <think></think> prefill) is cleared so the message shows no reasoning
- Unclosed { braces are closed automatically

| File | Purpose |
|---|---|
| common/chat-auto-parser.h | All analysis structs, enums, autoparser, peg_generator, generation_params |
| common/chat-auto-parser-generator.cpp | Parser generator: generate_parser() and build_parser() methods |
| common/chat-diff-analyzer.cpp | Differential analysis implementation and workarounds |
| common/chat-auto-parser-helpers.h/cpp | calculate_diff_split(), segmentize_markers(), compare_variants(), wrap_for_generation_prompt(), string helpers |
| common/chat-peg-parser.h/cpp | common_chat_peg_builder, common_chat_peg_mapper, and helpers |
| common/chat.cpp | Entry point: common_chat_templates_apply_jinja() |
| tools/parser/debug-template-parser.cpp | Debug tool for template analysis |
| tools/parser/template-analysis.cpp | Template analysis tool |
Template Debugger: tools/parser/debug-template-parser.cpp

```
./bin/llama-debug-template-parser path/to/template.jinja
```

Template Analysis: tools/parser/template-analysis.cpp

```
./bin/llama-template-analysis path/to/template.jinja
```

Debug Logging: Enable with LLAMA_LOG_VERBOSITY=2
PEG Test Builder: Fluent API for creating test cases — see tests/test-chat.cpp:947-1043. Example usage:
```cpp
auto tst = peg_tester("models/templates/Template.jinja");
tst.test("input text")
    .reasoning_format(COMMON_REASONING_FORMAT_AUTO)
    .tools({tool_json})
    .parallel_tool_calls(true)
    .enable_thinking(true)
    .expect(expected_message)
    .run();
```
The following templates have active tests in tests/test-chat.cpp:
| Template | Format | Notes |
|---|---|---|
| Ministral-3-14B-Reasoning | Reasoning | [THINK]...[/THINK] tags (specialized handler) |
| NVIDIA-Nemotron-3-Nano-30B | TAG_WITH_TAGGED | Reasoning + tools |
| CohereForAI Command-R7B | JSON_NATIVE | <|START_THINKING|>/<|START_RESPONSE|> markers |
| Google Gemma 2 2B | Content only | No tool support |
| Qwen-QwQ-32B | Reasoning | Forced-open thinking |
| NousResearch Hermes 2 Pro | JSON_NATIVE | <tool_call> wrapper |
| IBM Granite 3.3 | JSON_NATIVE | <think></think> + <response></response> |
| ByteDance Seed-OSS | TAG_WITH_TAGGED | Custom <seed:think> and <seed:tool_call> tags |
| Qwen3-Coder | TAG_WITH_TAGGED | XML-style tool format |
| DeepSeek V3.1 | JSON_NATIVE | Forced thinking mode |
| GLM-4.6 | TAG_WITH_TAGGED | <tool_call>name\n<arg_key>...<arg_value>... format |
| GLM-4.7-Flash | TAG_WITH_TAGGED | Updated GLM format |
| Kimi-K2-Thinking | JSON_NATIVE | Reasoning + JSON tools |
| Apertus-8B-Instruct | JSON_NATIVE | Function name as JSON key |
| MiniMax-M2 | TAG_WITH_JSON | XML invoke with JSON args |
| NVIDIA-Nemotron-Nano-v2 | JSON_NATIVE | <TOOLCALL> wrapper (nested) |
| CohereForAI Command-R Plus | JSON_NATIVE | Markdown code block format |
| Mistral-Nemo-Instruct-2407 | JSON_NATIVE | [TOOL_CALLS] wrapper with ID field |
| Functionary v3.1 | TAG_WITH_JSON | <function=X> format |
| Functionary v3.2 | Specialized | >>> recipient delimiter (dedicated handler) |
| Fireworks Firefunction v2 | TAG_WITH_JSON | Fireworks tool format |
| DeepSeek R1 Distill (Llama/Qwen) | Reasoning | Forced-open thinking |
| llama-cpp-deepseek-r1 | Reasoning | Forced-open thinking |
| Kimi-K2 / Kimi-K2-Instruct | JSON_NATIVE | JSON tools with special markers |
| Llama 3.1/3.2/3.3 | JSON_NATIVE | Standard Llama tool format |
| OpenAI GPT-OSS | Specialized | Channel-based (dedicated handler) |
| Apriel 1.5 | JSON_NATIVE | <tool_calls> wrapper with JSON array |
| Apriel 1.6 Thinker | Reasoning | Implicit reasoning start |
| Mistral Small 3.2 | JSON_NATIVE | [TOOL_CALLS]func[ARGS]{...} with call ID |
| Devstral | JSON_NATIVE | [TOOL_CALLS]func[ARGS]{...} without call ID |
| StepFun 3.5 Flash | TAG_WITH_TAGGED | <function=X><parameter=Y> format |
To support a new template format:
1. Run llama-debug-template-parser to verify markers are correctly extracted.
2. If a marker is mis-detected, add a workaround to the workarounds vector in common/chat-diff-analyzer.cpp. Inspect the template source for a unique identifying substring.
3. If the format is fundamentally incompatible with the auto-parser, add a specialized handler in chat.cpp before the auto-parser block (as done for GPT-OSS, Functionary v3.2, and Ministral).

Design notes:

- The generation prompt is computed by diffing add_generation_prompt=false vs true in common_chat_templates_apply_jinja, so it contains exactly what the template appends — avoiding false positives from prior conversation turns.
- Some templates wrap each tool call individually (per_call_start/end); others wrap the entire section (section_start/end). T2 (check_per_call_markers()) disambiguates by checking if the second call in a two-call output starts with the section marker.
- calculate_diff_split() iteratively adjusts prefix/suffix boundaries to avoid splitting <tag> or [marker] tokens, ensuring clean extraction.
- per_call_end may have been incorrectly set to include the call ID suffix. T7 clears per_call_end in this case.
- Tool analysis gating: analyze_tools is only constructed (and all tool analysis phases run) when jinja_caps.supports_tool_calls is true. Within tool analysis, check_per_call_markers() (T2) only runs if jinja_caps.supports_parallel_tool_calls.
- analyze_arguments() gating: within tool analysis, A1 and A2 (argument name/value marker extraction) only run for TAG_WITH_TAGGED format. extract_argument_separator() and extract_args_markers() run for all non-JSON_NATIVE formats.
- If analyze_tools concludes tool calling is supported but cannot determine the format, build_parser() logs an error and returns eps() (graceful degradation) rather than aborting.