.agents/llama-cpp-backend.md
The llama.cpp backend (backend/cpp/llama-cpp/grpc-server.cpp) is a gRPC adaptation of the upstream HTTP server (llama.cpp/tools/server/server.cpp). It uses the same underlying server infrastructure from llama.cpp/tools/server/server-context.cpp.
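
The relationship described above, two front ends over one shared core, can be sketched roughly as follows. This is a simplified illustration and not code from either server: shared_context, http_completion_handler, predict_reply, and grpc_predict are hypothetical names, and the real server_context and task queue in llama.cpp are far more involved.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for the shared inference core. Only the shape
// matters here: both front ends post work to it and read results back.
struct shared_context {
    std::queue<std::string> tasks;  // pending prompts, oldest first

    // Queue a prompt for processing.
    void post(std::string prompt) { tasks.push(std::move(prompt)); }

    // Process the oldest queued prompt and return a placeholder completion.
    std::string run_next() {
        assert(!tasks.empty());  // callers must post before running
        std::string prompt = std::move(tasks.front());
        tasks.pop();
        return "completion for: " + prompt;
    }
};

// HTTP-style front end: a route handler that maps a request body to a response body.
std::string http_completion_handler(shared_context & ctx, const std::string & body) {
    ctx.post(body);
    return ctx.run_next();
}

// gRPC-style front end: a service method that fills a reply message
// and reports success through its return value.
struct predict_reply {
    std::string message;
};

bool grpc_predict(shared_context & ctx, const std::string & prompt, predict_reply * reply) {
    ctx.post(prompt);
    reply->message = ctx.run_next();
    return true;  // stands in for an OK status
}
```

Both entry points produce the same result for the same prompt because all of the work happens in the shared core. That is why the upstream HTTP server is a useful reference when the gRPC adaptation stops compiling.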

- Build the backend with make backends/llama-cpp
- See backend/cpp/llama-cpp/Makefile for build configuration
- Shared server infrastructure: server-context.cpp, server-task.cpp, server-queue.cpp, server-common.cpp

When fixing compilation errors after upstream changes:
- Check how server.cpp (HTTP server) handles the same change
- Rebuild with make backends/llama-cpp

How the gRPC server compares with the HTTP server:

- The gRPC server uses a BackendServiceImpl class with gRPC service methods
- The HTTP server uses server_routes with HTTP handlers
- Both use the same server_context and task queue infrastructure
- gRPC methods: LoadModel, Predict, PredictStream, Embedding, Rerank, TokenizeString, GetMetrics, Health

When working on JSON/XML tool call parsing functionality, always check llama.cpp for reference implementation and updates:
- llama.cpp/common/chat-parser-xml-toolcall.h for xml_tool_call_format struct changes
- llama.cpp/common/chat-parser-xml-toolcall.cpp for parsing algorithm updates
- llama.cpp/common/chat-parser.cpp for new XML format presets (search for xml_tool_call_format form)
- llama.cpp/common/chat.h for COMMON_CHAT_FORMAT_* enum values that use XML parsing:
  - COMMON_CHAT_FORMAT_GLM_4_5
  - COMMON_CHAT_FORMAT_MINIMAX_M2
  - COMMON_CHAT_FORMAT_KIMI_K2
  - COMMON_CHAT_FORMAT_QWEN3_CODER_XML
  - COMMON_CHAT_FORMAT_APRIEL_1_5
  - COMMON_CHAT_FORMAT_XIAOMI_MIMO

Always check llama.cpp for new model configuration options that should be supported in LocalAI:
- llama.cpp/tools/server/server-context.cpp for new parameters
- llama.cpp/common/chat.h for common_chat_params struct changes
- llama.cpp/tools/server/server.cpp for command-line argument changes

Examples of options to check:

- ctx_shift - Context shifting support
- parallel_tool_calls - Parallel tool calling
- reasoning_format - Reasoning format options

Key files:

- llama.cpp/common/chat-parser-xml-toolcall.h - Format definitions
- llama.cpp/common/chat-parser-xml-toolcall.cpp - Parsing logic
- llama.cpp/common/chat-parser.cpp - Format presets and model-specific handlers
- llama.cpp/common/chat.h - Format enums and parameter structures
- llama.cpp/tools/server/server-context.cpp - Server configuration options
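
As a rough illustration of wiring options such as ctx_shift, parallel_tool_calls, and reasoning_format into a backend, the sketch below parses them from a list of strings. It assumes, purely for illustration, that options arrive as "key:value" strings. The backend_options struct and the parse_bool and parse_options helpers are hypothetical and are not taken from llama.cpp or LocalAI.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical holder for the options named above. The field names mirror
// the option names; this is not a struct from llama.cpp or LocalAI.
struct backend_options {
    bool        ctx_shift           = false;
    bool        parallel_tool_calls = false;
    std::string reasoning_format;   // left empty when not supplied
};

// Interpret common truthy spellings; anything else counts as false.
static bool parse_bool(const std::string & v) {
    return v == "true" || v == "1" || v == "yes" || v == "on";
}

// Apply "key:value" strings (an assumed encoding) on top of the defaults.
// Entries without a colon and entries with an unrecognized key are appended
// to `unknown`, so an option added upstream is noticed rather than dropped.
backend_options parse_options(const std::vector<std::string> & raw,
                              std::vector<std::string> & unknown) {
    backend_options opts;
    for (const std::string & entry : raw) {
        const std::size_t sep = entry.find(':');
        if (sep == std::string::npos) {
            unknown.push_back(entry);
            continue;
        }
        const std::string key = entry.substr(0, sep);
        const std::string val = entry.substr(sep + 1);
        if (key == "ctx_shift") {
            opts.ctx_shift = parse_bool(val);
        } else if (key == "parallel_tool_calls") {
            opts.parallel_tool_calls = parse_bool(val);
        } else if (key == "reasoning_format") {
            opts.reasoning_format = val;
        } else {
            unknown.push_back(entry);
        }
    }
    return opts;
}
```

Collecting unrecognized entries instead of ignoring them is a deliberate choice in this sketch: after a llama.cpp update, a non-empty unknown list is a cheap signal that a new option may need to be supported.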