docs/middleware/README.md
Hermes middleware is the behavior-changing companion to observer hooks. Observer hooks report what happened. Middleware can change what happens by rewriting a request before execution or by wrapping the execution callback itself.
This contract is intentionally backend-neutral. A plugin can use it for local policy, request shaping, tracing, adaptive routing, cache control, sandbox selection, or handoff to runtimes such as NeMo Relay without changing Hermes' planner, model provider adapters, tool registry, memory, or CLI UX.
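With middleware enabled, plugins can:
- replace the effective provider kwargs before provider execution;
- replace the effective tool args before hooks, guardrails, approvals, and execution;
- wrap or replace the actual provider call;
- wrap or replace the actual tool call.
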
Plugins register middleware from register(ctx):
def register(ctx):
    ctx.register_middleware("llm_request", on_llm_request)
    ctx.register_middleware("llm_execution", on_llm_execution)
    ctx.register_middleware("tool_request", on_tool_request)
    ctx.register_middleware("tool_execution", on_tool_execution)
Every middleware callback receives:
- telemetry_schema_version: currently hermes.observer.v1
- middleware_schema_version: currently hermes.middleware.v1
- session_id, task_id, turn_id, api_request_id, provider, model, api_mode,
  tool_name, and tool_call_id when applicable.

Supported middleware kinds:

| Kind | Payload | Return shape | Purpose |
|---|---|---|---|
| llm_request | request, original_request | {"request": {...}} | Replace effective provider kwargs before provider execution. |
| tool_request | tool_name, args, original_args | {"args": {...}} | Replace effective tool args before hooks, guardrails, approvals, and execution. |
| llm_execution | request, original_request, next_call | Any provider response | Wrap or replace the actual provider call. |
| tool_execution | tool_name, args, original_args, next_call | Any tool result | Wrap or replace the actual tool call. |
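
As a rough illustration of the callback shape, the sketch below reads the shared context fields defensively, since fields such as tool_name and tool_call_id are only present when applicable. The helper describe_call and the sample values are hypothetical; only the field names come from the list above.

```python
# Correlation fields named in this document; any of them may be absent.
CONTEXT_FIELDS = (
    "session_id", "task_id", "turn_id", "api_request_id",
    "provider", "model", "api_mode", "tool_name", "tool_call_id",
)


def describe_call(**kwargs):
    """Collect whichever correlation fields are present (hypothetical helper)."""
    return {
        name: kwargs[name]
        for name in CONTEXT_FIELDS
        if kwargs.get(name) is not None
    }


def on_tool_request(**kwargs):
    # Log-only middleware. Returning None is assumed to mean "no change",
    # matching the tool_request example later in this document.
    print("tool_request", describe_call(**kwargs))
    return None
```

Using kwargs.get(...) rather than indexing keeps the callback working for both LLM and tool payloads, which carry different subsets of these fields.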
Request middleware can return optional trace fields:
return {
    "request": updated_request,
    "source": "my-plugin",
    "reason": "selected fallback model",
}
Hermes stores those trace entries in later observer hook payloads as
middleware_trace.
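
To make the return shape concrete, here is a toy model of how a host could fold request middleware results into an effective request and a trace list. It is not Hermes' implementation: apply_request_middleware, the trace entry layout, and the model names are assumptions for illustration, and only the request, source, and reason keys come from the text above.

```python
def apply_request_middleware(callbacks, request):
    """Toy model: run request middleware in order and collect trace entries."""
    original_request = dict(request)
    effective = dict(request)
    trace = []
    for callback in callbacks:
        result = callback(request=effective, original_request=original_request)
        if not result or "request" not in result:
            continue  # Assumed: no return value means "no change".
        effective = result["request"]
        trace.append({
            "source": result.get("source"),
            "reason": result.get("reason"),
        })
    return effective, trace


def pick_fallback_model(**kwargs):
    request = dict(kwargs["request"])
    request["model"] = "fallback-model"  # Hypothetical model name.
    return {
        "request": request,
        "source": "my-plugin",
        "reason": "selected fallback model",
    }


def no_op(**kwargs):
    return None


effective, middleware_trace = apply_request_middleware(
    [no_op, pick_fallback_model],
    {"model": "primary-model", "messages": []},
)
```

In this model, only middleware that actually returns a replacement request contributes a trace entry, so the trace reads as a list of the rewrites that shaped the effective request.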
Execution middleware receives a next_call callback. Call it to continue the
chain:
def on_tool_execution(**kwargs):
    result = kwargs["next_call"](kwargs["args"])
    return result
If multiple plugins register the same execution middleware kind, Hermes runs them as a nested chain in registration order. Middleware failures are fail-open: Hermes logs a warning and continues with the next middleware or the base runtime path.
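
The nesting can be pictured with a small stand-alone model. This sketch assumes the first registered middleware becomes the outermost wrapper; the text above only says the chain follows registration order, so treat that direction, and the build_chain helper itself, as illustrative assumptions rather than Hermes internals.

```python
def build_chain(middlewares, base_call):
    """Toy model: wrap base_call so the first middleware is outermost."""
    call = base_call
    for middleware in reversed(middlewares):
        # Default arguments bind the current middleware and next callable.
        def wrapped(args, _mw=middleware, _next=call):
            return _mw(args=args, next_call=_next)
        call = wrapped
    return call


calls = []


def outer(**kwargs):
    calls.append("outer:before")
    result = kwargs["next_call"](kwargs["args"])
    calls.append("outer:after")
    return result


def inner(**kwargs):
    calls.append("inner:before")
    result = kwargs["next_call"](kwargs["args"])
    calls.append("inner:after")
    return result


def base_tool(args):
    calls.append("base")
    return {"echo": args}


run = build_chain([outer, inner], base_tool)
result = run({"x": 1})
```

After the call, calls records each middleware's "before" step on the way in and its "after" step on the way out, with the base tool in the middle.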
For each provider request, Hermes applies middleware in this order:
1. llm_request middleware.
2. pre_api_request observer hooks with the effective request.
3. llm_execution middleware.
4. post_api_request or api_request_error observer hooks.

Request middleware sees the full provider kwargs, including messages or
Responses API input, model settings, tool definitions, stream options, and
provider-specific options. Execution middleware receives the same effective
request plus next_call.
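
Because request middleware sees the complete provider kwargs, it can adjust model settings without touching messages. A minimal sketch, assuming the request carries a max_tokens key (provider kwargs vary, so that key, the cap value, and the source label are hypothetical):

```python
MAX_OUTPUT_TOKENS = 1024  # Hypothetical policy limit.


def cap_max_tokens(**kwargs):
    request = kwargs["request"]
    requested = request.get("max_tokens")
    if requested is not None and requested <= MAX_OUTPUT_TOKENS:
        # Already within the limit. Returning None is assumed to mean
        # "leave the request unchanged".
        return None
    updated = dict(request)  # Shallow copy; messages and tools are reused as-is.
    updated["max_tokens"] = MAX_OUTPUT_TOKENS
    return {
        "request": updated,
        "source": "token-cap-demo",
        "reason": f"capped max_tokens at {MAX_OUTPUT_TOKENS}",
    }
```

The capped value is what pre_api_request hooks and llm_execution middleware would then observe, since both receive the effective request.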
For each tool call, Hermes applies middleware in this order:
1. tool_request middleware.
2. tool_execution middleware.
3. post_tool_call observer hooks.
4. transform_tool_result hooks before the result is appended back into
   conversation context.

Tool request middleware runs before approval checks. Use it carefully: a rewritten path, command, or URL is the value downstream policy will evaluate.
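
One way to act on that caution is to canonicalize values rather than invent them, so the approval layer evaluates exactly what will run. The sketch below assumes a hypothetical read_file tool with a path argument; neither name is confirmed by this document.

```python
import os.path


def normalize_read_path(**kwargs):
    if kwargs.get("tool_name") != "read_file":  # Hypothetical tool name.
        return None
    args = dict(kwargs["args"])
    path = args.get("path")  # Hypothetical argument name.
    if not isinstance(path, str):
        return None
    # normpath collapses "." and ".." segments lexically; it does not
    # resolve symlinks or check that the path exists.
    normalized = os.path.normpath(path)
    if normalized == path:
        return None  # Nothing to rewrite, so no trace entry is needed.
    args["path"] = normalized
    return {
        "args": args,
        "source": "path-normalizer-demo",
        "reason": "normalized path before approval checks",
    }
```

Returning None when nothing changes keeps the trace limited to calls where the rewritten value actually differs from what the model requested.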
Middleware only runs for enabled plugins. For a bundled plugin:
hermes plugins enable <plugin-name>
For isolated local testing, use one HERMES_HOME for plugin enablement and the
agent run:
export HERMES_HOME=/tmp/hermes-middleware-test
mkdir -p "$HERMES_HOME"
hermes plugins enable <plugin-name>
hermes chat --query 'Reply exactly ok'
For source checkouts, prefer the source command so the runtime sees plugins and middleware from the working tree:
uv sync
uv run hermes plugins enable <plugin-name>
uv run hermes chat --query 'Reply exactly ok'
The examples below are intentionally small. They show the middleware contract shape without depending on NeMo Relay.
This plugin tags provider requests and records a middleware trace entry:
def register(ctx):
    ctx.register_middleware("llm_request", tag_llm_request)

def tag_llm_request(**kwargs):
    # Copy each nested dict before editing so the incoming request is not mutated.
    request = dict(kwargs["request"])
    extra_body = dict(request.get("extra_body") or {})
    metadata = dict(extra_body.get("metadata") or {})
    metadata["hermes_middleware_demo"] = True
    extra_body["metadata"] = metadata
    request["extra_body"] = extra_body
    return {
        "request": request,
        "source": "middleware-demo",
        "reason": "tagged provider request",
    }
The effective request is passed to pre_api_request, provider execution, and
post_api_request.
This plugin defaults terminal calls to a known working directory when the call does not set one:
def register(ctx):
    ctx.register_middleware("tool_request", normalize_terminal_workdir)

def normalize_terminal_workdir(**kwargs):
    # Leave every tool other than the terminal untouched.
    if kwargs.get("tool_name") != "terminal":
        return None
    args = dict(kwargs["args"])
    # setdefault keeps an explicit workdir and only fills in a missing one.
    args.setdefault("workdir", "/tmp/hermes-middleware-demo")
    return {
        "args": args,
        "source": "middleware-demo",
        "reason": "defaulted terminal workdir",
    }
Because this runs before hooks and approvals, downstream telemetry and policy
observe the rewritten workdir.
This plugin wraps the provider call and preserves the raw provider response:
import time

def register(ctx):
    ctx.register_middleware("llm_execution", time_llm_execution)

def time_llm_execution(**kwargs):
    started = time.monotonic()
    response = kwargs["next_call"](kwargs["request"])
    elapsed_ms = int((time.monotonic() - started) * 1000)
    print(f"llm_execution elapsed_ms={elapsed_ms}")
    return response
Return the same response shape Hermes expects from the provider adapter. Do not wrap the response in a plugin-specific envelope unless the rest of the runtime expects that envelope.
This plugin wraps tool execution while preserving the tool result:
def register(ctx):
    ctx.register_middleware("tool_execution", annotate_tool_execution)

def annotate_tool_execution(**kwargs):
    result = kwargs["next_call"](kwargs["args"])
    # Metrics, logging, or external routing can happen here.
    return result
Execution middleware may call next_call(modified_args) to pass a changed
payload to later middleware and the base tool dispatcher.
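
A sketch of that pattern, assuming a hypothetical timeout argument that the wrapped tool understands (the argument name and default value are not part of the Hermes contract):

```python
DEFAULT_TIMEOUT_S = 30  # Hypothetical default.


def add_default_timeout(**kwargs):
    args = dict(kwargs["args"])  # Copy so the caller's dict is not mutated.
    args.setdefault("timeout", DEFAULT_TIMEOUT_S)
    # Later middleware and the base dispatcher receive the modified args.
    return kwargs["next_call"](args)
```

The middleware still calls next_call exactly once and returns its result unchanged, so the tool result shape is preserved.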
Plugin-specific examples should live with the plugin that owns the behavior.
For NeMo Relay adaptive execution middleware, see
plugins/observability/nemo_relay/README.md.

- Execution middleware should call next_call(...) exactly once unless it is
  intentionally short-circuiting execution.
- If middleware raises before calling next_call(...), Hermes treats
  that as middleware failure and continues with the remaining middleware chain
  and base execution.
- If middleware calls next_call(...) successfully and then raises
  during post-processing, Hermes preserves the downstream result and does not
  run the provider or tool a second time.
- ... None result.
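
The failure rules above can be modeled with a small wrapper. This is a toy model, not Hermes' actual code: call_fail_open, the logger name, and the choice to re-raise when the downstream call itself fails are all assumptions made for the sketch.

```python
import logging

logger = logging.getLogger("middleware-demo")


def call_fail_open(middleware, args, next_call):
    """Toy model of fail-open handling for one execution middleware."""
    state = {"attempted": False, "done": False, "result": None}

    def tracked_next(next_args):
        state["attempted"] = True
        state["result"] = next_call(next_args)
        state["done"] = True
        return state["result"]

    try:
        return middleware(args=args, next_call=tracked_next)
    except Exception:
        if state["done"]:
            # Post-processing failed: keep the downstream result, no second run.
            logger.warning("middleware failed after next_call", exc_info=True)
            return state["result"]
        if state["attempted"]:
            # Assumption: the downstream call itself failed, so do not retry it.
            raise
        # Middleware failed before next_call: continue with the rest of the chain.
        logger.warning("middleware failed before next_call", exc_info=True)
        return next_call(args)


runs = []


def base_tool(args):
    runs.append(args)
    return "tool-output"


def fails_early(**kwargs):
    raise RuntimeError("before next_call")


def fails_late(**kwargs):
    kwargs["next_call"](kwargs["args"])
    raise RuntimeError("during post-processing")


early = call_fail_open(fails_early, {"n": 1}, base_tool)
late = call_fail_open(fails_late, {"n": 2}, base_tool)
```

In both cases the base tool runs exactly once per call, which is the property the rules are meant to guarantee.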