docs/rfc/0014-router-hot-path-efficiency.md
The router constructs a new http.Transport per proxied request (pkg/router/transport.go:137 via getDefaultTransport() at transport.go:374-391), which silently defeats HTTP keep-alive entirely: every request to a function pod pays a fresh TCP dial, the configured MaxIdleConns: 100 is dead config, and the otelhttp wrapper is rebuilt per attempt too (transport.go:276).
This RFC fixes that bug-class finding with one shared, properly sized transport; puts the per-request handler path on an allocation diet (pre-built loggers, per-route proxy policy); and removes the functionReferenceResolver refCache — a TTL'd cache of the Manager informer cache that sits off the hot path, costs two goroutines plus an O(n) copy-walk per Function reconcile, and whose staleness class is already mitigated elsewhere.
Verified against main:
RoundTrip calls getDefaultTransport() (value receiver — copies the whole ~15-field struct) per request; the comment says it exists "to prevent the value of http.DefaultTransport from being changed by goroutines" — a safety choice that created the bug.
Per-attempt code then mutates transport.DialContext with Dialer{Timeout: executingTimeout, KeepAlive: keepAliveTime} (transport.go:248-251).
Function pods are plain HTTP :8888, so the per-request cost is a TCP handshake + slow-start — pure waste under steady load, most visible at high RPS and larger responses.pkg/router/functionHandler.go:40-156): a RetryingRoundTripper literal conflating immutable per-route config (resolver, tapper, trigger, params) with per-request state (serviceURL, tapURL, release, retryCounter — mutated at transport.go:191-193,256); logger.WithName("roundtripper") per request; resolveProxyPolicy per request (a pure function of fn + timeout); an httputil.ReverseProxy + closures per request; a metrics+tap goroutine (two spawn sites: functionHandler.go:110-115 and the error handler).pkg/router/functionReferenceResolver.go): keyed (namespace, triggerName, triggerRV) with a 1-minute TTL; each pkg/cache instance runs its own actor + expiry goroutines; resolve is called only from buildMuxes — results are closed into handlers at build time, so the cache is not on the request path at all.
Its known staleness class (trigger-RV keying misses function updates) is already mitigated by resolver_executor.currentFunction re-reading before any specialization.
invalidateForFunction does a full Copy() map walk on every Function reconcile (hook at pkg/router/reconciler.go:117).pkg/cache removal (functionServiceMap still uses it for the legacy/fallback address path; rationalization direction noted only).ReverseProxy in this RFC (designed below, explicitly deferred; interlocks with RFC-0013's handler reuse).One *http.Transport per router process (plus a once-wrapped otelhttp.NewTransport sibling for the non-websocket path), built where tsRoundTripperParams is constructed and stored on it.
disableKeepAlive is process-wide config, so a single transport configured from it at startup suffices — no per-variant pool.
The critical constraint: the per-attempt Dialer.Timeout = executingTimeout is not just a timeout, it is the backoff-scaled fast-retry ladder for cold pods — a not-yet-listening pod must fail the dial quickly so the loop re-resolves.
It is preserved exactly via a context value read by a single shared DialContext:
transport.DialContext = func(ctx context.Context, network, addr string) (net.Conn, error) {
if d, ok := ctx.Value(dialTimeoutKey{}).(time.Duration); ok {
var cancel context.CancelFunc
ctx, cancel = context.WithTimeoutCause(ctx, d, errDialLadderTimeout)
defer cancel()
}
return sharedDialer.DialContext(ctx, network, addr) // Dialer{KeepAlive: params.keepAliveTime}
}
RoundTrip stashes the attempt's executingTimeout into the per-attempt request context; a pooled-conn hit skips the dial entirely (correct — there is nothing to time out).
Classification is unchanged: a ctx-deadline dial failure is *net.OpError{Op: "dial"} with Timeout() == true — identical through network.Adapter (pkg/error/network/error.go) to today's Dialer.Timeout path (which net implements as a ctx deadline internally), and transport.RoundTrip is called directly so there is no *url.Error wrapping.
A pin test makes this an invariant (see Verification).
Sizing (the current MaxIdleConns: 100 never mattered because the transport never survived a request):
MaxIdleConnsPerHost: new env ROUTER_ROUND_TRIP_MAX_IDLE_CONNS_PER_HOST, default 32 — each poolmgr pod is its own host (ip:8888), and the Go default of 2 would throttle per-pod reuse well below requestsPerPod ceilings.MaxIdleConns: 1024 (explicit bound for fd hygiene).IdleConnTimeout: 90s → 30s, shorter than the poolmgr idle-reap window (120s), bounding how long a pooled conn can outlive its pod.Pooled conns vs dead pods is the one real behavioral shift: today a dead pod fails at dial (refused/timeout → quarantine/retry ladder); with pooling, a stale conn fails at write/read instead.
Graceful reaps send FIN, which evicts the conn from the pool before reuse; the residual race is covered by Go's automatic retry of replayable requests on a reused-conn failure; a non-replayable POST surfaces the error to the caller (a 500 via the proxy error handler, logged with a distinct 'pooled connection failed mid-request' line; bare EOF is included in that class — Go deliberately does not retry possibly-executed requests) — documented and pinned by unit tests.
The quarantine path is unaffected (it keys on dial errors, and a write-failure on a reused conn correctly is not one).
disableKeepAlive — which today is nearly meaningless — becomes the documented escape hatch back to per-request connections.
Computed once in buildMuxes and stored on the route's functionHandler:
logger.WithName("roundtripper") moves out of the request path).proxyPolicy + funcTimeout (resolveProxyPolicy is a pure function; the map covers the canary case, where fh.function is selected per request from the weight distribution).Kept deliberately per-request:
RetryingRoundTripper itself — it is the per-request state (one small allocation; sync.Pool rejected: the streaming-deferred settle and closeContextFunc lifetimes make pooling error-prone for negligible gain).ReverseProxy + closures — a per-route proxy requires carrying per-request state through the request context so shared ModifyResponse/ErrorHandler can find it; the win is small next to the transport fix and the risk touches the streaming settle block, the most delicate code in the router.
The context-key design is recorded here; implementation is deferred and revisited alongside RFC-0013's handler reuse.An optional roundTripperConfig field-regrouping makes the immutable/mutable boundary explicit, as a refactor not a behavior change.
buildMuxes resolves directly against the informer cache: resolve() stays as a thin uncached dispatch on FunctionReference.Type (smallest diff; resolveResult shape unchanged), and the following are deleted:
refCache field and its MakeCache construction (−2 goroutines),invalidateForFunction and its Copy() walk,reconciler.go:117 (the Function reconcile still triggers the rebuild — that is exactly what makes the cache redundant).Behavior delta: N informer Gets per rebuild instead of cache hits — in-memory map reads, negligible; canary weight lists are rebuilt per rebuild with identical results; the up-to-1-minute stale-snapshot window the cache could serve is gone outright (and was already moot for specialization via currentFunction).
| Knob | Default | Notes |
|---|---|---|
ROUTER_ROUND_TRIP_MAX_IDLE_CONNS_PER_HOST | 32 | new; tuning, not gating |
ROUTER_ROUND_TRIP_DISABLE_KEEP_ALIVE | false | pre-existing; becomes the meaningful escape hatch |
IdleConnTimeout | 30s | internal constant, rationale documented |
No new feature gate: the change is internal, and the keep-alive escape hatch already exists.
disableKeepAlive semantics upgrade called out in chart docs.Phase 0 — baseline harness.
Alloc benchmarks (BenchmarkHandlerClassic, BenchmarkRoundTripWarm against a fake resolver + httptest backend, b.ReportAllocs) plus a dial-counting listener wrapper; capture the k6 warm-path baseline (as shipped: the before/after evidence is recorded on #3491).
No production change — lands first so phases 1–2 have honest before/after deltas.
Phase 1 — shared transport (the bug fix, isolated).
Build transport + otel wrapper once; context-value dial ladder; delete getDefaultTransport; sizing knobs; classification pin test; chart-docs note for disableKeepAlive.
Phase 2 — handler hoists. Per-route logger; per-(route, backend-UID) policy/timeout map; optional config/state field regrouping. Benchmarks show the allocs/op delta.
Phase 3 — refCache removal. As designed; independent of phases 1–2 (sequenced last to keep perf-sensitive diffs first).
Deferred (recorded, not scheduled): per-route ReverseProxy via context-carried per-request state — re-evaluate with RFC-0013.
MaxIdleConnsPerHost dials after warmup (today: exactly N).go test -bench -benchmem — allocs/op strictly reduced after phases 1–2; numbers recorded on the implementing PR.transport_test.go retry matrix, settle dispatch matrix, streaming release tests, functionHandler_test.go, canary tests pass unchanged (the only deletion is getDefaultTransport's own coverage).192.0.2.1) with a 1ms ladder value asserts IsDialError() && IsTimeoutError().getDefaultTransport allocation site gone.sync.Pool for the round tripper — lifetime hazards (streaming settle, closeContextFunc) for negligible gain.ReverseProxy now — deferred (risk concentrated in the streaming settle block; small residual win).MaxIdleConnsPerHost derive from poolmgr requestsPerPod ceilings instead of a flat default?IdleConnTimeout right once RFC-0002's drain-aware reaping (unlabel → grace → delete) is considered — should it key off the drain grace floor (30s) explicitly?disableKeepAlive escape hatch.All four phases landed as specced, with these additions from the pre-merge review:
isStaleConnErr: write/read OpErrors, bare EOF, and net/http's "server closed idle connection") — deliberately not quarantined.router proxy transport configured), the operator's verification point for the keep-alive escape hatch.ROUTER_ROUND_TRIP_DISABLE_KEEP_ALIVE used | default true, and sprig's default treats false as empty — every install rendered "true", making keep-alive impossible to enable (masked until this RFC because the per-request transport made the setting moot).Measured: warm RoundTrip 253µs → 89µs (−65%), 267 → 165 allocs/op; e2e kind main-vs-branch k6: median −11.3%, p99 −10.8%, throughput +13.6%, max latency 185ms → 28.7ms (the dial tail eliminated); idle goroutines 87 → 86; CI pprof confirms the per-request transport allocation sites are gone.