architecture/networking/gateway-api-inference-extension.md
This document describes Istio's support for the Gateway API Inference Extension.
The Gateway API Inference Extension enables request routing tailored to machine learning inference workloads. It selects endpoints dynamically through an Endpoint Picker Protocol (EPP) service, which decides which backend pod should handle each request based on factors such as model availability, load, or request characteristics.
The InferencePool is a Kubernetes CRD from the inference.networking.k8s.io/v1 API group that represents a pool of inference model server endpoints. Key fields include:
- selector: Label selector to identify model server pods in the pool
- targetPorts: List of ports exposed by the model servers (supports multiple ports as of GIE v1.1.0)
- endpointPickerRef: Reference to an external service that provides endpoint selection logic

Example:
apiVersion: inference.networking.k8s.io/v1
kind: InferencePool
metadata:
  name: my-inference-pool
spec:
  targetPorts:
  - number: 8000
  - number: 8001
  selector:
    matchLabels:
      app: inference-workload
  endpointPickerRef:
    name: endpoint-picker-service
    port:
      number: 9002
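To illustrate how selector and targetPorts combine, here is a minimal Python sketch that enumerates the candidate endpoints for a pool like the one above. The `Pod` type, the sample pod IPs, and the helper names are hypothetical stand-ins for data the Kubernetes API would supply; this is not how Istio computes endpoints internally.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical stand-in for the pod data Kubernetes would report."""
    ip: str
    labels: dict = field(default_factory=dict)


def matches(pod, match_labels):
    """Return True when the pod carries every label in matchLabels."""
    return all(pod.labels.get(key) == value for key, value in match_labels.items())


def candidate_endpoints(pods, match_labels, target_ports):
    """Pair every selected pod IP with every target port of the pool."""
    return [
        f"{pod.ip}:{port}"
        for pod in pods
        if matches(pod, match_labels)
        for port in target_ports
    ]


# Sample data (assumed): two pods match the selector, one does not.
pods = [
    Pod("10.0.0.1", {"app": "inference-workload"}),
    Pod("10.0.0.2", {"app": "other"}),
    Pod("10.0.0.3", {"app": "inference-workload"}),
]
endpoints = candidate_endpoints(pods, {"app": "inference-workload"}, [8000, 8001])
print(endpoints)
```

With two target ports, each selected pod contributes two candidate endpoints, which is why the endpoint picker has to return a port as well as a pod IP.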
The EPP is an external gRPC service that selects specific endpoints for requests. It uses Envoy's ext_proc filter for request interception and dynamic endpoint selection. The EPP service receives request headers and returns the selected endpoint via the x-endpoint header in the format <pod-ip>:<port>.
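As a rough sketch of what consuming that header value involves, the Python snippet below splits a `<pod-ip>:<port>` string and validates both parts. The function name `parse_endpoint` is hypothetical, and the handling of bracketed IPv6 addresses is an assumption that the text above does not specify.

```python
import ipaddress


def parse_endpoint(value):
    """Split '<pod-ip>:<port>' into a validated (ip, port) tuple."""
    # Split on the last colon so the port is always the final component.
    host, separator, port_text = value.strip().rpartition(":")
    if not separator or not host:
        raise ValueError(f"expected <pod-ip>:<port>, got {value!r}")

    # Assumption: an IPv6 address would arrive in brackets, e.g. [::1]:8000.
    ip = ipaddress.ip_address(host.strip("[]"))  # raises ValueError if invalid

    port = int(port_text)  # raises ValueError if not numeric
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return str(ip), port


print(parse_endpoint("10.0.0.7:8000"))
```

Rejecting malformed values early keeps a bad EPP response from being interpreted as a routable destination.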
For each InferencePool, Istio automatically creates an internal "shadow" Service:
- Name: <pool-name>-ip-<hash> (e.g., test-pool-ip-a1b2c3d4)
- istio.io/inferencepool-name: Pool name
- istio.io/inferencepool-extension-service: EPP service name
- istio.io/inferencepool-extension-port: EPP service port
- istio.io/inferencepool-extension-failure-mode: Failure mode (FailOpen/FailClose)

1. An HTTPRoute references an InferencePool as a backend:

   backendRefs:
   - group: inference.networking.k8s.io
     kind: InferencePool
     name: my-inference-pool
     port: 80

2. The Gateway controller detects this and creates a shadow Service.
3. During route conversion, the ext_proc filter is attached with the EPP service details.
4. At runtime, Envoy uses ext_proc to query the EPP service for endpoint selection.
5. Requests are routed to the selected pod:port combination.
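The runtime part of this flow, together with the FailOpen/FailClose failure mode, can be sketched in Python as follows. This is a simplified model, not Istio or Envoy code: the names `route_request`, `EndpointPickerError`, and `RequestRejected` are hypothetical, and the sketch assumes that FailOpen means "fall back to any endpoint in the pool" while FailClose means "reject the request" when the EPP cannot be reached.

```python
class EndpointPickerError(Exception):
    """Hypothetical error raised when the EPP call fails."""


class RequestRejected(Exception):
    """Hypothetical error for a request dropped under FailClose."""


def route_request(headers, pick_endpoint, candidates, failure_mode):
    """Return the endpoint a request should be sent to.

    pick_endpoint models the ext_proc round trip to the EPP service.
    """
    try:
        return pick_endpoint(headers)
    except EndpointPickerError as err:
        if failure_mode == "FailOpen":
            # Assumed behavior: fall back to a pool endpoint. A real proxy
            # would load-balance here; the first entry keeps this deterministic.
            return candidates[0]
        # Assumed FailClose behavior: drop the request rather than guess.
        raise RequestRejected("endpoint picker unavailable") from err


def healthy_picker(headers):
    return "10.0.0.3:8001"


def broken_picker(headers):
    raise EndpointPickerError("EPP timed out")


pool = ["10.0.0.1:8000", "10.0.0.3:8001"]
print(route_request({}, healthy_picker, pool, "FailClose"))
print(route_request({}, broken_picker, pool, "FailOpen"))
```

The failure mode only matters when the picker call fails; on the normal path the EPP's choice is used unchanged.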
The Gateway API Inference Extension is disabled by default. To enable it:
- PILOT_ENABLE_GATEWAY_API=true (required)
- ENABLE_GATEWAY_API_INFERENCE_EXTENSION=true on istiod

Example:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    pilot:
      env:
        PILOT_ENABLE_GATEWAY_API: "true"
        ENABLE_GATEWAY_API_INFERENCE_EXTENSION: "true"
The integration tests live in tests/integration/pilot/gie.
To run the GIE integration tests:
go test -tags=integ ./tests/integration/pilot/gie/... -v
The tests require:
This test verifies: