Triton Configuration - Flink

| Key | Default | Type | Description |
| --- | --- | --- | --- |
| auth-token | | | |
| compression | (none) | String | Compression algorithm for the request body. Currently only gzip is supported. When set, the request body is compressed to reduce network bandwidth. |
| custom-headers | | | |
| endpoint | (none) | String | Full URL of the Triton Inference Server endpoint, e.g., https://triton-server:8000/v2/models. Both HTTP and HTTPS are supported; HTTPS is recommended for production. |
| flatten-batch-dim | false | Boolean | Whether to flatten the batch dimension for array inputs. When true, an input of shape [1,N] is sent as [N]. |
| model-name | | | |
| model-version | | | |
| priority | | | |
| sequence-end | false | Boolean | Whether this request marks the end of a sequence for stateful models. When true, Triton releases the model's state after processing this request. See Triton Stateful Models for more details. |
| sequence-id | (none) | String | Sequence ID for stateful models. A sequence is a series of inference requests that must be routed to the same model instance to maintain state across requests (e.g., for RNN/LSTM models). See Triton Stateful Models for more details. |
| sequence-start | false | Boolean | Whether this request marks the start of a new sequence for stateful models. When true, Triton initializes the model's state before processing this request. See Triton Stateful Models for more details. |
| timeout | 30 s | Duration | HTTP request timeout (connect + read + write). Applies to each individual request and is separate from Flink's async timeout. |
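
The effect of `flatten-batch-dim`, `compression`, and the `sequence-*` options on an outgoing request can be illustrated with a minimal sketch. It is not the connector's implementation: the helper names `flatten_batch_dim` and `build_request`, the input name `INPUT0`, and the `FP32` datatype are hypothetical, and the JSON field names are assumed to follow Triton's HTTP/JSON inference protocol.

```python
import gzip
import json


def flatten_batch_dim(data, shape):
    """Drop a leading batch dimension of size 1, so shape [1, N] becomes [N]."""
    if len(shape) == 2 and shape[0] == 1:
        return data[0], shape[1:]
    return data, shape


def build_request(values, *, flatten=False, sequence_id=None,
                  sequence_start=False, sequence_end=False, compression=None):
    """Return (headers, body_bytes) for an illustrative inference request.

    `values` is a non-empty 2-D list of floats, e.g. [[1.0, 2.0, 3.0]].
    """
    shape = [len(values), len(values[0])]
    data = values
    if flatten:
        data, shape = flatten_batch_dim(values, shape)

    # Assumed payload layout; INPUT0 and FP32 are placeholder values.
    payload = {
        "inputs": [
            {"name": "INPUT0", "datatype": "FP32", "shape": shape, "data": data}
        ]
    }

    # Sequence parameters are only attached for stateful models.
    if sequence_id is not None:
        payload["parameters"] = {
            "sequence_id": sequence_id,
            "sequence_start": sequence_start,
            "sequence_end": sequence_end,
        }

    headers = {"Content-Type": "application/json"}
    body = json.dumps(payload).encode("utf-8")

    # gzip is the only algorithm the compression option lists as supported.
    if compression == "gzip":
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body


headers, body = build_request(
    [[1.0, 2.0, 3.0]],
    flatten=True,
    sequence_id="session-42",
    sequence_start=True,
    compression="gzip",
)
```

In this sketch the first request of a sequence sets `sequence_start`, the last one sets `sequence_end`, and every request in between carries only the shared `sequence_id`.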