Features

Compatibility Matrix

The tables below show which features are mutually exclusive and which features are supported on a selection of hardware.

The symbols used have the following meanings:

✅ = Full compatibility
🟠 = Partial compatibility
❌ = No compatibility
❔ = Unknown or TBD

!!! note Check the ❌ or 🟠 entries that carry links to see the tracking issue for the unsupported feature/hardware combination.

Feature x Feature

Feature	<abbr title="Chunked Prefill">CP</abbr>	<abbr title="Automatic Prefix Caching">APC</abbr>	LoRA	<abbr title="Speculative Decoding">SD</abbr>	CUDA graph	pooling	<abbr title="Encoder-Decoder Models">enc-dec</abbr>	<abbr title="Logprobs">logP</abbr>	<abbr title="Prompt Logprobs">prmpt logP</abbr>	<abbr title="Async Output Processing">async output</abbr>	multi-step	<abbr title="Multimodal Inputs">mm</abbr>	best-of	beam-search	prompt-embeds
CP	✅
APC	✅	✅
LoRA	✅	✅	✅
SD	✅	✅	❌	✅
CUDA graph	✅	✅	✅	✅	✅
pooling	🟠*	🟠*	✅	❌	✅	✅
<abbr title="Encoder-Decoder Models">enc-dec</abbr>	❌	❌	❌	❌	✅	✅	✅
<abbr title="Logprobs">logP</abbr>	✅	✅	✅	✅	✅	❌	✅	✅
<abbr title="Prompt Logprobs">prmpt logP</abbr>	✅	✅	✅	✅	✅	❌	✅	✅	✅
<abbr title="Async Output Processing">async output</abbr>	✅	✅	✅	❌	✅	❌	❌	✅	✅	✅
multi-step	❌	✅	❌	❌	✅	❌	❌	✅	✅	✅	✅
<abbr title="Multimodal Inputs">mm</abbr>	✅	✅	🟠<sup>^</sup>	❔	✅	✅	✅	✅	✅	✅	❔	✅
best-of	✅	✅	✅	❌	✅	❌	✅	✅	✅	❔	❌	✅	✅
beam-search	✅	✅	✅	❌	✅	❌	✅	✅	✅	❔	❌	❔	✅	✅
prompt-embeds	✅	✅	✅	❌	✅	❌	❌	✅	❌	❔	❔	❌	❔	❔	✅

* Chunked prefill and prefix caching are only applicable to last-token or all pooling with causal attention.
<sup>^</sup> LoRA is only applicable to the language backbone of multimodal models.
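
The Feature x Feature table is lower-triangular and symmetric, so a pair can be looked up in either order. As a minimal sketch of how one might check a planned feature set against it, the snippet below encodes a small excerpt of the table in a plain dictionary. The names `PAIR_SUPPORT` and `check_features` are hypothetical helpers written for this illustration, not part of any library API, and pairs missing from the excerpt are simply not checked.

```python
from itertools import combinations

FULL, PARTIAL, NONE, UNKNOWN = "✅", "🟠", "❌", "❔"

# Excerpt of the Feature x Feature table above (not the full matrix).
# Keys are frozensets so that lookup does not depend on argument order.
PAIR_SUPPORT = {
    frozenset({"CP", "APC"}): FULL,
    frozenset({"LoRA", "SD"}): NONE,
    frozenset({"pooling", "CP"}): PARTIAL,
    frozenset({"pooling", "SD"}): NONE,
    frozenset({"enc-dec", "CP"}): NONE,
    frozenset({"multi-step", "CP"}): NONE,
    frozenset({"mm", "LoRA"}): PARTIAL,
    frozenset({"mm", "SD"}): UNKNOWN,
}


def check_features(features):
    """Return {(a, b): symbol} for every listed pair that is not fully compatible.

    Pairs that are absent from the excerpt are skipped, not assumed compatible.
    """
    problems = {}
    for a, b in combinations(sorted(features), 2):
        status = PAIR_SUPPORT.get(frozenset({a, b}))
        if status is not None and status != FULL:
            problems[(a, b)] = status
    return problems


if __name__ == "__main__":
    print(check_features(["LoRA", "SD", "CP"]))
```

With this excerpt, `check_features(["LoRA", "SD", "CP"])` reports only the `("LoRA", "SD")` pair, matching the ❌ in the SD row of the table.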

Feature x Hardware

Feature	Volta	Turing	Ampere	Ada	Hopper	CPU	AMD	Intel GPU
CP	❌	✅	✅	✅	✅	✅	✅	✅
APC	❌	✅	✅	✅	✅	✅	✅	✅
LoRA	✅	✅	✅	✅	✅	✅	✅	✅
SD	✅	✅	✅	✅	✅	❌	✅	✅
CUDA graph	✅	✅	✅	✅	✅	❌	✅	❌
pooling	✅	✅	✅	✅	✅	✅	✅	✅
<abbr title="Encoder-Decoder Models">enc-dec</abbr>	✅	✅	✅	✅	✅	✅	❌	✅
<abbr title="Multimodal Inputs">mm</abbr>	✅	✅	✅	✅	✅	✅	✅	✅
prompt-embeds	✅	✅	✅	✅	✅	✅	❔	✅
<abbr title="Logprobs">logP</abbr>	✅	✅	✅	✅	✅	✅	✅	✅
<abbr title="Prompt Logprobs">prmpt logP</abbr>	✅	✅	✅	✅	✅	✅	✅	✅
<abbr title="Async Output Processing">async output</abbr>	✅	✅	✅	✅	✅	❌	❌	✅
multi-step	✅	✅	✅	✅	✅	❌	✅	✅
best-of	✅	✅	✅	✅	✅	✅	✅	✅
beam-search	✅	✅	✅	✅	✅	✅	✅	✅
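
The Feature x Hardware table can be read the other way around, starting from a hardware target and listing what is unavailable on it. The sketch below is illustrative only: `UNSUPPORTED_ON` and `features_unavailable_on` are hypothetical names, the dictionary records only the ❌ cells of the table above, and ❔ cells (such as prompt-embeds on AMD) are left out.

```python
# Feature -> hardware columns marked ❌ in the Feature x Hardware table above.
# Features with no ❌ cell are omitted.
UNSUPPORTED_ON = {
    "CP": {"Volta"},
    "APC": {"Volta"},
    "SD": {"CPU"},
    "CUDA graph": {"CPU", "Intel GPU"},
    "enc-dec": {"AMD"},
    "async output": {"CPU", "AMD"},
    "multi-step": {"CPU"},
}


def features_unavailable_on(hardware):
    """Return the sorted list of features marked ❌ for the given hardware column."""
    return sorted(
        feature
        for feature, blocked in UNSUPPORTED_ON.items()
        if hardware in blocked
    )


if __name__ == "__main__":
    for hw in ("Volta", "CPU", "Hopper"):
        print(hw, features_unavailable_on(hw))
```

A hardware column with no ❌ cells, such as Hopper, yields an empty list.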

!!! note For information on feature support on Google TPU, please refer to the TPU-Inference Recommended Models and Features documentation.