
FlashAttention in mistral.rs

docs/FLASH_ATTENTION.md



mistral.rs supports FlashAttention V2 and V3 on CUDA devices (V3 requires compute capability (CC) >= 9.0).

Note: If mistral.rs is compiled with FlashAttention and PagedAttention is enabled, FlashAttention is used in tandem with PagedAttention to accelerate the prefill phase.

GPU Architecture Compatibility

| Architecture | Compute Capability | Example GPUs | Feature Flag |
|---|---|---|---|
| Ampere | 8.0, 8.6 | RTX 30*, A100, A40 | `--features flash-attn` |
| Ada Lovelace | 8.9 | RTX 40*, L40S | `--features flash-attn` |
| Hopper | 9.0 | H100, H800 | `--features flash-attn-v3` |
| Blackwell | 10.0, 12.0 | RTX 50* | `--features flash-attn` |
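As a sketch of the mapping the table above describes, the helper below picks the feature flag from a GPU's compute capability. It is purely illustrative (`feature_flag` is not part of the mistral.rs API), and follows the table's convention that only Hopper (CC 9.x) takes the V3 flag:

```rust
/// Hypothetical helper (not part of mistral.rs): map a CUDA compute
/// capability (major, minor) to the cargo feature flag from the table.
fn feature_flag(major: u32, minor: u32) -> Option<&'static str> {
    match (major, minor) {
        (8, 0) | (8, 6) => Some("flash-attn"),    // Ampere
        (8, 9) => Some("flash-attn"),             // Ada Lovelace
        (9, _) => Some("flash-attn-v3"),          // Hopper (CC >= 9.0)
        (10, _) | (12, _) => Some("flash-attn"),  // Blackwell
        _ => None,                                // pre-Ampere: no FlashAttention
    }
}

fn main() {
    // An H100 reports CC 9.0, so it would build with flash-attn-v3.
    assert_eq!(feature_flag(9, 0), Some("flash-attn-v3"));
    // An RTX 4090 reports CC 8.9, so it would build with flash-attn.
    assert_eq!(feature_flag(8, 9), Some("flash-attn"));
    println!("ok");
}
```

In practice you would pass the chosen flag to cargo at build time, e.g. `cargo build --release --features flash-attn` (combined with whatever CUDA features your build already uses).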

Note: FlashAttention V2 and V3 are mutually exclusive.

Note: To use FlashAttention in the Python SDK, compile from source.