# mistralrs-quant

An advanced and highly diverse set of quantization techniques. This crate supports both quantization and optimized inference.
It has grown beyond simple quantization and is used by mistral.rs to power its optimized inference paths.
Currently supported:
- `AfqLayer` (2-8 bit quantization optimized for Metal and compatible with MLX)
- `GgufMatMul` (2-8 bit quantization, with imatrix)
- `GptqAwqLayer` (with CUDA marlin kernel)
- `HqqLayer` (4, 8 bit quantization)
- `FP8Linear`
- `F8Q8Linear`
- `UnquantLinear`
- `BnbLinear` (int8, fp4, nf4)

Some kernels are copied or based on implementations in:
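To illustrate the kind of arithmetic these layers build on, here is a minimal sketch of block-wise symmetric 8-bit quantization, similar in spirit to GGUF's `Q8_0` format (32-element blocks with one f32 scale each). The names and structure are hypothetical for illustration only; this is not the crate's API:

```rust
// Illustrative block-wise symmetric 8-bit quantization (Q8_0-style):
// each block of 32 f32 values is stored as 32 i8 values plus one f32 scale.
// Hypothetical names; NOT the mistralrs-quant API.

const BLOCK: usize = 32;

struct Q8Block {
    scale: f32,
    qs: [i8; BLOCK],
}

fn quantize(xs: &[f32; BLOCK]) -> Q8Block {
    // Scale so the largest-magnitude value maps to +/-127.
    let amax = xs.iter().fold(0f32, |m, &x| m.max(x.abs()));
    let scale = amax / 127.0;
    let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
    let mut qs = [0i8; BLOCK];
    for (q, &x) in qs.iter_mut().zip(xs) {
        *q = (x * inv).round().clamp(-127.0, 127.0) as i8;
    }
    Q8Block { scale, qs }
}

fn dequantize(b: &Q8Block) -> [f32; BLOCK] {
    let mut out = [0f32; BLOCK];
    for (o, &q) in out.iter_mut().zip(&b.qs) {
        *o = q as f32 * b.scale;
    }
    out
}

fn main() {
    let xs: [f32; BLOCK] = core::array::from_fn(|i| (i as f32 - 16.0) * 0.1);
    let q = quantize(&xs);
    let ys = dequantize(&q);
    let max_err = xs
        .iter()
        .zip(&ys)
        .map(|(a, b)| (a - b).abs())
        .fold(0f32, f32::max);
    // Round-trip error is bounded by half a quantization step.
    assert!(max_err <= q.scale * 0.5 + 1e-6);
    println!("max round-trip error: {max_err}");
}
```

Real kernels add per-format details on top of this (imatrix-weighted scales, packed sub-byte storage for 2-7 bit formats, fused dequant-matmul on GPU), but the scale-and-round round trip above is the common core.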