# Quantization in mistral.rs

Mistral.rs supports the following quantization methods:

- GGUF quantized models
- ISQ (in-situ quantization)
- GPTQ quantized models
- MLX pre-quantized models
## GGUF quantized models

Use the `gguf` (CLI) / `GGUF` (Python) model selector and provide the GGUF file:

```bash
mistralrs run --format gguf -f my-gguf-file.gguf
```
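The same selector is available from the Python API. A minimal sketch using the `mistralrs` package's `Which.GGUF` selector; the model IDs and filename below are illustrative placeholders, not values from this doc:

```python
from mistralrs import Runner, Which, ChatCompletionRequest

# Placeholder IDs/filename: point these at your own tokenizer source,
# GGUF repo, and GGUF file.
runner = Runner(
    which=Which.GGUF(
        tok_model_id="mistralai/Mistral-7B-Instruct-v0.1",
        quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    )
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=64,
    )
)
print(res.choices[0].message.content)
```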
## ISQ (in-situ quantization)

See the [ISQ docs](ISQ.md) for details.

```bash
mistralrs run --isq 4 -m microsoft/Phi-3-mini-4k-instruct
```
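From the Python API, the same setting can be passed when constructing the `Runner`. A minimal sketch, assuming the `in_situ_quant` argument mirrors the CLI's `--isq` value:

```python
from mistralrs import Runner, Which

# Assumption: `in_situ_quant` mirrors the CLI's `--isq` flag, quantizing
# the weights while the model loads.
runner = Runner(
    which=Which.Plain(model_id="microsoft/Phi-3-mini-4k-instruct"),
    in_situ_quant="4",
)
```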
## GPTQ quantized models

A pre-quantized GPTQ model can be run directly; the quantization is detected from the model's configuration:

```bash
mistralrs run -m kaitchup/Phi-3-mini-4k-instruct-gptq-4bit
```
You can create your own GPTQ model using [scripts/convert_to_gptq.py](../scripts/convert_to_gptq.py):

```bash
pip install gptqmodel transformers datasets
python3 scripts/convert_to_gptq.py --src path/to/model --dst output/model/path --bits 4
```
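Internally, a conversion like this amounts to the standard `gptqmodel` quantization flow. A rough sketch; the calibration dataset, group size, and paths here are illustrative assumptions, not necessarily what the script uses:

```python
from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig

# A small calibration set; the script's actual dataset and size may differ.
calibration = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(512))["text"]

# 4-bit GPTQ with a common group size; corresponds to `--bits 4`.
config = QuantizeConfig(bits=4, group_size=128)

model = GPTQModel.load("path/to/model", config)
model.quantize(calibration, batch_size=1)
model.save("output/model/path")
```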
## MLX pre-quantized models

Pre-quantized models from the MLX community can be run directly:

```bash
mistralrs run -m mlx-community/Llama-3.2-1B-Instruct-8bit
```