Log in Get started

V0.13.1

.changes/v0.13.1.md

4.3.1268 B

Original Source

v0.13.1 (2024-07-10)

Fixed and Improvements

Bump llama.cpp version to b3334, supporting Deepseek V2 series models.
Turn on fast attention for Qwen2-1.5B model to fix the quantization error.
Properly set number of GPU layers (to zero) when device is CPU.