quantize

TODO

Llama 2 7B

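| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.35                  |
| Q3_K_S       | 3.50                  |
| Q3_K_M       | 3.91                  |
| Q3_K_L       | 4.27                  |
| Q4_K_S       | 4.58                  |
| Q4_K_M       | 4.84                  |
| Q5_K_S       | 5.52                  |
| Q5_K_M       | 5.68                  |
| Q6_K         | 6.56                  |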

Llama 2 13B

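| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.34                  |
| Q3_K_S       | 3.48                  |
| Q3_K_M       | 3.89                  |
| Q3_K_L       | 4.26                  |
| Q4_K_S       | 4.56                  |
| Q4_K_M       | 4.83                  |
| Q5_K_S       | 5.51                  |
| Q5_K_M       | 5.67                  |
| Q6_K         | 6.56                  |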

Llama 2 70B

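| Quantization | Bits per Weight (BPW) |
|--------------|-----------------------|
| Q2_K         | 3.40                  |
| Q3_K_S       | 3.47                  |
| Q3_K_M       | 3.85                  |
| Q3_K_L       | 4.19                  |
| Q4_K_S       | 4.53                  |
| Q4_K_M       | 4.80                  |
| Q5_K_S       | 5.50                  |
| Q5_K_M       | 5.65                  |
| Q6_K         | 6.56                  |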
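
The BPW figures above can be turned into a rough size estimate for the quantized weights: parameters × BPW / 8 bytes. The sketch below does this for the Llama 2 7B values. The parameter count (about 6.74 billion) and the helper name `estimated_size_gib` are assumptions for illustration and do not come from this README; real files will also differ slightly because of metadata and tensors stored at other precisions.

```python
# BPW values copied from the Llama 2 7B table in this README.
BPW_7B = {
    "Q2_K": 3.35,
    "Q3_K_S": 3.50,
    "Q3_K_M": 3.91,
    "Q3_K_L": 4.27,
    "Q4_K_S": 4.58,
    "Q4_K_M": 4.84,
    "Q5_K_S": 5.52,
    "Q5_K_M": 5.68,
    "Q6_K": 6.56,
}

# Assumption: approximate parameter count for Llama 2 7B (not stated in this README).
N_PARAMS_7B = 6.74e9


def estimated_size_gib(n_params: float, bpw: float) -> float:
    """Approximate weight size in GiB: params * bits per weight / 8 bits per byte."""
    return n_params * bpw / 8 / 1024**3


if __name__ == "__main__":
    for name, bpw in BPW_7B.items():
        size = estimated_size_gib(N_PARAMS_7B, bpw)
        print(f"{name:8s} {bpw:.2f} BPW  ~{size:.2f} GiB")
```

The same function applies to the 13B and 70B tables once a parameter count for those models is substituted.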