Supported Models

llmfit ships with a curated database of 106 LLMs from HuggingFace. All memory estimates assume Q4_K_M quantization (approximately 0.5 bytes per parameter) unless noted otherwise.
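The weight-memory figures implied by that assumption are simple arithmetic. The sketch below is illustrative only (the function name and structure are not llmfit's actual implementation), and it covers quantized weights alone; KV cache and runtime overhead are extra.

```python
def estimate_weight_memory_gb(params_billions: float,
                              bytes_per_param: float = 0.5) -> float:
    """Approximate size of quantized model weights in GB.

    bytes_per_param defaults to 0.5, matching the Q4_K_M assumption
    stated above. Pass a different value for other quantizations
    (e.g. 2.0 for F16).
    """
    # params_billions * 1e9 params * bytes/param / 1e9 bytes-per-GB
    # simplifies to a single multiplication.
    return params_billions * bytes_per_param

# A 7.6B model at Q4_K_M lands around 3.8 GB of weights;
# a 70.6B model around 35.3 GB.
print(estimate_weight_memory_gb(7.6))
print(estimate_weight_memory_gb(70.6))
```

For MoE models the tables list total parameters, so the same arithmetic applies to the full parameter count, not just the active experts.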

01.ai

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| 01-ai/Yi-6B-Chat | 6.1B | Q4_K_M | 4k | Instruction following, chat |
| 01-ai/Yi-34B-Chat | 34.4B | Q4_K_M | 4k | Instruction following, chat |

Alibaba

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| Qwen/Qwen3-0.6B | 600M | Q4_K_M | 40k | Lightweight, edge deployment |
| Qwen/Qwen3.5-0.8B | 873M | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen3.5-0.8B-Base | 873M | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen2.5-Coder-1.5B-Instruct | 1.5B | Q4_K_M | 32k | Code generation and completion |
| Qwen/Qwen3-1.7B | 1.7B | Q4_K_M | 40k | Lightweight, edge deployment |
| Qwen/Qwen3.5-2B | 2.3B | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen3.5-2B-Base | 2.3B | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen2.5-VL-3B-Instruct | 3.8B | Q4_K_M | 32k | Multimodal, vision and text |
| Qwen/Qwen3-4B | 4.0B | Q4_K_M | 40k | General purpose text generation |
| Qwen/Qwen3.5-4B | 4.7B | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen3.5-4B-Base | 4.7B | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen2.5-7B-Instruct | 7.6B | Q4_K_M | 32k | Instruction following, chat |
| Qwen/Qwen2.5-Coder-7B-Instruct | 7.6B | Q4_K_M | 32k | Code generation and completion |
| Qwen/Qwen3-8B | 8.2B | Q4_K_M | 40k | General purpose text generation |
| Qwen/Qwen2.5-VL-7B-Instruct | 8.3B | Q4_K_M | 32k | Multimodal, vision and text |
| Qwen/Qwen3.5-9B | 9.7B | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen3.5-9B-Base | 9.7B | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen2.5-14B-Instruct | 14.8B | Q4_K_M | 128k | Instruction following, chat |
| Qwen/Qwen3-14B | 14.8B | Q4_K_M | 128k | General purpose text generation |
| Qwen/Qwen2.5-Coder-14B-Instruct | 14.8B | Q4_K_M | 32k | Code generation and completion |
| Qwen/Qwen3.5-27B | 27.8B | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen3-30B-A3B | 30.5B (MoE) | Q4_K_M | 40k | Efficient MoE, general purpose |
| Qwen/Qwen3.5-35B-A3B | 36.0B (MoE) | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen2.5-32B-Instruct | 32.5B | Q4_K_M | 128k | Instruction following, chat |
| Qwen/Qwen3-32B | 32.8B | Q4_K_M | 40k | General purpose text generation |
| Qwen/Qwen2.5-Coder-32B-Instruct | 32.8B | Q4_K_M | 32k | Code generation and completion |
| Qwen/Qwen2.5-72B-Instruct | 72.7B | Q4_K_M | 32k | Instruction following, chat |
| Qwen/Qwen3.5-122B-A10B | 125.1B (MoE) | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen3-235B-A22B | 235B (MoE) | Q4_K_M | 40k | State-of-the-art, MoE architecture |
| Qwen/Qwen3.5-397B-A17B | 403.4B (MoE) | Q4_K_M | 256k | Multimodal, vision and text |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | 480B (MoE) | Q4_K_M | 256k | Code generation and completion |

Allen Institute

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| allenai/OLMo-2-0325-32B-Instruct | 32B | Q4_K_M | 4k | Fully open-source, instruction following |

Ant Group

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| inclusionAI/Ling-lite | 16.8B (MoE) | Q4_K_M | 128k | Efficient MoE, general purpose |

BAAI

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| BAAI/bge-large-en-v1.5 | 335M | Q4_K_M | 512 | Text embeddings for RAG |

Baidu

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| baidu/ERNIE-4.5-300B-A47B-Paddle | 300B (MoE) | Q4_K_M | 128k | Multilingual, reasoning |

BigCode

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| bigcode/starcoder2-7b | 7.2B | Q4_K_M | 16k | Code generation and completion |
| bigcode/starcoder2-15b | 15.7B | Q4_K_M | 16k | Code generation and completion |

BigScience

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| bigscience/bloom | 176B | Q4_K_M | 2k | Multilingual text generation |

Cohere

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| CohereForAI/c4ai-command-r-v01 | 35B | Q4_K_M | 128k | RAG, tool use, agents |

Community

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 1.1B | Q4_K_M | 2k | Instruction following, chat |

DeepSeek

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 7.6B | Q4_K_M | 128k | Advanced reasoning, chain-of-thought |
| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct | 16B (MoE) | Q4_K_M | 128k | Code generation and completion |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 32.8B | Q4_K_M | 128k | Advanced reasoning, chain-of-thought |
| deepseek-ai/DeepSeek-R1 | 671B (MoE) | Q4_K_M | 128k | Advanced reasoning, chain-of-thought |
| deepseek-ai/DeepSeek-V3 | 685B (MoE) | Q4_K_M | 128k | State-of-the-art, MoE architecture |

Google

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| google/gemma-3-1b-it | 1B | Q4_K_M | 32k | Lightweight, edge deployment |
| google/gemma-2-2b-it | 2.6B | Q4_K_M | 4k | General purpose text generation |
| google/gemma-3-4b-it | 4B | Q4_K_M | 128k | Lightweight, general purpose |
| google/gemma-2-9b-it | 9.2B | Q4_K_M | 4k | General purpose text generation |
| google/gemma-3-12b-it | 12B | Q4_K_M | 128k | Multimodal, vision and text |
| google/gemma-3-27b-it | 27B | Q4_K_M | 128k | General purpose text generation |
| google/gemma-2-27b-it | 27.2B | Q4_K_M | 4k | General purpose text generation |

HuggingFace

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| HuggingFaceH4/zephyr-7b-beta | 7.2B | Q4_K_M | 32k | General purpose text generation |

IBM

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| ibm-granite/granite-4.0-h-micro | 3B | Q4_K_M | 128k | Enterprise, hybrid Mamba/transformer |
| ibm-granite/granite-4.0-h-tiny | 7B (MoE) | Q4_K_M | 128k | Enterprise, hybrid Mamba/transformer |
| ibm-granite/granite-3.1-8b-instruct | 8.1B | Q4_K_M | 128k | Enterprise, instruction following |
| ibm-granite/granite-4.0-h-small | 32B (MoE) | Q4_K_M | 128k | Enterprise, hybrid Mamba/transformer |

LMSYS

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| lmsys/vicuna-7b-v1.5 | 7.0B | Q4_K_M | 4k | Instruction following, chat |
| lmsys/vicuna-13b-v1.5 | 13.0B | Q4_K_M | 4k | Instruction following, chat |

Meituan

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| meituan/LongCat-Flash | 560B (MoE) | Q4_K_M | 512k | Long context MoE |

Meta

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| meta-llama/Llama-3.2-1B | 1.2B | Q4_K_M | 4k | General purpose text generation |
| meta-llama/Llama-3.2-3B | 3.2B | Q4_K_M | 4k | General purpose text generation |
| meta-llama/CodeLlama-7b-Instruct-hf | 6.7B | Q4_K_M | 4k | Code generation and completion |
| meta-llama/Llama-3.1-8B | 8.0B | Q4_K_M | 4k | General purpose text generation |
| meta-llama/Llama-3.1-8B-Instruct | 8.0B | Q4_K_M | 4k | Instruction following, chat |
| meta-llama/Llama-3.2-11B-Vision-Instruct | 10.7B | Q4_K_M | 4k | Instruction following, chat |
| meta-llama/CodeLlama-13b-Instruct-hf | 13.0B | Q4_K_M | 4k | Code generation and completion |
| meta-llama/CodeLlama-34b-Instruct-hf | 33.7B | Q4_K_M | 4k | Code generation and completion |
| meta-llama/Llama-3.1-70B-Instruct | 70.6B | Q4_K_M | 4k | Instruction following, chat |
| meta-llama/Llama-3.3-70B-Instruct | 70.6B | Q4_K_M | 128k | Instruction following, chat |
| meta-llama/Llama-4-Scout-17B-16E-Instruct | 109B (MoE) | Q4_K_M | 10M | Multimodal, vision and text |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct | 400B (MoE) | Q4_K_M | 1M | Multimodal, vision and text |
| meta-llama/Llama-3.1-405B-Instruct | 405.9B | Q4_K_M | 4k | Instruction following, chat |

Microsoft

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| microsoft/phi-3-mini-4k-instruct | 3.8B | Q4_K_M | 4k | Lightweight, edge deployment |
| microsoft/Phi-3.5-mini-instruct | 3.8B | Q4_K_M | 128k | Lightweight, long context |
| microsoft/Phi-4-mini-instruct | 3.8B | Q4_K_M | 128k | Lightweight, edge deployment |
| microsoft/Orca-2-7b | 7.0B | Q4_K_M | 4k | Reasoning, step-by-step solutions |
| microsoft/Orca-2-13b | 13.0B | Q4_K_M | 4k | Reasoning, step-by-step solutions |
| microsoft/phi-4 | 14B | Q4_K_M | 16k | Reasoning, STEM, code generation |
| microsoft/Phi-3-medium-14b-instruct | 14B | Q4_K_M | 4k | Balanced performance and size |

Mistral AI

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| mistralai/Mistral-7B-Instruct-v0.3 | 7.2B | Q4_K_M | 32k | Instruction following, chat |
| mistralai/Ministral-8B-Instruct-2410 | 8.0B | Q4_K_M | 32k | Instruction following, chat |
| mistralai/Mistral-Nemo-Instruct-2407 | 12.2B | Q4_K_M | 128k | Instruction following, chat |
| mistralai/Mistral-Small-24B-Instruct-2501 | 24B | Q4_K_M | 32k | Instruction following, chat |
| mistralai/Mistral-Small-3.1-24B-Instruct-2503 | 24B | Q4_K_M | 128k | Multimodal, vision and text |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 46.7B (MoE) | Q4_K_M | 32k | Instruction following, chat |
| mistralai/Mistral-Large-Instruct-2407 | 123B | Q4_K_M | 128k | Large-scale instruction following |
| mistralai/Mixtral-8x22B-Instruct-v0.1 | 140.6B (MoE) | Q4_K_M | 64k | Large MoE, instruction following |

Moonshot

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| moonshotai/Kimi-K2-Instruct | 1000B (MoE) | Q4_K_M | 128k | Large MoE, reasoning |

Nomic

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| nomic-ai/nomic-embed-text-v1.5 | 137M | F16 | 8k | Text embeddings for RAG |

NousResearch

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | 46.7B (MoE) | Q4_K_M | 32k | General purpose text generation |

OpenChat

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| openchat/openchat-3.5-0106 | 7.0B | Q4_K_M | 8k | Instruction following, chat |

Rednote

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| rednote-hilab/dots.llm1.inst | 142B (MoE) | Q4_K_M | 128k | MoE, general purpose |

Stability AI

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| stabilityai/stablelm-2-1_6b-chat | 1.6B | Q4_K_M | 4k | Instruction following, chat |

TII

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| tiiuae/falcon-7b-instruct | 7.2B | Q4_K_M | 4k | Instruction following, chat |
| tiiuae/Falcon3-7B-Instruct | 7.5B | Q4_K_M | 32k | Instruction following, chat |
| tiiuae/Falcon3-10B-Instruct | 10.3B | Q4_K_M | 32k | Instruction following, chat |
| tiiuae/falcon-40b-instruct | 40.0B | Q4_K_M | 2k | Instruction following, chat |
| tiiuae/falcon-180B-chat | 180B | Q4_K_M | 2k | Large-scale instruction following |

Upstage

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| upstage/SOLAR-10.7B-Instruct-v1.0 | 10.7B | Q4_K_M | 4k | High-performance instruction following |

WizardLM

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| WizardLMTeam/WizardLM-13B-V1.2 | 13.0B | Q4_K_M | 4k | Instruction following, chat |
| WizardLMTeam/WizardCoder-15B-V1.0 | 15.5B | Q4_K_M | 8k | Code generation and completion |

xAI

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| xai-org/grok-1 | 314B (MoE) | Q4_K_M | 8k | Large MoE, general purpose |

Zhipu AI

| Model | Parameters | Quantization | Context | Use Case |
|---|---|---|---|---|
| THUDM/glm-4-9b-chat | 9B | Q4_K_M | 128k | Multilingual, instruction following |