# provider-transformers-local
This example demonstrates a completely local LLM evaluation setup using Transformers.js - no API keys or external services required.
Install the optional Transformers.js dependency:

```sh
npm install @huggingface/transformers
```

Initialize and run the example:

```sh
npx promptfoo@latest init --example provider-transformers-local
cd provider-transformers-local
npx promptfoo@latest eval
```
The example uses two models:

| Model | Task | Size | Purpose |
| --- | --- | --- | --- |
| `onnx-community/Qwen3-0.6B-ONNX` | Text generation | ~600MB | Generate responses (latest Qwen3 model with thinking capabilities) |
| `Xenova/all-MiniLM-L6-v2` | Embeddings | ~23MB | Similarity assertions |
The first evaluation downloads both models (cached for subsequent runs):

```
Downloading Qwen3-0.6B-ONNX... ~600MB
Downloading all-MiniLM-L6-v2... ~23MB
```

Subsequent runs use the cached models and are much faster.
The configuration wires up both models:

```yaml
providers:
  - id: transformers:text-generation:onnx-community/Qwen3-0.6B-ONNX
    config:
      maxNewTokens: 100
      temperature: 0.6
      topP: 0.95
      doSample: true

defaultTest:
  options:
    provider:
      embedding:
        id: transformers:feature-extraction:Xenova/all-MiniLM-L6-v2
```
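With the embedding provider set as the default for similarity grading, tests can use `similar` assertions, which score outputs by embedding cosine similarity. A minimal sketch (the topic, reference value, and threshold here are illustrative, not part of the example's test set):

```yaml
tests:
  - vars:
      topic: photosynthesis
    assert:
      # Passes when the model's answer is semantically close to the reference
      - type: similar
        value: Plants convert sunlight into chemical energy.
        threshold: 0.75
```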
Performance tips:

- `device: webgpu` if your system supports it
- `dtype: q4` for a smaller memory footprint with quantized models
- `-j 1` for systems with limited RAM
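The `device` and `dtype` options go in the provider's `config` block. A sketch, assuming your hardware supports WebGPU (option names follow the tips above):

```yaml
providers:
  - id: transformers:text-generation:onnx-community/Qwen3-0.6B-ONNX
    config:
      device: webgpu # GPU acceleration where available
      dtype: q4 # 4-bit quantized weights, smaller memory footprint
```

The `-j` flag is passed on the command line instead, e.g. `npx promptfoo@latest eval -j 1` to run one test at a time on memory-constrained systems.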