provider-transformers-local (Fully Local LLM Evaluation)

This example demonstrates a completely local LLM evaluation setup using Transformers.js - no API keys or external services required.

Prerequisites

Install the optional Transformers.js dependency:

bash
npm install @huggingface/transformers

Usage

bash
npx promptfoo@latest init --example provider-transformers-local
cd provider-transformers-local
npx promptfoo@latest eval

What This Example Shows

  • Local text generation with onnx-community/Qwen3-0.6B-ONNX (a Qwen3 model with thinking capabilities)
  • Local embeddings with Xenova/all-MiniLM-L6-v2 for similarity assertions
  • Fully offline evaluation after initial model download
  • No API keys needed

Models Used

| Model | Task | Size | Purpose |
| --- | --- | --- | --- |
| onnx-community/Qwen3-0.6B-ONNX | Text Generation | ~600MB | Generate responses |
| Xenova/all-MiniLM-L6-v2 | Embeddings | ~23MB | Similarity assertions |

First Run

The first evaluation downloads both models (cached for subsequent runs):

text
Downloading Qwen3-0.6B-ONNX... ~600MB
Downloading all-MiniLM-L6-v2... ~23MB

Subsequent runs use cached models and are much faster.

Configuration Highlights

yaml
providers:
  - id: transformers:text-generation:onnx-community/Qwen3-0.6B-ONNX
    config:
      maxNewTokens: 100
      temperature: 0.6
      topP: 0.95
      doSample: true

defaultTest:
  options:
    provider:
      embedding:
        id: transformers:feature-extraction:Xenova/all-MiniLM-L6-v2
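
The embedding provider set under defaultTest is what similar assertions use to score outputs. A minimal test sketch (the variable name, reference text, and threshold below are illustrative, not taken from this example's config):

yaml
tests:
  - vars:
      topic: photosynthesis            # hypothetical variable name
    assert:
      - type: similar
        value: Plants convert sunlight, water, and CO2 into glucose and oxygen.
        threshold: 0.75                # minimum cosine similarity to pass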

Notes

  • Runs entirely on the CPU by default
  • For faster inference, set device: webgpu if your system supports it (see the config sketch below)
  • Set dtype: q4 for a smaller memory footprint with quantized model weights
  • Run with -j 1 to limit concurrency on systems with limited RAM
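
For example, to try WebGPU and quantized weights, the options above can be added to the provider's config block. A minimal sketch, assuming device and dtype are passed through to Transformers.js alongside the generation options shown earlier:

yaml
providers:
  - id: transformers:text-generation:onnx-community/Qwen3-0.6B-ONNX
    config:
      device: webgpu    # assumption: requires WebGPU support on your system; omit to stay on CPU
      dtype: q4         # assumption: 4-bit quantized weights for a smaller memory footprint
      maxNewTokens: 100

On low-RAM systems, combine this with npx promptfoo@latest eval -j 1 so only one model invocation runs at a time.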