Back to Promptfoo

Transformers.js

site/docs/providers/transformers.md

0.121.95.5 KB
Original Source

Transformers.js

The Transformers.js provider enables fully local inference using Transformers.js v4, running ONNX-optimized models directly in Node.js without external APIs or GPU setup. v4 features a new WebGPU backend, broader model support (8B+ parameter models), and improved performance.

Installation

Transformers.js is an optional dependency (~200MB for ONNX runtime):

bash
npm install @huggingface/transformers

Quick Start

Embeddings

yaml
providers:
  - transformers:feature-extraction:Xenova/all-MiniLM-L6-v2

Popular models: Xenova/all-MiniLM-L6-v2 (384d), onnx-community/all-MiniLM-L6-v2-ONNX (384d), Xenova/bge-small-en-v1.5 (384d), nomic-ai/nomic-embed-text-v1.5 (768d)

Text Generation

yaml
providers:
  - transformers:text-generation:Xenova/gpt2

Popular models: Xenova/gpt2, onnx-community/Qwen3-0.6B-ONNX, onnx-community/Llama-3.2-1B-Instruct-ONNX

:::note Text generation runs on CPU and is best for testing. For production, consider Ollama or cloud APIs. :::

Configuration

Common Options

These options apply to both embedding and text generation providers:

OptionDescriptionDefault
device'auto', 'cpu', 'gpu', 'wasm', 'webgpu', 'cuda', 'dml', 'coreml', 'webnn''auto'
dtypeQuantization: 'fp32', 'fp16', 'q8', 'q4', 'q4f16''auto'
cacheDirOverride model cache directorySystem default
localFilesOnlySkip downloads, use cached models onlyfalse
revisionModel version/branch'main'

Embedding Options

yaml
providers:
  - id: transformers:feature-extraction:Xenova/bge-small-en-v1.5
    config:
      prefix: 'query: ' # Required for BGE, E5 models
      pooling: mean # 'mean', 'cls', 'first_token', 'eos', 'last_token', 'none'
      normalize: true # L2 normalize embeddings
      dtype: q8

Model prefixes: BGE and E5 models require prefix: 'query: ' for queries or prefix: 'passage: ' for documents. MiniLM models need no prefix.

:::tip transformers:embeddings:<model> is an alias for transformers:feature-extraction:<model>. :::

Text Generation Options

yaml
providers:
  - id: transformers:text-generation:onnx-community/Qwen3-0.6B-ONNX
    config:
      maxNewTokens: 256
      temperature: 0.7
      topK: 50
      topP: 0.9
      doSample: true
      repetitionPenalty: 1.1
      noRepeatNgramSize: 3
      numBeams: 1
      returnFullText: false
      dtype: q4

Using for Similarity Assertions

Use local embeddings as a grading provider for similar assertions:

yaml
defaultTest:
  options:
    provider:
      embedding:
        id: transformers:feature-extraction:Xenova/all-MiniLM-L6-v2

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      question: 'What is photosynthesis?'
    assert:
      - type: similar
        value: 'Photosynthesis converts light to chemical energy in plants'
        threshold: 0.8

Or override per-assertion:

yaml
assert:
  - type: similar
    value: 'Expected output'
    threshold: 0.75
    provider: transformers:feature-extraction:Xenova/all-MiniLM-L6-v2

Performance

  • Caching: Pipelines are cached after first load. Initial model download may take time, but subsequent runs are fast.
  • Quantization: Use dtype: q4 or dtype: q8 for faster inference and lower memory. Use dtype: q4f16 for WebGPU-optimized quantization.
  • WebGPU: v4 includes a new WebGPU runtime written in C++ with significantly improved performance. Use device: webgpu on supported systems.
  • Concurrency: For limited RAM, use promptfoo eval -j 1 to run serially.

Troubleshooting

ProblemSolution
Dependency not installedRun npm install @huggingface/transformers
Model not foundVerify model exists at HuggingFace with ONNX weights. Try Xenova or onnx-community models.
Out of memoryUse dtype: q4, run with -j 1, or try smaller models
Slow first runModels download on first use. Pre-download with await pipeline('feature-extraction', 'model-name')

Supported Models

Browse compatible models at huggingface.co/models?library=transformers.js.

Key organizations: onnx-community (optimized ONNX exports, recommended for v4), Xenova (legacy ONNX models, still compatible)