Transformers.js

The Transformers.js provider runs fully local inference with Transformers.js v4, executing ONNX-optimized models directly in Node.js with no external APIs or GPU setup required. Version 4 adds a new WebGPU backend, broader model support (including models with 8B+ parameters), and improved performance.

Installation

Transformers.js is an optional dependency (~200 MB for the ONNX runtime):

bash

npm install @huggingface/transformers

Quick Start

Embeddings

yaml

providers:
  - transformers:feature-extraction:Xenova/all-MiniLM-L6-v2

Popular models: Xenova/all-MiniLM-L6-v2 (384d), onnx-community/all-MiniLM-L6-v2-ONNX (384d), Xenova/bge-small-en-v1.5 (384d), nomic-ai/nomic-embed-text-v1.5 (768d)

Text Generation

yaml

providers:
  - transformers:text-generation:Xenova/gpt2

Popular models: Xenova/gpt2, onnx-community/Qwen3-0.6B-ONNX, onnx-community/Llama-3.2-1B-Instruct-ONNX

:::note
Text generation runs on CPU and is best for testing. For production, consider Ollama or cloud APIs.
:::

Configuration

Common Options

These options apply to both embedding and text generation providers:

| Option | Description | Default |
| --- | --- | --- |
| `device` | Execution device: `'auto'`, `'cpu'`, `'gpu'`, `'wasm'`, `'webgpu'`, `'cuda'`, `'dml'`, `'coreml'`, `'webnn'` | `'auto'` |
| `dtype` | Quantization: `'fp32'`, `'fp16'`, `'q8'`, `'q4'`, `'q4f16'` | `'auto'` |
| `cacheDir` | Override the model cache directory | System default |
| `localFilesOnly` | Skip downloads and use cached models only | `false` |
| `revision` | Model version or branch | `'main'` |
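
To make the options concrete, here is a minimal sketch of how these config keys could be translated into the options object passed to the Transformers.js `pipeline()` function. The helper name `toPipelineOptions` is hypothetical and this is not necessarily how the provider does it internally; the snake_case names (`cache_dir`, `local_files_only`) are the option names Transformers.js itself uses, and the fallback values mirror the Default column above.

```javascript
// Hypothetical helper (an assumption for illustration): maps the provider's
// camelCase config keys onto the snake_case options used by pipeline().
function toPipelineOptions(config = {}) {
  const options = {
    device: config.device ?? 'auto',
    dtype: config.dtype ?? 'auto',
    local_files_only: config.localFilesOnly ?? false,
    revision: config.revision ?? 'main',
  };
  // Only set cache_dir when it is overridden, so the library's own
  // default cache location applies otherwise.
  if (config.cacheDir) {
    options.cache_dir = config.cacheDir;
  }
  return options;
}

const exampleOptions = toPipelineOptions({ dtype: 'q8', cacheDir: './models' });
console.log(exampleOptions);
```

Leaving `cache_dir` unset unless the user overrides it keeps the "System default" behavior in the library's hands.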

Embedding Options

yaml

providers:
  - id: transformers:feature-extraction:Xenova/bge-small-en-v1.5
    config:
      prefix: 'query: ' # Required for BGE, E5 models
      pooling: mean # 'mean', 'cls', 'first_token', 'eos', 'last_token', 'none'
      normalize: true # L2 normalize embeddings
      dtype: q8

Model prefixes: BGE and E5 models expect `prefix: 'query: '` for queries or `prefix: 'passage: '` for documents. Confirm the exact prefix on the model card, since it can vary by model family. MiniLM models need no prefix.
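
As a rough illustration of what `prefix`, `pooling: mean`, and `normalize: true` do, the sketch below applies them to toy data in plain JavaScript. The function names are hypothetical and the token vectors are made up; in practice the library performs these steps on the model's real output.

```javascript
// Prefix handling: prepend the configured prefix before the text is tokenized.
function withPrefix(text, prefix = '') {
  return `${prefix}${text}`;
}

// Mean pooling: average the token vectors, skipping padded positions
// (attention mask value 0).
function meanPool(tokenEmbeddings, attentionMask) {
  const dim = tokenEmbeddings[0].length;
  const sum = new Array(dim).fill(0);
  let count = 0;
  tokenEmbeddings.forEach((vec, i) => {
    if (attentionMask[i] === 0) return; // ignore padding tokens
    vec.forEach((v, d) => {
      sum[d] += v;
    });
    count += 1;
  });
  return sum.map((v) => v / Math.max(count, 1));
}

// L2 normalization: scale the vector to unit length.
function l2Normalize(vec) {
  const norm = Math.sqrt(vec.reduce((acc, v) => acc + v * v, 0));
  return norm === 0 ? vec.slice() : vec.map((v) => v / norm);
}

const queryText = withPrefix('What is photosynthesis?', 'query: ');
const tokens = [[3, 0], [0, 4], [9, 9]]; // toy vectors; the last row is padding
const pooled = meanPool(tokens, [1, 1, 0]);
const unit = l2Normalize(pooled);
console.log(queryText, pooled, unit);
```

With normalized embeddings, cosine similarity reduces to a dot product, which is one reason `normalize: true` is a common choice for similarity comparisons.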

:::tip
`transformers:embeddings:<model>` is an alias for `transformers:feature-extraction:<model>`.
:::
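
The provider ID format and the alias can be sketched as a small parser. This is an illustrative assumption about how such IDs might be resolved, not the provider's actual code; `parseProviderId`, `TASK_ALIASES`, and `SUPPORTED_TASKS` are hypothetical names.

```javascript
// Hypothetical lookup tables for illustration only.
const TASK_ALIASES = { embeddings: 'feature-extraction' };
const SUPPORTED_TASKS = new Set(['feature-extraction', 'text-generation']);

// Splits "transformers:<task>:<model>" and resolves the alias.
function parseProviderId(id) {
  const [scheme, rawTask, ...modelParts] = id.split(':');
  const model = modelParts.join(':'); // tolerate ':' inside the model part
  const task = TASK_ALIASES[rawTask] ?? rawTask;
  if (scheme !== 'transformers' || !SUPPORTED_TASKS.has(task) || !model) {
    throw new Error(`Unrecognized Transformers.js provider ID: ${id}`);
  }
  return { task, model };
}

console.log(parseProviderId('transformers:embeddings:Xenova/all-MiniLM-L6-v2'));
```

Both `transformers:embeddings:...` and `transformers:feature-extraction:...` resolve to the same task here, which is the behavior the alias describes.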

Text Generation Options

yaml

providers:
  - id: transformers:text-generation:onnx-community/Qwen3-0.6B-ONNX
    config:
      maxNewTokens: 256
      temperature: 0.7
      topK: 50
      topP: 0.9
      doSample: true
      repetitionPenalty: 1.1
      noRepeatNgramSize: 3
      numBeams: 1
      returnFullText: false
      dtype: q4
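
For intuition about `temperature`, `topK`, `topP`, and `doSample`, here is a conceptual sketch of how these settings are commonly applied to a model's next-token logits. It is a simplified illustration under stated assumptions, not the library's implementation; `filterLogits` and `chooseToken` are hypothetical names, and the logits are toy values.

```javascript
// Rescale probabilities so they sum to 1.
function renormalize(candidates) {
  const total = candidates.reduce((acc, c) => acc + c.prob, 0);
  return candidates.map((c) => ({ token: c.token, prob: c.prob / total }));
}

function filterLogits(logits, { temperature = 1.0, topK = 0, topP = 1.0 } = {}) {
  // Temperature: divide logits before softmax; <1 sharpens, >1 flattens.
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max)); // stable softmax
  let candidates = renormalize(
    exps.map((e, token) => ({ token, prob: e })),
  ).sort((x, y) => y.prob - x.prob);

  // Top-k: keep only the k most likely tokens.
  if (topK > 0) candidates = renormalize(candidates.slice(0, topK));

  // Top-p: keep the smallest prefix whose cumulative probability reaches topP.
  let cumulative = 0;
  const kept = [];
  for (const c of candidates) {
    kept.push(c);
    cumulative += c.prob;
    if (cumulative >= topP) break;
  }
  return renormalize(kept);
}

// doSample: false picks the most likely token; true samples from the kept set.
function chooseToken(candidates, { doSample = true, random = Math.random } = {}) {
  if (!doSample) return candidates[0].token;
  let r = random();
  for (const c of candidates) {
    r -= c.prob;
    if (r <= 0) return c.token;
  }
  return candidates[candidates.length - 1].token; // guard against rounding
}

const kept = filterLogits([2, 1, 0, -1], { temperature: 0.7, topK: 50, topP: 0.9 });
console.log(kept, chooseToken(kept, { doSample: false }));
```

Lower `temperature`, smaller `topK`, and smaller `topP` all narrow the set of tokens the model can pick, which tends to make output more deterministic.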

Using for Similarity Assertions

Use local embeddings as the grading provider for `similar` assertions:

yaml

defaultTest:
  options:
    provider:
      embedding:
        id: transformers:feature-extraction:Xenova/all-MiniLM-L6-v2

providers:
  - openai:gpt-4o-mini

tests:
  - vars:
      question: 'What is photosynthesis?'
    assert:
      - type: similar
        value: 'Photosynthesis converts light to chemical energy in plants'
        threshold: 0.8

Or override per-assertion:

yaml

assert:
  - type: similar
    value: 'Expected output'
    threshold: 0.75
    provider: transformers:feature-extraction:Xenova/all-MiniLM-L6-v2
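
To show what the `threshold` is compared against, the following sketch computes cosine similarity between two embedding vectors and checks it against a threshold. The helper names and the toy vectors are assumptions for illustration; real embeddings from `Xenova/all-MiniLM-L6-v2` have 384 dimensions.

```javascript
// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Embedding dimensions must match');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical check: passes when similarity meets or exceeds the threshold.
function passesSimilar(outputEmbedding, expectedEmbedding, threshold) {
  return cosineSimilarity(outputEmbedding, expectedEmbedding) >= threshold;
}

const score = cosineSimilarity([1, 0], [1, 1]); // vectors 45 degrees apart
console.log(score.toFixed(4), passesSimilar([1, 0], [1, 1], 0.75));
```

Because scores depend on the embedding model, a threshold tuned for one model may need adjusting when switching to another.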

Performance

- Caching: Pipelines are cached after the first load. The initial model download may take time, but subsequent runs reuse the downloaded files and start faster.
- Quantization: Use `dtype: q4` or `dtype: q8` for faster inference and lower memory use. Use `dtype: q4f16` for WebGPU-optimized quantization.
- WebGPU: v4 includes a new WebGPU runtime written in C++ with significantly improved performance. Use `device: webgpu` on supported systems.
- Concurrency: If RAM is limited, use `promptfoo eval -j 1` to run tests serially.
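
The caching behavior described above can be pictured as a memoized loader keyed by task, model, and load options. This is a minimal sketch under that assumption, not the provider's actual cache; `getCachedPipeline` and the injected `loader` argument are hypothetical, with `loader` standing in for a call to the Transformers.js `pipeline()` function.

```javascript
// In-memory cache of pending or loaded pipelines (hypothetical structure).
const pipelineCache = new Map();

function getCachedPipeline(task, model, options, loader) {
  // Different dtype or device values produce different cache entries.
  const key = JSON.stringify([task, model, options.dtype ?? null, options.device ?? null]);
  if (!pipelineCache.has(key)) {
    // Cache the promise itself so concurrent callers share one load.
    const pending = Promise.resolve(loader(task, model, options));
    // Drop failed loads so a later call can retry.
    pending.catch(() => pipelineCache.delete(key));
    pipelineCache.set(key, pending);
  }
  return pipelineCache.get(key);
}
```

Caching the promise rather than the resolved pipeline means that parallel test cases requesting the same model trigger only one load.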

Troubleshooting

| Problem | Solution |
| --- | --- |
| Dependency not installed | Run `npm install @huggingface/transformers` |
| Model not found | Verify that the model exists on Hugging Face and includes ONNX weights. Try `Xenova` or `onnx-community` models. |
| Out of memory | Use `dtype: q4`, run with `-j 1`, or try a smaller model |
| Slow first run | Models download on first use. Pre-download with `await pipeline('feature-extraction', 'model-name')` |
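
The pre-download tip could be scripted roughly as follows. This is a sketch, not an official tool: `warmModels` and its `loader` parameter are hypothetical names, and the default loader assumes `@huggingface/transformers` is installed and calls its `pipeline()` function once per model so the files land in the cache.

```javascript
// Default loader: imports the optional dependency lazily, so this file
// can be loaded even when @huggingface/transformers is not installed.
async function defaultLoader(task, model, options) {
  const { pipeline } = await import('@huggingface/transformers');
  return pipeline(task, model, options);
}

// Loads each model once, in sequence, and reports how long each took.
async function warmModels(specs, loader = defaultLoader) {
  const results = [];
  for (const { task, model, dtype } of specs) {
    const started = Date.now();
    // Pass dtype only when given, so the library default applies otherwise.
    await loader(task, model, dtype ? { dtype } : {});
    results.push({ task, model, ms: Date.now() - started });
  }
  return results;
}
```

A possible invocation before the first evaluation is `await warmModels([{ task: 'feature-extraction', model: 'Xenova/all-MiniLM-L6-v2', dtype: 'q8' }])`. Loading sequentially keeps peak memory lower than loading all models in parallel.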

Supported Models

Browse compatible models at huggingface.co/models?library=transformers.js.

Key organizations: onnx-community (optimized ONNX exports, recommended for v4), Xenova (legacy ONNX models, still compatible)