Node Module API Reference

This guide documents the Node.js module API for advanced programmatic usage of promptfoo. It covers all publicly exported functions, namespaces, and types.

:::info For standard YAML-based evaluation configuration, see Configuration Guide. This API reference is for users building programmatic evaluation workflows directly in JavaScript/TypeScript. :::

Overview

The promptfoo Node module provides programmatic access to:

Core evaluation engine (evaluate())
Provider management (load and resolve LLM providers)
Assertion execution (run assertions independently)
Cache management (control caching behavior)
Security guardrails (PII, harm, moderation checks)
Adversarial testing (red team evaluation)

Core API

`evaluate(testSuite, options?)`

The main entry point for running evaluations programmatically.

typescript

async function evaluate(testSuite: EvaluateTestSuite, options?: EvaluateOptions): Promise<Eval>;

Parameters:

testSuite: Configuration object containing prompts, providers, and tests
options: Optional evaluation settings (caching, output, concurrency, etc.)

Returns: Eval record. Call toEvaluateSummary() when you need the serializable results summary.

Example:

typescript

import { evaluate } from 'promptfoo';

const evalRecord = await evaluate({
  prompts: ['What is 2+2?'],
  providers: ['openai:chat:gpt-5.5', 'anthropic:messages:claude-opus-4-7'],
  tests: [
    {
      vars: { query: 'math question' },
      assert: [
        {
          type: 'contains',
          value: '4',
          metric: 'is_correct',
        },
      ],
    },
  ],
});
const results = await evalRecord.toEvaluateSummary();

console.log(`Passed: ${results.stats.successes}/${results.results.length}`);
console.log(`Shareable URL: ${evalRecord.shareableUrl}`);

Common Options:

typescript

interface EvaluateOptions {
  // Output and format
  outputPath?: string | string[];
  formatOutput?: boolean;

  // Evaluation behavior
  maxConcurrency?: number;
  nunjucksFilters?: Record<string, Function>;

  // Caching
  cache?: boolean;

  // Sharing and persistence
  sharing?: boolean;
  writeLatestResults?: boolean;

  // Progress tracking
  onTestComplete?: (result: EvaluateResult) => void;
}

Provider APIs

`loadApiProvider(providerPath, context?)`

Load a single provider instance by path or identifier.

typescript

async function loadApiProvider(
  providerPath: string,
  context?: LoadApiProviderContext,
): Promise<ApiProvider>;

Parameters:

providerPath: Provider identifier (e.g., 'openai:chat:gpt-5.5', 'anthropic:messages:claude-opus-4-7', or 'file://./custom-provider.js')
context: Optional context with environment overrides

Returns: Configured ApiProvider instance ready to call

Example:

typescript

import { loadApiProvider } from 'promptfoo';

const openaiProvider = await loadApiProvider('openai:chat:gpt-5.5', {
  env: { OPENAI_API_KEY: process.env.MY_SECRET_KEY },
});

const response = await openaiProvider.callApi('Hello, world!');

console.log(response.output);

Supported Provider Types:

openai:* (GPT models)
anthropic:* (Claude models)
azure:*
bedrock:* (AWS Bedrock)
vertex:* (Google Vertex AI)
cohere:*
replicate:*
huggingface:*
Custom files: file://./path/to/provider.js
Custom functions via inline JavaScript

`loadApiProviders(providers, options?)`

Load multiple providers in parallel. Accepts the same ProvidersConfig shape that evaluate() uses internally, so you can mix string identifiers with inline config objects or functions.

typescript

async function loadApiProviders(
  providers: ProvidersConfig,
  options?: { env?: Record<string, string> },
): Promise<ApiProvider[]>;

Parameters:

providers: Array of provider identifiers or config objects
options: Optional environment overrides

Returns: Array of configured provider instances

Example:

typescript

import { loadApiProviders } from 'promptfoo';

const providers = await loadApiProviders([
  'openai:chat:gpt-5.5',
  'anthropic:messages:claude-opus-4-7',
  {
    id: 'custom-provider',
    config: {
      model: 'my-model',
      temperature: 0.7,
    },
  },
]);

for (const provider of providers) {
  const result = await provider.callApi('Test');
  console.log(`${provider.id()}: ${result.output}`);
}

Assertions API

`assertions.runAssertion(params)`

Execute a single assertion against provider output. Powerful for custom evaluation logic.

typescript

async function runAssertion({
  prompt?: string;
  provider?: ApiProvider;
  assertion: Assertion;
  test: AtomicTestCase;
  vars?: Record<string, VarValue>;
  latencyMs?: number;
  providerResponse: ProviderResponse;
  traceId?: string;
  traceData?: TraceData | null;
}): Promise<GradingResult>

Parameters:

assertion: The assertion to run (e.g., { type: 'contains', value: 'expected' })
test: The test case context
providerResponse: The output from the provider to evaluate
vars: Template variables from the test
provider: Provider instance (optional, for context)
traceData: Distributed trace data for debugging (optional)

Returns:

typescript

interface GradingResult {
  pass: boolean; // Did the assertion pass?
  score?: number; // 0-1 score
  reason?: string; // Explanation
  assertion: Assertion; // The original assertion
  metric?: string; // Metric name
  error?: string; // Error message if failed
}

Example: Custom Assertion Logic

typescript

import { assertions } from 'promptfoo';

const result = await assertions.runAssertion({
  assertion: {
    type: 'javascript',
    value: (output, context) => {
      // Custom grading logic
      const score = output.includes('yes') ? 1.0 : 0.0;
      return {
        pass: score >= 0.8,
        score,
        reason: `Output contains required keyword`,
      };
    },
  },
  test: {
    vars: { question: 'Is the sky blue?' },
    assert: [], // populated with assertion
  },
  providerResponse: {
    output: 'Yes, the sky is blue in most places.',
    tokenUsage: { total: 15 },
  },
});

console.log(`Pass: ${result.pass}, Score: ${result.score}`);

Assertion Value Function Context:

typescript

interface AssertionValueFunctionContext {
  // Test and evaluation metadata
  prompt: string | undefined;
  vars: Record<string, unknown>; // Template variables
  test: AtomicTestCase; // Full test context

  // Provider info
  provider: ApiProvider | undefined;
  providerResponse: ProviderResponse | undefined;

  // Output metadata
  logProbs?: number[]; // Token log probabilities

  // Advanced: Tracing
  trace?: TraceData; // Distributed trace with spans
  config?: Record<string, any>; // Custom assertion config
}

Using Trace Data:

typescript

import { assertions } from 'promptfoo';

const result = await assertions.runAssertion({
  assertion: {
    type: 'javascript',
    value: (output, context) => {
      // Access trace data for latency analysis
      if (context.trace?.spans) {
        const ttft = context.trace.spans.find((s) => s.name === 'time_to_first_token');
        console.log(`Time to first token: ${ttft?.duration}ms`);
      }
      return { pass: true };
    },
  },
  test: { vars: {} },
  providerResponse: { output: 'test' },
  traceId: 'trace-123',
  traceData: {
    spans: [
      {
        name: 'time_to_first_token',
        startTime: Date.now(),
        duration: 250,
      },
    ],
  },
});

`assertions.runAssertions(params)`

Execute multiple assertions in batch against provider output.

typescript

async function runAssertions({
  assertions: (Assertion | AssertionSet)[];
  prompt?: string;
  test: AtomicTestCase;
  provider?: ApiProvider;
  vars?: Record<string, VarValue>;
  providerResponse: ProviderResponse;
  latencyMs?: number;
  traceData?: TraceData | null;
}): Promise<AssertionsResult>

Returns:

typescript

interface GradingResult {
  pass: boolean;
  score: number; // Aggregate score across all assertions
  reason?: string;
  componentResults?: GradingResult[]; // Per-assertion results
  namedScores?: Record<string, number>;
  tokensUsed?: {
    total: number;
    prompt: number;
    completion: number;
    cached: number;
    numRequests: number;
  };
}

Example:

typescript

import { assertions } from 'promptfoo';

const result = await assertions.runAssertions({
  assertions: [
    { type: 'contains', value: '4' },
    { type: 'regex', value: '^The answer is \\d+$' },
    { type: 'not-regex', value: '(?i)error|failed' },
  ],
  test: { vars: { question: 'What is 2+2?' } },
  providerResponse: { output: 'The answer is 4.' },
});

console.log(`All passed: ${result.pass}`);
console.log(`Average score: ${result.score}`);
result.componentResults?.forEach((r) => console.log(`  ${r.assertion?.type}: ${r.pass}`));

Cache API

Control promptfoo's caching layer for LLM provider calls.

`enableCache()`

Enable caching for provider calls (default).

typescript

export function enableCache(): void;

Example:

typescript

import { cache, evaluate } from 'promptfoo';

cache.enableCache();

`disableCache()`

Disable caching. Useful during development/testing to always hit the provider.

typescript

export function disableCache(): void;

Example:

typescript

import { cache, evaluate } from 'promptfoo';

// Disable for fresh results
cache.disableCache();
const evalRecord = await evaluate(testSuite);
cache.enableCache();

`isCacheEnabled()`

Check if caching is currently enabled.

typescript

export function isCacheEnabled(): boolean;

`clearCache()`

Clear all cached results.

typescript

export async function clearCache(): Promise<void>;

Example:

typescript

import { cache, evaluate } from 'promptfoo';

// Clear old cache
await cache.clearCache();
await evaluate(testSuite); // Will refetch all provider calls

`withCacheNamespace(namespace, fn)`

Run evaluation with isolated cache namespace.

typescript

export function withCacheNamespace<T>(
  namespace: string | undefined,
  fn: () => Promise<T>,
): Promise<T>;

Use Cases:

Isolate caches for different test runs
Prevent cache collisions across environments
Version-specific caching

Example:

typescript

import { cache, evaluate } from 'promptfoo';

// Run v1 and v2 evals with separate caches
const v1Results = await cache.withCacheNamespace('v1', async () => {
  return evaluate(testSuiteV1);
});

const v2Results = await cache.withCacheNamespace('v2', async () => {
  return evaluate(testSuiteV2);
});

`getCache()`

Get the underlying cache instance for advanced operations.

typescript

export function getCache(): Cache;

interface Cache {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown, ttl?: number): Promise<void>;
  del(key: string): Promise<void>;
  clear(): Promise<void>;
}

`fetchWithCache(url, options?, timeout?, format?, bust?, maxRetries?)`

Fetch with caching enabled.

typescript

export async function fetchWithCache<T>(
  url: string,
  options?: RequestInit,
  timeout?: number,
  format?: 'json' | 'text',
  bust?: boolean,
  maxRetries?: number,
): Promise<FetchWithCacheResult<T>>;

Example:

typescript

import { cache } from 'promptfoo';

const result = await cache.fetchWithCache(
  'https://api.example.com/data',
  { method: 'GET' },
  undefined,
  'json',
);

console.log(result.cached); // true if from cache
console.log(result.data); // the fetched data

Configuration: Set cache location and TTL via environment variables:

bash

# Cache directory (default: ~/.promptfoo/cache)
export PROMPTFOO_CACHE_PATH=/path/to/cache

# Cache TTL in seconds (default: 1209600 = 14 days)
export PROMPTFOO_CACHE_TTL=1209600

# Disable cache via env
export PROMPTFOO_CACHE_ENABLED=false

Guardrails API

Content safety and security layer. Use guardrails to detect PII, harmful content, and other safety concerns.

`guardrails.guard(input)`

Run general content moderation.

typescript

async function guard(input: string): Promise<GuardResult>;

Returns:

typescript

interface GuardResult {
  model: string;
  results: Array<{
    categories: Record<string, boolean>; // e.g., { hate: false, violence: true }
    category_scores: Record<string, number>; // Scores 0-1
    flagged: boolean; // Any category flagged?
  }>;
}

Example:

typescript

import { guardrails } from 'promptfoo';

const result = await guardrails.guard('This is a test message');

if (result.results[0].flagged) {
  console.log('Content flagged for safety issues:');
  Object.entries(result.results[0].categories).forEach(([name, flagged]) => {
    if (flagged) {
      console.log(`  ${name}: ${result.results[0].category_scores[name]}`);
    }
  });
}

`guardrails.pii(input)`

Detect personally identifiable information (PII).

typescript

async function pii(input: string): Promise<GuardResult>;

Returns: GuardResult with PII detection details

Example:

typescript

import { guardrails } from 'promptfoo';

const result = await guardrails.pii('John Doe, [email protected], SSN: 123-45-6789');

if (result.results[0].flagged) {
  console.log('PII detected:');
  if (result.results[0].payload?.pii) {
    result.results[0].payload.pii.forEach((item) => {
      console.log(`  ${item.type}: ${item.value}`);
    });
  }
}

`guardrails.harm(input)`

Detect harmful content (violence, hate speech, etc.).

typescript

async function harm(input: string): Promise<GuardResult>;

`guardrails.adaptive(request)`

Run adaptive guardrails with custom configuration.

typescript

async function adaptive(request: AdaptiveRequest): Promise<AdaptiveResult>;

Red Team API

Adversarial testing framework for finding vulnerabilities in LLM systems.

`redteam.generate(config)`

Generate adversarial test cases.

typescript

async function generate(config: RedteamGenerateConfig): Promise<RedteamGenerateResult>;

See Red Teaming Guide for full documentation.

`redteam.run(config)`

Run full red team evaluation against a system.

typescript

async function run(config: RedteamRunConfig): Promise<RedteamRunResult>;

`redteam.Plugins`

Registry of available red team attack plugins.

typescript

interface Plugins {
  [pluginName: string]: typeof RedteamPluginBase;
}

// Example plugins
Plugins['prompt-injection']; // Injection attacks
Plugins['jailbreak']; // Jailbreak attempts
Plugins['rbac']; // RBAC bypasses
Plugins['sql-injection']; // SQL injection
// ... and many more

Creating Custom Plugins:

typescript

import { redteam } from 'promptfoo';

class MyCustomPlugin extends redteam.Base.Plugin {
  async run(params: RedteamPluginRunParams): Promise<RedteamPluginResult> {
    // Custom attack logic
    return {
      generated: ['attack1', 'attack2'],
      stats: { duration: 100 },
    };
  }
}

// Register and use
export default MyCustomPlugin;

`redteam.Strategies`

Registry of red team strategies for composing attacks.

typescript

interface Strategies {
  [strategyName: string]: RedteamStrategy;
}

See Red Teaming Strategies for details.

`redteam.Extractors`

Extract system information for targeted attacks.

typescript

const {
  extractEntities, // Extract named entities from system
  extractMcpToolsInfo, // Extract MCP tool metadata
  extractSystemPurpose, // Extract system purpose and goal
} = redteam.Extractors;

Example:

typescript

import { redteam } from 'promptfoo';

const purpose = await redteam.Extractors.extractSystemPurpose({
  prompt: 'You are a helpful assistant...',
});

console.log(`System purpose: ${purpose}`);

`redteam.Base.Plugin`

Base class for custom red team plugins.

typescript

abstract class RedteamPluginBase {
  async run(params: {
    target: Prompt;
    injectVar?: string;
    conversationHistory?: Message[];
    options?: Record<string, unknown>;
  }): Promise<RedteamPluginResult>;
}

See Plugin Development Guide (below) for examples.

`redteam.Base.Grader`

Base class for custom red team graders.

typescript

abstract class RedteamGraderBase {
  async grade(params: { prompt: string; output: string; rubric: string }): Promise<GradingResult>;
}

Utilities

`generateTable(evaluateTable, tableCellMaxLength?, maxRows?)`

Format evaluation results as a display table.

typescript

export function generateTable(
  evaluateTable: EvaluateTable,
  tableCellMaxLength?: number,
  maxRows?: number,
): string;

Example:

typescript

import { evaluate, generateTable } from 'promptfoo';

const evalRecord = await evaluate(testSuite);
const table = await evalRecord.getTable();
console.log(generateTable(table));

`isTransformFunction(value)`

Check if a value is a transform function (for custom prompt/variable transforms).

typescript

export function isTransformFunction(value: unknown): value is TransformFunction;

Example:

typescript

import { isTransformFunction } from 'promptfoo';

const maybeTransform = (x) => x.toUpperCase();
console.log(isTransformFunction(maybeTransform)); // true

Advanced Patterns

Pattern 1: Programmatic Test Suite Generation

Dynamically generate test cases:

typescript

import { evaluate } from 'promptfoo';

const generateTests = (questions) => {
  return questions.map((q) => ({
    vars: { question: q },
    assert: [
      { type: 'contains', value: 'verified-fact' },
      {
        type: 'custom-llm-rubric',
        value: 'Is the answer accurate?',
      },
    ],
  }));
};

const evalRecord = await evaluate({
  prompts: ['Answer: {{ question }}'],
  providers: ['openai:chat:gpt-5.5'],
  tests: generateTests([
    'What is the capital of France?',
    'What is 2+2?',
    'What is photosynthesis?',
  ]),
});

Pattern 2: Custom Grading with Context

Use provider response data in assertions:

typescript

import { evaluate } from 'promptfoo';

const evalRecord = await evaluate({
  prompts: ['Analyze: {{ text }}'],
  providers: ['openai:chat:gpt-5.5'],
  tests: [
    {
      vars: { text: 'Sample text' },
      assert: [
        {
          type: 'javascript',
          value: (output, context) => {
            // Access token usage from provider response
            const tokens = context.providerResponse?.tokenUsage?.total || 0;
            return {
              pass: tokens < 100,
              reason: `Used ${tokens} tokens`,
            };
          },
        },
      ],
    },
  ],
});

Pattern 3: Batch Provider Testing

Load providers and test in parallel:

typescript

import { assertions, loadApiProviders } from 'promptfoo';

const providers = await loadApiProviders([
  'openai:chat:gpt-5.5',
  'anthropic:messages:claude-opus-4-7',
]);

const testCases = ['2+2=?', 'What is AI?'];

for (const provider of providers) {
  console.log(`Testing ${provider.id()}:`);
  for (const test of testCases) {
    const response = await provider.callApi(test);

    const result = await assertions.runAssertion({
      provider,
      assertion: { type: 'contains', value: 'answer' },
      test: { vars: { prompt: test } },
      providerResponse: response,
    });

    console.log(`  ${test}: ${result.pass ? '✓' : '✗'}`);
  }
}

Pattern 4: Cache Isolation for A/B Testing

Compare two model versions with isolated caches:

typescript

import { cache, evaluate } from 'promptfoo';

async function compareModels(oldVersion, newVersion, testSuite) {
  const oldEval = await cache.withCacheNamespace(`model-${oldVersion}`, () =>
    evaluate({
      ...testSuite,
      providers: [`openai:chat:${oldVersion}`],
    }),
  );

  const newEval = await cache.withCacheNamespace(`model-${newVersion}`, () =>
    evaluate({
      ...testSuite,
      providers: [`openai:chat:${newVersion}`],
    }),
  );

  const oldResults = await oldEval.toEvaluateSummary();
  const newResults = await newEval.toEvaluateSummary();

  return {
    oldPass: oldResults.stats.successes,
    newPass: newResults.stats.successes,
    improvement: newResults.stats.successes - oldResults.stats.successes,
  };
}

Type Definitions

All types are exported from promptfoo and available in TypeScript:

typescript

import type {
  // Main types
  EvaluateTestSuite,
  EvaluateOptions,
  EvaluateSummary,
  EvaluateResult,

  // Test configuration
  TestCase,
  AtomicTestCase,
  Assertion,
  AssertionSet,

  // Provider types
  ApiProvider,
  ProviderResponse,
  ProviderConfig,

  // Evaluation types
  GradingResult,
  AssertionValueFunctionContext,

  // Transform types
  TransformFunction,
  TransformContext,

  // Extension hooks
  BeforeAllExtensionHookContext,
  AfterEachExtensionHookContext,
  // ... and more
} from 'promptfoo';

API Stability

Stable APIs (Safe for production)

evaluate()
loadApiProvider(), loadApiProviders()
assertions.runAssertion(), assertions.runAssertions()
Cache namespace functions
Guardrails namespace

Experimental APIs (May change)

redteam.* (plugin/strategy system)
Trace data structures

Internal APIs (Don't use directly)

Functions not exported from main promptfoo module
Test utilities
CLI-specific functions

Troubleshooting

Cache issues?

typescript

// Clear cache if stale
import { cache } from 'promptfoo';
await cache.clearCache();

// Or disable for development
cache.disableCache();

Type errors with custom assertions?

typescript

import type { AssertionValueFunctionContext } from 'promptfoo';

// Type-safe context usage
value: (output, context: AssertionValueFunctionContext) => {
  // Full autocomplete and type checking
};

Need to debug evaluation flow?

typescript

// Enable verbose logging
process.env.LOG_LEVEL = 'debug';

// Or check trace data in assertions
const result = await assertions.runAssertion({
  // ...
  traceData: trace, // Access timing information
});

Examples Repository

Find runnable examples in the promptfoo examples directory.

Node API reference

Node Module API Reference

Overview

Core API

evaluate(testSuite, options?)

Provider APIs

loadApiProvider(providerPath, context?)

loadApiProviders(providers, options?)

Assertions API

assertions.runAssertion(params)

assertions.runAssertions(params)

Cache API

enableCache()

disableCache()

isCacheEnabled()

clearCache()

withCacheNamespace(namespace, fn)

getCache()

fetchWithCache(url, options?, timeout?, format?, bust?, maxRetries?)

Guardrails API

guardrails.guard(input)

guardrails.pii(input)

guardrails.harm(input)

guardrails.adaptive(request)

Red Team API

redteam.generate(config)

redteam.run(config)

redteam.Plugins

redteam.Strategies

redteam.Extractors

redteam.Base.Plugin

redteam.Base.Grader

Utilities

generateTable(evaluateTable, tableCellMaxLength?, maxRows?)

isTransformFunction(value)

Advanced Patterns

Pattern 1: Programmatic Test Suite Generation

Pattern 2: Custom Grading with Context

Pattern 3: Batch Provider Testing

Pattern 4: Cache Isolation for A/B Testing

Type Definitions

API Stability

Stable APIs (Safe for production)

Experimental APIs (May change)

Internal APIs (Don't use directly)

Troubleshooting

Examples Repository

See Also

`evaluate(testSuite, options?)`

`loadApiProvider(providerPath, context?)`

`loadApiProviders(providers, options?)`

`assertions.runAssertion(params)`

`assertions.runAssertions(params)`

`enableCache()`

`disableCache()`

`isCacheEnabled()`

`clearCache()`

`withCacheNamespace(namespace, fn)`

`getCache()`

`fetchWithCache(url, options?, timeout?, format?, bust?, maxRetries?)`

`guardrails.guard(input)`

`guardrails.pii(input)`

`guardrails.harm(input)`

`guardrails.adaptive(request)`

`redteam.generate(config)`

`redteam.run(config)`

`redteam.Plugins`

`redteam.Strategies`

`redteam.Extractors`

`redteam.Base.Plugin`

`redteam.Base.Grader`

`generateTable(evaluateTable, tableCellMaxLength?, maxRows?)`

`isTransformFunction(value)`