Back to Promptfoo

Node API reference

site/docs/usage/node-api-reference.md

0.121.1022.4 KB
Original Source

Node Module API Reference

This guide documents the Node.js module API for advanced programmatic usage of promptfoo. It covers all publicly exported functions, namespaces, and types.

:::info For standard YAML-based evaluation configuration, see Configuration Guide. This API reference is for users building programmatic evaluation workflows directly in JavaScript/TypeScript. :::

Overview

The promptfoo Node module provides programmatic access to:

  • Core evaluation engine (evaluate())
  • Provider management (load and resolve LLM providers)
  • Assertion execution (run assertions independently)
  • Cache management (control caching behavior)
  • Security guardrails (PII, harm, moderation checks)
  • Adversarial testing (red team evaluation)

Core API

evaluate(testSuite, options?)

The main entry point for running evaluations programmatically.

typescript
async function evaluate(testSuite: EvaluateTestSuite, options?: EvaluateOptions): Promise<Eval>;

Parameters:

  • testSuite: Configuration object containing prompts, providers, and tests
  • options: Optional evaluation settings (caching, output, concurrency, etc.)

Returns: Eval record. Call toEvaluateSummary() when you need the serializable results summary.

Example:

typescript
import { evaluate } from 'promptfoo';

const evalRecord = await evaluate({
  prompts: ['What is 2+2?'],
  providers: ['openai:chat:gpt-5.5', 'anthropic:messages:claude-opus-4-7'],
  tests: [
    {
      vars: { query: 'math question' },
      assert: [
        {
          type: 'contains',
          value: '4',
          metric: 'is_correct',
        },
      ],
    },
  ],
});
const results = await evalRecord.toEvaluateSummary();

console.log(`Passed: ${results.stats.successes}/${results.results.length}`);
console.log(`Shareable URL: ${evalRecord.shareableUrl}`);

Common Options:

typescript
interface EvaluateOptions {
  // Output and format
  outputPath?: string | string[];
  formatOutput?: boolean;

  // Evaluation behavior
  maxConcurrency?: number;
  nunjucksFilters?: Record<string, Function>;

  // Caching
  cache?: boolean;

  // Sharing and persistence
  sharing?: boolean;
  writeLatestResults?: boolean;

  // Progress tracking
  onTestComplete?: (result: EvaluateResult) => void;
}

Provider APIs

loadApiProvider(providerPath, context?)

Load a single provider instance by path or identifier.

typescript
async function loadApiProvider(
  providerPath: string,
  context?: LoadApiProviderContext,
): Promise<ApiProvider>;

Parameters:

  • providerPath: Provider identifier (e.g., 'openai:chat:gpt-5.5', 'anthropic:messages:claude-opus-4-7', or 'file://./custom-provider.js')
  • context: Optional context with environment overrides

Returns: Configured ApiProvider instance ready to call

Example:

typescript
import { loadApiProvider } from 'promptfoo';

const openaiProvider = await loadApiProvider('openai:chat:gpt-5.5', {
  env: { OPENAI_API_KEY: process.env.MY_SECRET_KEY },
});

const response = await openaiProvider.callApi('Hello, world!');

console.log(response.output);

Supported Provider Types:

  • openai:* (GPT models)
  • anthropic:* (Claude models)
  • azure:*
  • bedrock:* (AWS Bedrock)
  • vertex:* (Google Vertex AI)
  • cohere:*
  • replicate:*
  • huggingface:*
  • Custom files: file://./path/to/provider.js
  • Custom functions via inline JavaScript

loadApiProviders(providers, options?)

Load multiple providers in parallel. Accepts the same ProvidersConfig shape that evaluate() uses internally, so you can mix string identifiers with inline config objects or functions.

typescript
async function loadApiProviders(
  providers: ProvidersConfig,
  options?: { env?: Record<string, string> },
): Promise<ApiProvider[]>;

Parameters:

  • providers: Array of provider identifiers or config objects
  • options: Optional environment overrides

Returns: Array of configured provider instances

Example:

typescript
import { loadApiProviders } from 'promptfoo';

const providers = await loadApiProviders([
  'openai:chat:gpt-5.5',
  'anthropic:messages:claude-opus-4-7',
  {
    id: 'custom-provider',
    config: {
      model: 'my-model',
      temperature: 0.7,
    },
  },
]);

for (const provider of providers) {
  const result = await provider.callApi('Test');
  console.log(`${provider.id()}: ${result.output}`);
}

Assertions API

assertions.runAssertion(params)

Execute a single assertion against provider output. Powerful for custom evaluation logic.

typescript
async function runAssertion({
  prompt?: string;
  provider?: ApiProvider;
  assertion: Assertion;
  test: AtomicTestCase;
  vars?: Record<string, VarValue>;
  latencyMs?: number;
  providerResponse: ProviderResponse;
  traceId?: string;
  traceData?: TraceData | null;
}): Promise<GradingResult>

Parameters:

  • assertion: The assertion to run (e.g., { type: 'contains', value: 'expected' })
  • test: The test case context
  • providerResponse: The output from the provider to evaluate
  • vars: Template variables from the test
  • provider: Provider instance (optional, for context)
  • traceData: Distributed trace data for debugging (optional)

Returns:

typescript
interface GradingResult {
  pass: boolean; // Did the assertion pass?
  score?: number; // 0-1 score
  reason?: string; // Explanation
  assertion: Assertion; // The original assertion
  metric?: string; // Metric name
  error?: string; // Error message if failed
}

Example: Custom Assertion Logic

typescript
import { assertions } from 'promptfoo';

const result = await assertions.runAssertion({
  assertion: {
    type: 'javascript',
    value: (output, context) => {
      // Custom grading logic
      const score = output.includes('yes') ? 1.0 : 0.0;
      return {
        pass: score >= 0.8,
        score,
        reason: `Output contains required keyword`,
      };
    },
  },
  test: {
    vars: { question: 'Is the sky blue?' },
    assert: [], // populated with assertion
  },
  providerResponse: {
    output: 'Yes, the sky is blue in most places.',
    tokenUsage: { total: 15 },
  },
});

console.log(`Pass: ${result.pass}, Score: ${result.score}`);

Assertion Value Function Context:

typescript
interface AssertionValueFunctionContext {
  // Test and evaluation metadata
  prompt: string | undefined;
  vars: Record<string, unknown>; // Template variables
  test: AtomicTestCase; // Full test context

  // Provider info
  provider: ApiProvider | undefined;
  providerResponse: ProviderResponse | undefined;

  // Output metadata
  logProbs?: number[]; // Token log probabilities

  // Advanced: Tracing
  trace?: TraceData; // Distributed trace with spans
  config?: Record<string, any>; // Custom assertion config
}

Using Trace Data:

typescript
import { assertions } from 'promptfoo';

const result = await assertions.runAssertion({
  assertion: {
    type: 'javascript',
    value: (output, context) => {
      // Access trace data for latency analysis
      if (context.trace?.spans) {
        const ttft = context.trace.spans.find((s) => s.name === 'time_to_first_token');
        console.log(`Time to first token: ${ttft?.duration}ms`);
      }
      return { pass: true };
    },
  },
  test: { vars: {} },
  providerResponse: { output: 'test' },
  traceId: 'trace-123',
  traceData: {
    spans: [
      {
        name: 'time_to_first_token',
        startTime: Date.now(),
        duration: 250,
      },
    ],
  },
});

assertions.runAssertions(params)

Execute multiple assertions in batch against provider output.

typescript
async function runAssertions({
  assertions: (Assertion | AssertionSet)[];
  prompt?: string;
  test: AtomicTestCase;
  provider?: ApiProvider;
  vars?: Record<string, VarValue>;
  providerResponse: ProviderResponse;
  latencyMs?: number;
  traceData?: TraceData | null;
}): Promise<AssertionsResult>

Returns:

typescript
interface GradingResult {
  pass: boolean;
  score: number; // Aggregate score across all assertions
  reason?: string;
  componentResults?: GradingResult[]; // Per-assertion results
  namedScores?: Record<string, number>;
  tokensUsed?: {
    total: number;
    prompt: number;
    completion: number;
    cached: number;
    numRequests: number;
  };
}

Example:

typescript
import { assertions } from 'promptfoo';

const result = await assertions.runAssertions({
  assertions: [
    { type: 'contains', value: '4' },
    { type: 'regex', value: '^The answer is \\d+$' },
    { type: 'not-regex', value: '(?i)error|failed' },
  ],
  test: { vars: { question: 'What is 2+2?' } },
  providerResponse: { output: 'The answer is 4.' },
});

console.log(`All passed: ${result.pass}`);
console.log(`Average score: ${result.score}`);
result.componentResults?.forEach((r) => console.log(`  ${r.assertion?.type}: ${r.pass}`));

Cache API

Control promptfoo's caching layer for LLM provider calls.

enableCache()

Enable caching for provider calls (default).

typescript
export function enableCache(): void;

Example:

typescript
import { cache, evaluate } from 'promptfoo';

cache.enableCache();

disableCache()

Disable caching. Useful during development/testing to always hit the provider.

typescript
export function disableCache(): void;

Example:

typescript
import { cache, evaluate } from 'promptfoo';

// Disable for fresh results
cache.disableCache();
const evalRecord = await evaluate(testSuite);
cache.enableCache();

isCacheEnabled()

Check if caching is currently enabled.

typescript
export function isCacheEnabled(): boolean;

clearCache()

Clear all cached results.

typescript
export async function clearCache(): Promise<void>;

Example:

typescript
import { cache, evaluate } from 'promptfoo';

// Clear old cache
await cache.clearCache();
await evaluate(testSuite); // Will refetch all provider calls

withCacheNamespace(namespace, fn)

Run evaluation with isolated cache namespace.

typescript
export function withCacheNamespace<T>(
  namespace: string | undefined,
  fn: () => Promise<T>,
): Promise<T>;

Use Cases:

  • Isolate caches for different test runs
  • Prevent cache collisions across environments
  • Version-specific caching

Example:

typescript
import { cache, evaluate } from 'promptfoo';

// Run v1 and v2 evals with separate caches
const v1Results = await cache.withCacheNamespace('v1', async () => {
  return evaluate(testSuiteV1);
});

const v2Results = await cache.withCacheNamespace('v2', async () => {
  return evaluate(testSuiteV2);
});

getCache()

Get the underlying cache instance for advanced operations.

typescript
export function getCache(): Cache;

interface Cache {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown, ttl?: number): Promise<void>;
  del(key: string): Promise<void>;
  clear(): Promise<void>;
}

fetchWithCache(url, options?, timeout?, format?, bust?, maxRetries?)

Fetch with caching enabled.

typescript
export async function fetchWithCache<T>(
  url: string,
  options?: RequestInit,
  timeout?: number,
  format?: 'json' | 'text',
  bust?: boolean,
  maxRetries?: number,
): Promise<FetchWithCacheResult<T>>;

Example:

typescript
import { cache } from 'promptfoo';

const result = await cache.fetchWithCache(
  'https://api.example.com/data',
  { method: 'GET' },
  undefined,
  'json',
);

console.log(result.cached); // true if from cache
console.log(result.data); // the fetched data

Configuration: Set cache location and TTL via environment variables:

bash
# Cache directory (default: ~/.promptfoo/cache)
export PROMPTFOO_CACHE_PATH=/path/to/cache

# Cache TTL in seconds (default: 1209600 = 14 days)
export PROMPTFOO_CACHE_TTL=1209600

# Disable cache via env
export PROMPTFOO_CACHE_ENABLED=false

Guardrails API

Content safety and security layer. Use guardrails to detect PII, harmful content, and other safety concerns.

guardrails.guard(input)

Run general content moderation.

typescript
async function guard(input: string): Promise<GuardResult>;

Returns:

typescript
interface GuardResult {
  model: string;
  results: Array<{
    categories: Record<string, boolean>; // e.g., { hate: false, violence: true }
    category_scores: Record<string, number>; // Scores 0-1
    flagged: boolean; // Any category flagged?
  }>;
}

Example:

typescript
import { guardrails } from 'promptfoo';

const result = await guardrails.guard('This is a test message');

if (result.results[0].flagged) {
  console.log('Content flagged for safety issues:');
  Object.entries(result.results[0].categories).forEach(([name, flagged]) => {
    if (flagged) {
      console.log(`  ${name}: ${result.results[0].category_scores[name]}`);
    }
  });
}

guardrails.pii(input)

Detect personally identifiable information (PII).

typescript
async function pii(input: string): Promise<GuardResult>;

Returns: GuardResult with PII detection details

Example:

typescript
import { guardrails } from 'promptfoo';

const result = await guardrails.pii('John Doe, [email protected], SSN: 123-45-6789');

if (result.results[0].flagged) {
  console.log('PII detected:');
  if (result.results[0].payload?.pii) {
    result.results[0].payload.pii.forEach((item) => {
      console.log(`  ${item.type}: ${item.value}`);
    });
  }
}

guardrails.harm(input)

Detect harmful content (violence, hate speech, etc.).

typescript
async function harm(input: string): Promise<GuardResult>;

guardrails.adaptive(request)

Run adaptive guardrails with custom configuration.

typescript
async function adaptive(request: AdaptiveRequest): Promise<AdaptiveResult>;

Red Team API

Adversarial testing framework for finding vulnerabilities in LLM systems.

redteam.generate(config)

Generate adversarial test cases.

typescript
async function generate(config: RedteamGenerateConfig): Promise<RedteamGenerateResult>;

See Red Teaming Guide for full documentation.


redteam.run(config)

Run full red team evaluation against a system.

typescript
async function run(config: RedteamRunConfig): Promise<RedteamRunResult>;

redteam.Plugins

Registry of available red team attack plugins.

typescript
interface Plugins {
  [pluginName: string]: typeof RedteamPluginBase;
}

// Example plugins
Plugins['prompt-injection']; // Injection attacks
Plugins['jailbreak']; // Jailbreak attempts
Plugins['rbac']; // RBAC bypasses
Plugins['sql-injection']; // SQL injection
// ... and many more

Creating Custom Plugins:

typescript
import { redteam } from 'promptfoo';

class MyCustomPlugin extends redteam.Base.Plugin {
  async run(params: RedteamPluginRunParams): Promise<RedteamPluginResult> {
    // Custom attack logic
    return {
      generated: ['attack1', 'attack2'],
      stats: { duration: 100 },
    };
  }
}

// Register and use
export default MyCustomPlugin;

redteam.Strategies

Registry of red team strategies for composing attacks.

typescript
interface Strategies {
  [strategyName: string]: RedteamStrategy;
}

See Red Teaming Strategies for details.


redteam.Extractors

Extract system information for targeted attacks.

typescript
const {
  extractEntities, // Extract named entities from system
  extractMcpToolsInfo, // Extract MCP tool metadata
  extractSystemPurpose, // Extract system purpose and goal
} = redteam.Extractors;

Example:

typescript
import { redteam } from 'promptfoo';

const purpose = await redteam.Extractors.extractSystemPurpose({
  prompt: 'You are a helpful assistant...',
});

console.log(`System purpose: ${purpose}`);

redteam.Base.Plugin

Base class for custom red team plugins.

typescript
abstract class RedteamPluginBase {
  async run(params: {
    target: Prompt;
    injectVar?: string;
    conversationHistory?: Message[];
    options?: Record<string, unknown>;
  }): Promise<RedteamPluginResult>;
}

See Plugin Development Guide (below) for examples.


redteam.Base.Grader

Base class for custom red team graders.

typescript
abstract class RedteamGraderBase {
  async grade(params: { prompt: string; output: string; rubric: string }): Promise<GradingResult>;
}

Utilities

generateTable(evaluateTable, tableCellMaxLength?, maxRows?)

Format evaluation results as a display table.

typescript
export function generateTable(
  evaluateTable: EvaluateTable,
  tableCellMaxLength?: number,
  maxRows?: number,
): string;

Example:

typescript
import { evaluate, generateTable } from 'promptfoo';

const evalRecord = await evaluate(testSuite);
const table = await evalRecord.getTable();
console.log(generateTable(table));

isTransformFunction(value)

Check if a value is a transform function (for custom prompt/variable transforms).

typescript
export function isTransformFunction(value: unknown): value is TransformFunction;

Example:

typescript
import { isTransformFunction } from 'promptfoo';

const maybeTransform = (x) => x.toUpperCase();
console.log(isTransformFunction(maybeTransform)); // true

Advanced Patterns

Pattern 1: Programmatic Test Suite Generation

Dynamically generate test cases:

typescript
import { evaluate } from 'promptfoo';

const generateTests = (questions) => {
  return questions.map((q) => ({
    vars: { question: q },
    assert: [
      { type: 'contains', value: 'verified-fact' },
      {
        type: 'custom-llm-rubric',
        value: 'Is the answer accurate?',
      },
    ],
  }));
};

const evalRecord = await evaluate({
  prompts: ['Answer: {{ question }}'],
  providers: ['openai:chat:gpt-5.5'],
  tests: generateTests([
    'What is the capital of France?',
    'What is 2+2?',
    'What is photosynthesis?',
  ]),
});

Pattern 2: Custom Grading with Context

Use provider response data in assertions:

typescript
import { evaluate } from 'promptfoo';

const evalRecord = await evaluate({
  prompts: ['Analyze: {{ text }}'],
  providers: ['openai:chat:gpt-5.5'],
  tests: [
    {
      vars: { text: 'Sample text' },
      assert: [
        {
          type: 'javascript',
          value: (output, context) => {
            // Access token usage from provider response
            const tokens = context.providerResponse?.tokenUsage?.total || 0;
            return {
              pass: tokens < 100,
              reason: `Used ${tokens} tokens`,
            };
          },
        },
      ],
    },
  ],
});

Pattern 3: Batch Provider Testing

Load providers and test in parallel:

typescript
import { assertions, loadApiProviders } from 'promptfoo';

const providers = await loadApiProviders([
  'openai:chat:gpt-5.5',
  'anthropic:messages:claude-opus-4-7',
]);

const testCases = ['2+2=?', 'What is AI?'];

for (const provider of providers) {
  console.log(`Testing ${provider.id()}:`);
  for (const test of testCases) {
    const response = await provider.callApi(test);

    const result = await assertions.runAssertion({
      provider,
      assertion: { type: 'contains', value: 'answer' },
      test: { vars: { prompt: test } },
      providerResponse: response,
    });

    console.log(`  ${test}: ${result.pass ? '✓' : '✗'}`);
  }
}

Pattern 4: Cache Isolation for A/B Testing

Compare two model versions with isolated caches:

typescript
import { cache, evaluate } from 'promptfoo';

async function compareModels(oldVersion, newVersion, testSuite) {
  const oldEval = await cache.withCacheNamespace(`model-${oldVersion}`, () =>
    evaluate({
      ...testSuite,
      providers: [`openai:chat:${oldVersion}`],
    }),
  );

  const newEval = await cache.withCacheNamespace(`model-${newVersion}`, () =>
    evaluate({
      ...testSuite,
      providers: [`openai:chat:${newVersion}`],
    }),
  );

  const oldResults = await oldEval.toEvaluateSummary();
  const newResults = await newEval.toEvaluateSummary();

  return {
    oldPass: oldResults.stats.successes,
    newPass: newResults.stats.successes,
    improvement: newResults.stats.successes - oldResults.stats.successes,
  };
}

Type Definitions

All types are exported from promptfoo and available in TypeScript:

typescript
import type {
  // Main types
  EvaluateTestSuite,
  EvaluateOptions,
  EvaluateSummary,
  EvaluateResult,

  // Test configuration
  TestCase,
  AtomicTestCase,
  Assertion,
  AssertionSet,

  // Provider types
  ApiProvider,
  ProviderResponse,
  ProviderConfig,

  // Evaluation types
  GradingResult,
  AssertionValueFunctionContext,

  // Transform types
  TransformFunction,
  TransformContext,

  // Extension hooks
  BeforeAllExtensionHookContext,
  AfterEachExtensionHookContext,
  // ... and more
} from 'promptfoo';

API Stability

Stable APIs (Safe for production)

  • evaluate()
  • loadApiProvider(), loadApiProviders()
  • assertions.runAssertion(), assertions.runAssertions()
  • Cache namespace functions
  • Guardrails namespace

Experimental APIs (May change)

  • redteam.* (plugin/strategy system)
  • Trace data structures

Internal APIs (Don't use directly)

  • Functions not exported from main promptfoo module
  • Test utilities
  • CLI-specific functions

Troubleshooting

Cache issues?

typescript
// Clear cache if stale
import { cache } from 'promptfoo';
await cache.clearCache();

// Or disable for development
cache.disableCache();

Type errors with custom assertions?

typescript
import type { AssertionValueFunctionContext } from 'promptfoo';

// Type-safe context usage
value: (output, context: AssertionValueFunctionContext) => {
  // Full autocomplete and type checking
};

Need to debug evaluation flow?

typescript
// Enable verbose logging
process.env.LOG_LEVEL = 'debug';

// Or check trace data in assertions
const result = await assertions.runAssertion({
  // ...
  traceData: trace, // Access timing information
});

Examples Repository

Find runnable examples in the promptfoo examples directory.


See Also