Smoke Tests Plan

docs/plans/smoke-tests.md


Comprehensive smoke test plan for the promptfoo CLI and library. This document serves as the checklist and specification for smoke tests that verify the built package works correctly across critical user flows.

Test location: test/smoke/

Run command: npm run test:smoke

Philosophy

  1. Test the built package (dist/), not source code
  2. No external API dependencies - use echo provider, local scripts, or mock servers
  3. Fast execution - target < 3 minutes total
  4. Cover critical paths - CLI commands, config loading, provider initialization

Running Smoke Tests

```bash
# Run all smoke tests
npm run test:smoke

# Run individual test suites
npm run test:smoke:cli      # CLI binary tests
npm run test:smoke:eval     # Eval pipeline tests
```

Test Location

All smoke tests live in test/smoke/:

```text
test/smoke/
├── cli.test.ts                    # CLI command tests
├── eval.test.ts                   # Eval pipeline tests
├── providers.test.ts              # Provider loading tests
├── configs.test.ts                # Config format tests
├── data-loading.test.ts           # Data source tests
├── filters-flags.test.ts          # Filter flags and CLI options
├── advanced-features.test.ts      # Advanced features (env, delay, HTML)
├── output-and-assertions.test.ts  # Assertion types and output formats
├── fixtures/
│   ├── configs/                   # Config format examples
│   ├── providers/                 # Echo-based providers
│   ├── data/                      # Test data files
│   └── assertions/                # Assertion scripts
└── scripts/
    └── run-all.sh                 # Shell-based smoke tests
```

Test Checklist

1. CLI Binary Tests

1.1 Basic CLI Operations

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.1.1 | Version output | `promptfoo --version` | Binary executes and prints version |
| 1.1.2 | Help output | `promptfoo --help` | Commander parsing |
| 1.1.3 | Subcommand help | `promptfoo eval --help` | Subcommand routing |
| 1.1.4 | Unknown command | `promptfoo unknownxyz` | Error handling |
| 1.1.5 | Missing config file | `promptfoo eval -c nonexistent.yaml` | File not found error |

1.2 Init Command

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.2.1 | Init non-interactive | `promptfoo init --no-interactive` | Project scaffolding |
| 1.2.2 | Init with example | `promptfoo init --example simple-cli` | Example download |

1.3 Validate Command

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.3.1 | Valid config | `promptfoo validate -c valid.yaml` | Validation passes |
| 1.3.2 | Invalid config | `promptfoo validate -c invalid.yaml` | Validation errors |
| 1.3.3 | Schema errors | `promptfoo validate -c malformed.yaml` | Schema validation |

1.4 Eval Command (Core)

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.4.1 | Basic eval | `promptfoo eval -c echo-config.yaml --no-cache` | Core eval pipeline |
| 1.4.2 | JSON output | `promptfoo eval -c config.yaml -o out.json` | JSON export |
| 1.4.3 | YAML output | `promptfoo eval -c config.yaml -o out.yaml` | YAML export |
| 1.4.4 | CSV output | `promptfoo eval -c config.yaml -o out.csv` | CSV export |
| 1.4.5 | Max concurrency | `promptfoo eval -c config.yaml --max-concurrency 1` | Concurrency control |
| 1.4.6 | Repeat | `promptfoo eval -c config.yaml --repeat 2` | Repeat runs |
| 1.4.7 | Verbose | `promptfoo eval -c config.yaml --verbose` | Verbose logging |
| 1.4.8 | Env file | `promptfoo eval -c config.yaml --env-file .env` | Env loading |

1.5 List/Show/Export Commands

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.5.1 | List evals | `promptfoo list evals` | Database reads |
| 1.5.2 | List datasets | `promptfoo list datasets` | Dataset listing |
| 1.5.3 | Show eval | `promptfoo show <eval-id>` | Eval retrieval |
| 1.5.4 | Export eval | `promptfoo export <eval-id> -o out.json` | Export functionality |

1.6 Cache Commands

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.6.1 | Cache clear | `promptfoo cache clear` | Cache management |

1.7 Exit Codes

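| # | Test | Scenario | Expected Code |
|---|------|----------|---------------|
| 1.7.1 | All pass | All assertions pass | 0 |
| 1.7.2 | Assertion fail | Assertion fails | 100 |
| 1.7.3 | Config error | Invalid config | 1 |
| 1.7.4 | Provider error | Provider fails | 1 |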
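
The exit-code checks above can be driven by a small helper. This is a sketch under stated assumptions: `runCli`, `classifyExit`, and the `dist/src/main.js` entry path are hypothetical names rather than promptfoo APIs, and the helper spawns the built CLI with the current Node binary and no shell.

```typescript
import { spawnSync } from 'node:child_process';
import * as path from 'node:path';

// Hypothetical path to the built CLI entry point; point this at the real file in dist/.
const DEFAULT_CLI = path.resolve('dist', 'src', 'main.js');

export interface CliResult {
  status: number | null; // null when the process was killed (e.g. by the timeout)
  stdout: string;
  stderr: string;
}

// Runs the built CLI with the current Node binary. No shell is involved, so
// arguments reach the CLI verbatim on every OS.
export function runCli(args: string[], cliPath: string = DEFAULT_CLI): CliResult {
  const result = spawnSync(process.execPath, [cliPath, ...args], {
    encoding: 'utf8',
    timeout: 120_000,
  });
  return { status: result.status, stdout: result.stdout, stderr: result.stderr };
}

export type ExitOutcome = 'pass' | 'assertion-failure' | 'error';

// Maps an exit code to the outcomes in the 1.7 table.
export function classifyExit(status: number | null): ExitOutcome {
  if (status === 0) return 'pass';
  if (status === 100) return 'assertion-failure';
  // 1 for config/provider errors; a killed process (null) also counts as an error.
  return 'error';
}
```

In a test, `classifyExit(runCli(['eval', '-c', configPath, '--no-cache']).status)` would be expected to return `'pass'` for 1.7.1 and `'assertion-failure'` for 1.7.2. Passing arguments as an array rather than a shell string also sidesteps the shell differences flagged for Windows in the OS matrix.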

1.8 Filter Flags

Test count and content filters that control which tests run.

1.8.1 Count-Based Filters

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.8.1.1 | First N tests | `--filter-first-n 2` (with 5 tests) | Takes first 2 tests only |
| 1.8.1.2 | First N > total | `--filter-first-n 100` (with 5 tests) | Returns all 5 tests |
| 1.8.1.3 | First N zero | `--filter-first-n 0` | Returns 0 tests (no eval runs) |
| 1.8.1.4 | Sample N tests | `--filter-sample 2` (with 5 tests) | Returns exactly 2 tests (random) |
| 1.8.1.5 | Sample > total | `--filter-sample 100` (with 5 tests) | Returns all 5 tests |

1.8.2 Pattern Filters

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.8.2.1 | Pattern match | `--filter-pattern "user.*test"` | Only tests with matching description |
| 1.8.2.2 | Pattern case | `--filter-pattern "(?i)TEST"` | Case-insensitive regex works |
| 1.8.2.3 | Pattern no match | `--filter-pattern "nonexistent123"` | Zero tests run |
| 1.8.2.4 | Pattern special | `--filter-pattern "test\\.special"` | Regex escaping works |

1.8.3 Metadata Filters

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.8.3.1 | Metadata match | `--filter-metadata category=auth` | Only tests with matching metadata |
| 1.8.3.2 | Metadata partial | `--filter-metadata category=au` | Partial value match works |
| 1.8.3.3 | Metadata array | `--filter-metadata tags=security` | Array metadata value match |
| 1.8.3.4 | Metadata no match | `--filter-metadata category=nonexistent` | Zero tests run |
| 1.8.3.5 | Metadata invalid | `--filter-metadata invalid` | Error: must be key=value format |

1.8.4 Provider Filters

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.8.4.1 | Provider by ID | `--filter-providers "echo"` | Only echo provider runs |
| 1.8.4.2 | Provider by label | `--filter-providers "Custom.*"` | Provider label regex match |
| 1.8.4.3 | Provider multi | `--filter-providers "echo\|custom"` | Multiple provider match |
| 1.8.4.4 | Provider no match | `--filter-providers "nonexistent"` | No providers match, error/empty |
| 1.8.4.5 | Filter targets | `--filter-targets "echo"` | Alias for `--filter-providers` |

1.8.5 History-Based Filters

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.8.5.1 | Filter failing file | `--filter-failing output.json` | Re-runs only failed tests |
| 1.8.5.2 | Filter failing ID | `--filter-failing eval-abc123` | Re-runs failures by eval ID |
| 1.8.5.3 | Filter errors file | `--filter-errors-only output.json` | Re-runs only error tests |
| 1.8.5.4 | Filter errors ID | `--filter-errors-only eval-abc123` | Re-runs errors by eval ID |

1.8.6 Combined Filters

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.8.6.1 | Pattern + First N | `--filter-pattern "auth" --filter-first-n 2` | Filters apply in sequence |
| 1.8.6.2 | Metadata + Sample | `--filter-metadata cat=x --filter-sample 1` | Metadata then sample |
| 1.8.6.3 | Provider + Pattern | `--filter-providers echo --filter-pattern x` | Provider and test filtering |
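
To compute expected counts for the filter rows above, it can help to model the intended semantics in memory. The sketch below assumes regex matching on `description`, substring matching on metadata values (including each element of an array value, per 1.8.3.2 and 1.8.3.3), and content filters applied before count filters, as rows 1.8.6.1 and 1.8.6.2 describe. `applyFilters` and `parseKeyValue` are hypothetical helpers, not promptfoo's implementation.

```typescript
// Hypothetical in-memory model of the 1.8 filter semantics, used only to
// compute expected results. This is not promptfoo's implementation.
export interface TestCase {
  description?: string;
  metadata?: Record<string, string | string[]>;
}

export interface FilterOptions {
  pattern?: string; // --filter-pattern
  metadata?: string; // --filter-metadata key=value
  firstN?: number; // --filter-first-n
  sample?: number; // --filter-sample
}

// Splits "key=value" at the first "="; rejects input without a key or "=".
export function parseKeyValue(raw: string): [string, string] {
  const idx = raw.indexOf('=');
  if (idx <= 0) {
    throw new Error(`Invalid filter "${raw}": must be key=value format`);
  }
  return [raw.slice(0, idx), raw.slice(idx + 1)];
}

export function applyFilters(
  tests: TestCase[],
  opts: FilterOptions,
  random: () => number = Math.random,
): TestCase[] {
  let result = tests;

  // Content filters first.
  if (opts.pattern !== undefined) {
    const re = new RegExp(opts.pattern);
    result = result.filter((t) => re.test(t.description ?? ''));
  }
  if (opts.metadata !== undefined) {
    const [key, value] = parseKeyValue(opts.metadata);
    result = result.filter((t) => {
      const actual = t.metadata?.[key];
      if (actual === undefined) return false;
      // Substring match against a scalar value or any element of an array value.
      const values = Array.isArray(actual) ? actual : [actual];
      return values.some((v) => v.includes(value));
    });
  }

  // Count filters last; slicing clamps N to the number of remaining tests.
  if (opts.firstN !== undefined) {
    result = result.slice(0, Math.max(0, opts.firstN));
  }
  if (opts.sample !== undefined) {
    const pool = [...result];
    // Fisher-Yates shuffle, then keep the first `sample` entries.
    for (let i = pool.length - 1; i > 0; i--) {
      const j = Math.floor(random() * (i + 1));
      [pool[i], pool[j]] = [pool[j], pool[i]];
    }
    result = pool.slice(0, Math.max(0, opts.sample));
  }
  return result;
}
```

For example, `applyFilters(fixtureTests, { pattern: 'auth', firstN: 2 }).length` gives the count a 1.8.6.1 test would check in the JSON output. A plain `new RegExp('(?i)TEST')` throws a SyntaxError in JavaScript, so this model cannot reproduce row 1.8.2.2; that row depends on how the CLI itself treats the pattern.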

1.9 Variable and Prompt Flags

Flags that modify variables or prompt content.

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.9.1 | Var single | `--var name=Alice` | Variable substitution |
| 1.9.2 | Var multiple | `--var name=Alice --var age=30` | Multiple vars |
| 1.9.3 | Var override | `--var name=Override` (config has `name=Bob`) | CLI overrides config |
| 1.9.4 | Var invalid | `--var invalid` | Error: must be key=value |
| 1.9.5 | Prompt prefix | `--prompt-prefix "System: "` | Prefix prepended to prompts |
| 1.9.6 | Prompt suffix | `--prompt-suffix "\nEnd."` | Suffix appended to prompts |
| 1.9.7 | Prefix + suffix | `--prompt-prefix "A" --prompt-suffix "Z"` | Both applied |
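
The variable and prompt flags above translate into a few expected values. A minimal sketch, assuming repeated `--var` flags accumulate with later ones winning and that only the first `=` separates key from value; `parseVarFlags` and `withAffixes` are hypothetical names, not promptfoo functions.

```typescript
// Hypothetical helpers mirroring the expectations in 1.9; not promptfoo APIs.

// Collects repeated `--var key=value` flags into a record. Later flags win, and
// only the first "=" splits, so values may themselves contain "=".
export function parseVarFlags(flags: string[]): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const flag of flags) {
    const idx = flag.indexOf('=');
    if (idx <= 0) {
      throw new Error(`Invalid --var "${flag}": must be key=value`);
    }
    vars[flag.slice(0, idx)] = flag.slice(idx + 1);
  }
  return vars;
}

// Expected rendered prompt when --prompt-prefix / --prompt-suffix are set.
export function withAffixes(prompt: string, prefix = '', suffix = ''): string {
  return `${prefix}${prompt}${suffix}`;
}
```

When row 1.9.6 is run through a POSIX shell, `"\nEnd."` inside double quotes reaches the CLI as a literal backslash followed by `n`, not a newline, so the expected suffix depends on how the test invokes the CLI.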

1.10 Execution Control Flags

Flags that control how tests execute.

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.10.1 | Delay | `--delay 100` (with timing check) | Delay between tests |
| 1.10.2 | Delay zero | `--delay 0` | No delay (default) |
| 1.10.3 | No cache | `--no-cache` | Cache disabled (already tested) |
| 1.10.4 | No progress bar | `--no-progress-bar` | Progress bar hidden |
| 1.10.5 | No table | `--no-table` | Table output suppressed |
| 1.10.6 | Table max length | `--table-cell-max-length 50` | Cell truncation |
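
For the timing check in 1.10.1, a lower-bound assertion is less flaky than an exact one. A minimal sketch, assuming tests run one at a time (for example with `--max-concurrency 1`) so that delays do not overlap; `timed` and `minExpectedDelayMs` are hypothetical helpers.

```typescript
import { performance } from 'node:perf_hooks';

// Hypothetical helpers for the 1.10.1 timing check.

// Runs an async action and returns its result plus elapsed wall-clock milliseconds.
export async function timed<T>(action: () => Promise<T>): Promise<{ value: T; ms: number }> {
  const start = performance.now();
  const value = await action();
  return { value, ms: performance.now() - start };
}

// Lower bound on total waiting for N sequential tests with --delay D. Counting
// only the N - 1 gaps keeps the bound valid whether or not a delay also
// follows the last test.
export function minExpectedDelayMs(testCount: number, delayMs: number): number {
  return Math.max(0, testCount - 1) * delayMs;
}
```

A 1.10.1 test could wrap the CLI call in `timed` and assert `ms >= minExpectedDelayMs(testCount, 100)` with no upper bound, since a slow CI runner can only add time.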

1.11 Output and Metadata Flags

Flags that affect output and eval metadata.

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.11.1 | Description | `--description "My test run"` | Description in output |
| 1.11.2 | Multiple outputs | `-o out.json -o out.csv` | Multiple output files |
| 1.11.3 | No write | `--no-write` | Results not persisted to DB |
| 1.11.4 | Share disabled | `--no-share` | Sharing disabled |

1.12 Resume and Retry Flags

Flags for resuming or retrying evaluations.

| # | Test | Command | Verifies |
|---|------|---------|----------|
| 1.12.1 | Resume latest | `--resume` | Resumes latest incomplete eval |
| 1.12.2 | Resume by ID | `--resume eval-abc123` | Resumes specific eval |
| 1.12.3 | Retry errors | `--retry-errors` | Retries errors from latest eval |

2. Config Format Tests

2.1 YAML Configs

| # | Test | Config | Verifies |
|---|------|--------|----------|
| 2.1.1 | Basic YAML | `config.yaml` | YAML parsing |
| 2.1.2 | YAML with anchors | `config-anchors.yaml` | YAML anchor/alias |
| 2.1.3 | YML extension | `config.yml` | `.yml` extension |

2.2 JSON Configs

| # | Test | Config | Verifies |
|---|------|--------|----------|
| 2.2.1 | JSON config | `config.json` | JSON parsing |

2.3 JavaScript Configs

| # | Test | Config | Export Style | Verifies |
|---|------|--------|--------------|----------|
| 2.3.1 | CJS class export | `config.js` | `module.exports = Class` | CJS class |
| 2.3.2 | CJS object export | `config.js` | `module.exports = { ... }` | CJS object |
| 2.3.3 | CJS named export | `config.js` | `module.exports.foo = ...` | CJS named |
| 2.3.4 | CJS explicit ext | `config.cjs` | `module.exports = ...` | `.cjs` extension |
| 2.3.5 | ESM default | `config.mjs` | `export default { ... }` | ESM default |
| 2.3.6 | ESM class | `config.mjs` | `export default class` | ESM class |
| 2.3.7 | ESM named | `config.mjs` | `export const config` | ESM named |

2.4 TypeScript Configs

| # | Test | Config | Export Style | Verifies |
|---|------|--------|--------------|----------|
| 2.4.1 | TS default | `config.ts` | `export default { ... }` | TS transpile |
| 2.4.2 | TS class | `config.ts` | `export default class` | TS class |
| 2.4.3 | TS named | `config.ts` | `export const config` | TS named |
| 2.4.4 | TS with types | `config.ts` | `implements ApiProvider` | Type imports |
| 2.4.5 | MTS extension | `config.mts` | `export default ...` | `.mts` extension |
| 2.4.6 | CTS extension | `config.cts` | `module.exports = ...` | `.cts` extension |

3. Provider Tests

3.1 Built-in Providers (No API Key)

| # | Test | Provider | Verifies |
|---|------|----------|----------|
| 3.1.1 | Echo provider | `echo` | Built-in echo |
| 3.1.2 | Exec provider | `exec:echo "hello"` | Shell command |

3.2 JavaScript Providers

| # | Test | Provider Config | Export Style | Verifies |
|---|------|-----------------|--------------|----------|
| 3.2.1 | CJS class | `file://provider.js` | `module.exports = Class` | CJS class |
| 3.2.2 | CJS function | `file://provider.js:callApi` | `module.exports.callApi = fn` | CJS named fn |
| 3.2.3 | CJS explicit | `file://provider.cjs` | `module.exports = ...` | `.cjs` extension |
| 3.2.4 | ESM default | `file://provider.mjs` | `export default class` | ESM class |
| 3.2.5 | ESM function | `file://provider.mjs:callApi` | `export function callApi` | ESM named fn |

3.3 TypeScript Providers

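| # | Test | Provider Config | Export Style | Verifies |
|---|------|-----------------|--------------|----------|
| 3.3.1 | TS default class | `file://provider.ts` | `export default class` | TS class |
| 3.3.2 | TS named function | `file://provider.ts:callApi` | `export function callApi` | TS named fn |
| 3.3.3 | TS with interface | `file://provider.ts` | `implements ApiProvider` | TS interface |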

3.4 Python Providers

| # | Test | Provider Config | Function | Verifies |
|---|------|-----------------|----------|----------|
| 3.4.1 | Default fn | `file://provider.py` | `call_api()` | Python default |
| 3.4.2 | Named fn | `file://provider.py:custom_fn` | `custom_fn()` | Python named |
| 3.4.3 | Async fn | `file://provider.py:async_fn` | `async def async_fn()` | Python async |
| 3.4.4 | With context | `file://provider.py` | Uses `context` param | Context passing |
| 3.4.5 | With options | `file://provider.py` | Uses `options` param | Options passing |
| 3.4.6 | Token usage | `file://provider.py` | Returns `tokenUsage` | Token tracking |
| 3.4.7 | Error return | `file://provider.py` | Returns `error` | Error handling |

3.5 Ruby Providers

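| # | Test | Provider Config | Function | Verifies |
|---|------|-----------------|----------|----------|
| 3.5.1 | Default fn | `file://provider.rb` | `call_api()` | Ruby default |
| 3.5.2 | Named fn | `file://provider.rb:custom_fn` | `custom_fn()` | Ruby named |
| 3.5.3 | Hash return | `file://provider.rb` | Hash output | Ruby hash |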

3.6 Go Providers

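| # | Test | Provider Config | Function | Verifies |
|---|------|-----------------|----------|----------|
| 3.6.1 | Default fn | `file://main.go` | `CallApi` | Go default |
| 3.6.2 | With go.mod | `file://main.go` | Multi-package | Go modules |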

3.7 HTTP Providers

| # | Test | Provider Config | Verifies |
|---|------|-----------------|----------|
| 3.7.1 | Basic HTTP | `id: http://...` | HTTP provider |
| 3.7.2 | HTTPS | `id: https://...` | HTTPS provider |
| 3.7.3 | Body template | `body: { prompt: "{{prompt}}" }` | Body templating |
| 3.7.4 | Transform response | `transformResponse: json.output` | Response transform |
| 3.7.5 | Custom headers | `headers: { X-Custom: value }` | Header passing |
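
The philosophy rules out external APIs, so the HTTP rows above need a local endpoint. A minimal sketch using only `node:http`, assuming the provider posts a JSON body with a `prompt` field (matching 3.7.3) and reads `json.output` (matching 3.7.4). `startMockServer`, the optional bearer check for 3.8.1, and the response shape are hypothetical choices, not a promptfoo fixture.

```typescript
import * as http from 'node:http';
import type { AddressInfo } from 'node:net';

export interface MockServer {
  url: string;
  close: () => Promise<void>;
}

// Hypothetical local endpoint for the HTTP provider tests (3.7) and bearer auth (3.8.1).
// Expects a JSON body with a `prompt` field and replies with {"output": "Echo: <prompt>"}.
export function startMockServer(requiredToken?: string): Promise<MockServer> {
  const server = http.createServer((req, res) => {
    const send = (status: number, payload: unknown) => {
      // Connection: close keeps server.close() from waiting on keep-alive sockets.
      res.writeHead(status, { 'Content-Type': 'application/json', Connection: 'close' });
      res.end(JSON.stringify(payload));
    };

    if (requiredToken !== undefined && req.headers.authorization !== `Bearer ${requiredToken}`) {
      send(401, { error: 'unauthorized' });
      return;
    }

    let body = '';
    req.setEncoding('utf8');
    req.on('data', (chunk: string) => {
      body += chunk;
    });
    req.on('end', () => {
      try {
        const parsed = JSON.parse(body || '{}');
        send(200, { output: `Echo: ${String(parsed?.prompt ?? '')}` });
      } catch {
        send(400, { error: 'invalid JSON body' });
      }
    });
  });

  return new Promise((resolve, reject) => {
    server.once('error', reject);
    // Port 0 asks the OS for a free port, so parallel test files never collide.
    server.listen(0, '127.0.0.1', () => {
      const { port } = server.address() as AddressInfo;
      resolve({
        url: `http://127.0.0.1:${port}`,
        close: () =>
          new Promise<void>((done, fail) => server.close((err) => (err ? fail(err) : done()))),
      });
    });
  });
}
```

A config would then point the provider `id` at the returned `url`. Letting the OS choose the port keeps each test file on its own endpoint, in line with the Test Isolation notes.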

3.8 HTTP Auth Configurations

| # | Test | Auth Config | Verifies |
|---|------|-------------|----------|
| 3.8.1 | Bearer token | `auth: { type: bearer, token: ... }` | Bearer auth |
| 3.8.2 | API key header | `auth: { type: api_key, placement: header }` | API key header |
| 3.8.3 | API key query | `auth: { type: api_key, placement: query }` | API key query |
| 3.8.4 | Basic auth | `auth: { type: basic, username, password }` | Basic auth |
| 3.8.5 | OAuth client creds | `auth: { type: oauth, grantType: client_credentials }` | OAuth CC |
| 3.8.6 | OAuth password | `auth: { type: oauth, grantType: password }` | OAuth password |
| 3.8.7 | Signature PEM | `signatureAuth: { type: pem, privateKeyPath }` | PEM signature |
| 3.8.8 | Signature JKS | `signatureAuth: { type: jks, keystorePath }` | JKS signature |
| 3.8.9 | Signature PFX | `signatureAuth: { type: pfx, pfxPath }` | PFX signature |
| 3.8.10 | mTLS cert | `tls: { certPath, keyPath }` | Mutual TLS |

4. Data Loading Tests

4.1 Vars Loading

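| # | Test | Config | Source | Verifies |
|---|------|--------|--------|----------|
| 4.1.1 | Inline vars | `vars: { key: value }` | YAML inline | Direct vars |
| 4.1.2 | JSON file | `vars: file://data.json` | JSON file | JSON vars |
| 4.1.3 | YAML file | `vars: file://data.yaml` | YAML file | YAML vars |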

4.2 Tests Loading

| # | Test | Config | Source | Verifies |
|---|------|--------|--------|----------|
| 4.2.1 | Inline tests | `tests: [{ vars: ... }]` | YAML inline | Direct tests |
| 4.2.2 | CSV file | `tests: file://tests.csv` | CSV file | CSV parsing |
| 4.2.3 | JSON file | `tests: file://tests.json` | JSON file | JSON tests |
| 4.2.4 | JSONL file | `tests: file://tests.jsonl` | JSONL file | JSONL parsing |
| 4.2.5 | YAML file | `tests: file://tests.yaml` | YAML file | YAML tests |
| 4.2.6 | XLSX file | `tests: file://tests.xlsx` | Excel file | Excel parsing |
| 4.2.7 | JS generator | `tests: file://tests.js` | JS function | JS test gen |
| 4.2.8 | TS generator | `tests: file://tests.ts` | TS function | TS test gen |
| 4.2.9 | Python generator | `tests: file://tests.py` | Python fn | Python test gen |
| 4.2.10 | Glob pattern | `tests: tests/*.yaml` | Glob | Glob expansion |
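
For the generator rows (4.2.7, 4.2.8), a fixture only needs to return an array of test cases shaped like the inline YAML ones. A minimal TypeScript sketch, assuming the loader calls the named export referenced as `tests: file://tests.ts:generateTests` (the `:functionName` form listed under Key Findings for JavaScript test generators; applying it to `.ts` is an assumption) and that the prompt renders `{{name}}`. The `TestCase` and `Assertion` types are local stand-ins, not imports from promptfoo.

```typescript
// Hypothetical fixture for 4.2.8. Local stand-in types mirror the YAML fixture
// shape (vars + assert); they are not imported from promptfoo.
interface Assertion {
  type: string;
  value?: string;
}

interface TestCase {
  description?: string;
  vars: Record<string, string>;
  assert?: Assertion[];
}

// One test per name. With the echo provider and a prompt such as
// 'Hello {{name}}', each output contains the name, so `contains` should pass.
export function generateTests(): TestCase[] {
  const names = ['Alice', 'Bob', 'Carol'];
  return names.map((name) => ({
    description: `greets ${name}`,
    vars: { name },
    assert: [{ type: 'contains', value: name }],
  }));
}
```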

4.3 Prompts Loading

| # | Test | Config | Source | Verifies |
|---|------|--------|--------|----------|
| 4.3.1 | Inline | `prompts: ["Hello {{name}}"]` | YAML inline | Direct prompt |
| 4.3.2 | File ref | `prompts: [file://prompt.txt]` | Text file | File loading |
| 4.3.3 | Glob pattern | `prompts: prompts/*.txt` | Glob | Glob prompts |
| 4.3.4 | Exec prompt | `prompts: [{ raw: "exec:..." }]` | Shell | Executable |
| 4.3.5 | JSON chat | `prompts: [file://chat.json]` | JSON messages | Chat format |

5. Assertion Tests

5.1 Built-in Assertions

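| # | Test | Assertion Type | Verifies |
|---|------|----------------|----------|
| 5.1.1 | Contains | `type: contains` | String contains |
| 5.1.2 | Not contains | `type: not-contains` | String not contains |
| 5.1.3 | Equals | `type: equals` | Exact match |
| 5.1.4 | Starts with | `type: starts-with` | Prefix match |
| 5.1.5 | Regex | `type: regex` | Regex match |
| 5.1.6 | Is JSON | `type: is-json` | JSON validation |
| 5.1.7 | Contains JSON | `type: contains-json` | JSON present in output (schema check when `value` is set) |
| 5.1.8 | JSON schema | `type: is-valid-json-schema` | JSON schema |
| 5.1.9 | Cost | `type: cost` | Cost threshold |
| 5.1.10 | Latency | `type: latency` | Latency threshold |
| 5.1.11 | Perplexity | `type: perplexity` | Perplexity check |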

5.2 Script Assertions

| # | Test | Assertion Config | Verifies |
|---|------|------------------|----------|
| 5.2.1 | Inline JS | `type: javascript, value: "output.includes()"` | Inline JS |
| 5.2.2 | JS file | `type: javascript, value: file://assert.js` | JS file |
| 5.2.3 | JS named fn | `type: javascript, value: file://assert.js:fn` | JS named |
| 5.2.4 | Inline Python | `type: python, value: "output.lower()"` | Inline Python |
| 5.2.5 | Python file | `type: python, value: file://assert.py` | Python file |
| 5.2.6 | Python named | `type: python, value: file://assert.py:check` | Python named |

5.3 Model-Graded Assertions (Config Validation Only)

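| # | Test | Assertion Type | Verifies |
|---|------|----------------|----------|
| 5.3.1 | Factuality | `type: factuality` | Config load |
| 5.3.2 | Answer relevance | `type: answer-relevance` | Config load |
| 5.3.3 | Context relevance | `type: context-relevance` | Config load |
| 5.3.4 | LLM rubric | `type: llm-rubric` | Config load |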

6. Transform Tests

6.1 Response Transforms

| # | Test | Transform Config | Verifies |
|---|------|------------------|----------|
| 6.1.1 | String expr | `transformResponse: "json.content"` | Expression eval |
| 6.1.2 | JS file | `transformResponse: file://transform.js` | JS transform |
| 6.1.3 | Named function | `transformResponse: file://transform.js:fn` | Named transform |

6.2 Prompt Transforms

| # | Test | Transform Config | Verifies |
|---|------|------------------|----------|
| 6.2.1 | Prompt function | `prompt: file://prompt.js` | Prompt fn |
| 6.2.2 | Nunjucks filter | `nunjucksFilters: file://filters.js` | Custom filters |

7. Feature Integration Tests

7.1 Provider Config Options

| # | Test | Config | Verifies |
|---|------|--------|----------|
| 7.1.1 | Provider with config | `providers: [{ id: echo, config: { foo: bar } }]` | Config passing |
| 7.1.2 | Provider with label | `providers: [{ id: echo, label: "My Echo" }]` | Label support |
| 7.1.3 | Multiple providers | `providers: [echo, echo]` | Multi-provider |

7.2 DefaultTest

| # | Test | Config | Verifies |
|---|------|--------|----------|
| 7.2.1 | Inline defaultTest | `defaultTest: { assert: [...] }` | Inline default |
| 7.2.2 | File defaultTest | `defaultTest: file://default.yaml` | File default |

7.3 Scenarios

| # | Test | Config | Verifies |
|---|------|--------|----------|
| 7.3.1 | Basic scenario | `scenarios: [{ config: ..., tests: ... }]` | Scenario loading |

8. Example Config Smoke Tests

Validate existing examples work with echo provider substitution:

| # | Example | Original Provider | Test With |
|---|---------|-------------------|-----------|
| 8.1 | examples/simple-test | openai | echo |
| 8.2 | examples/simple-csv | openai | echo |
| 8.3 | examples/json-output | openai | echo |
| 8.4 | examples/executable-prompts | echo | as-is |
| 8.5 | examples/csv-metadata | openai | echo |
| 8.6 | examples/jsonl-test-cases | openai | echo |
| 8.7 | examples/javascript-assert-external | openai | echo |
| 8.8 | examples/nunjucks-custom-filters | openai | echo |
| 8.9 | examples/external-defaulttest | openai | echo |
| 8.10 | examples/multishot | openai | echo |

Implementation Priority

Phase 1: Foundation (Must Ship) - COMPLETE

  • 1.1.1-1.1.5: Basic CLI operations
  • 1.4.1-1.4.7: Basic eval with echo (output formats, flags)
  • 1.7.1-1.7.3: Exit codes
  • 2.1.1: YAML config
  • 3.1.1: Echo provider
  • 4.2.1: Inline tests
  • 1.2.1: Init command
  • 1.3.1-1.3.2: Validate command
  • 1.5.1-1.5.2: List commands
  • 1.6.1: Cache commands

Phase 2: Config & Provider Formats - PARTIAL

  • 2.2.1: JSON config
  • 2.3.1: CJS config with module.exports
  • 2.3.5: ESM config with export default
  • 2.3.2-2.3.4, 2.3.6-2.3.7: Other JS config variants
  • 2.4.1: TypeScript config with export default
  • 2.4.2-2.4.6: Other TS config variants
  • 3.2.1: CJS provider class
  • 3.2.4: ESM provider class
  • 3.2.2-3.2.3, 3.2.5: Other JS provider variants
  • 3.3.1: TypeScript provider class
  • 3.3.2-3.3.3: Other TS provider variants

Phase 3: Script Providers - PARTIAL

  • 3.4.1: Python provider (default call_api)
  • 3.4.2: Python provider named function
  • 3.4.3-3.4.7: Other Python provider variants
  • 3.5.1-3.5.3: Ruby provider variants
  • 3.6.1-3.6.2: Go provider variants
  • 3.1.2: Exec provider

Phase 4: Data & Assertions - PARTIAL

  • 4.2.2: CSV file tests
  • 4.2.3: JSON file tests
  • 4.2.4: JSONL file tests
  • 4.2.5: YAML file tests
  • 4.2.7: JS test generator
  • 4.3.2: File ref prompts
  • 4.2.6, 4.2.8-4.2.10: Other data loading formats (XLSX, TS/Python generators, glob)
  • 5.1.1: Contains assertion
  • 5.1.2: Not-contains assertion
  • 5.1.3: Equals assertion
  • 5.1.4: Starts-with assertion
  • 5.1.5: Regex assertion
  • 5.1.6: Is-JSON assertion
  • 5.1.7: Contains-json assertion
  • 5.2.1: Inline JavaScript assertion
  • 5.2.2: JavaScript file assertion
  • 5.2.5: Python file assertion
  • 5.1.8-5.1.11: Other built-in assertions (json-schema, cost, latency, perplexity)
  • 5.2.3-5.2.4, 5.2.6: Other script assertion variants

Phase 5: HTTP & Auth

  • 3.7.1-3.7.5: HTTP provider basics
  • 3.8.1-3.8.10: All auth configurations

Phase 6: Filter & Flag Tests - PARTIAL

High-value tests for CLI filter flags and execution options.

Priority 1 - Count/Pattern Filters:

  • 1.8.1.1: First N tests (--filter-first-n 2)
  • 1.8.1.2: First N > total (returns all)
  • 1.8.1.4: Sample N tests (--filter-sample 2)
  • 1.8.2.1: Pattern filter (--filter-pattern "user.*test")
  • 1.8.2.3: Pattern no match (0 tests)
  • 1.8.3.1: Metadata filter (--filter-metadata category=auth)
  • 1.8.3.2: Metadata partial match
  • 1.8.3.3: Metadata array match

Priority 2 - Provider Filters:

  • 1.8.4.1: Provider by ID (--filter-providers "echo")
  • 1.8.4.2: Provider by label regex
  • 1.8.6.1: Combined filters (pattern + first-n)

Priority 3 - Variable/Prompt Flags:

  • 1.9.1: Single var (--var name=Alice)
  • 1.9.2: Multiple vars
  • 1.9.3: Var precedence (test vars override --var)
  • 1.9.5: Prompt prefix
  • 1.9.6: Prompt suffix
  • 1.9.7: Prefix + suffix combined

Priority 4 - Output/Execution Flags:

  • 1.10.1: Delay between tests (--delay 100)
  • 1.10.5: No table output
  • 1.11.1: Description flag
  • 1.11.2: Multiple output files
  • 1.11.3: No write flag

Priority 5 - History-Based Filters (more complex):

  • 1.8.5.1: Filter failing from file
  • 1.8.5.3: Filter errors only from file
  • 1.12.1: Resume evaluation
  • 1.12.3: Retry errors

Phase 7: Integration & Polish - PARTIAL

  • 6.1.1: Transform response expression
  • 6.1.2-6.1.3: Other transform variants
  • 7.1.1: Provider with config options
  • 7.1.2-7.1.3: Other provider config variants
  • 7.2.1: DefaultTest feature
  • 7.2.2: File defaultTest
  • 7.3.1: Scenarios feature
  • 8.1-8.10: Example config tests

Phase 8: Advanced Features - PARTIAL

Advanced CLI features and assertion capabilities.

  • 1.4.8: Environment file loading (--env-file)
  • 1.4.4b: HTML output format
  • 4.3.1b: Multiple prompts (A/B testing)
  • 4.3.2b: Multiple file prompts (file:// references)
  • 5.1.2b: icontains assertion (case-insensitive)
  • 5.1.5b: Regex end-of-string pattern ($ anchor)
  • 5.3.1: Assertion weights
  • 7.4.1: Test threshold option (partial assertion passes)
  • 1.4.9: --grader flag for model-graded assertions
  • 1.12.1: Resume evaluation (--resume)
  • 1.12.3: Retry errors (--retry-errors)
  • 2.5.1: Config extends feature

Cross-Platform Testing

OS Matrix

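| # | OS | Node Versions | Special Considerations |
|---|----|---------------|------------------------|
| 9.1.1 | Ubuntu | 20, 22, 24 | Standard |
| 9.1.2 | macOS | 20, 22, 24 | fsevents, path handling |
| 9.1.3 | Windows | 20, 22, 24 | Path separators, shell commands |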

Script Language Matrix

| # | Language | Versions | Tests |
|---|----------|----------|-------|
| 9.2.1 | Python | 3.9, 3.11 | Python provider |
| 9.2.2 | Ruby | 3.0, 3.3 | Ruby provider |
| 9.2.3 | Go | 1.23 | Go provider |

CI Integration

Add to .github/workflows/main.yml:

```yaml
# Job definition: nest under the existing top-level `jobs:` key
smoke-tests:
  name: Smoke Tests
  runs-on: ubuntu-latest
  needs: [build]
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: '20'
    - uses: actions/setup-python@v5
      with:
        python-version: '3.11'

    - name: Install dependencies
      run: npm ci

    - name: Build
      run: npm run build

    - name: Smoke Tests
      run: npm run test:smoke
```

Writing New Smoke Tests

Guidelines

  1. Use echo provider - No external API dependencies
  2. Keep fixtures minimal - Only what's needed to test the feature
  3. Test one thing - Each test should verify a single capability
  4. Include negative tests - Verify errors are handled correctly
  5. Document the test - Add entry to checklist above

Echo-Based Provider Template

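```python
# test/smoke/fixtures/providers/echo-provider.py
def call_api(prompt, options, context):
    """Echo provider that returns the prompt back with an "Echo: " prefix."""
    return {
        "output": f"Echo: {prompt}",
        # Character counts stand in for token counts; no tokenizer is involved.
        "tokenUsage": {"total": len(prompt), "prompt": len(prompt), "completion": 0},
    }
```

```javascript
// test/smoke/fixtures/providers/echo-provider.js
module.exports = class EchoProvider {
  constructor(options = {}) {
    // Stored under a different name: assigning this.id would shadow the id() method,
    // and calling provider.id() would then throw "id is not a function".
    this.providerId = options.id || 'echo-js';
  }

  id() {
    return this.providerId;
  }

  async callApi(prompt) {
    return { output: `Echo: ${prompt}` };
  }
};
```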

Config Fixture Template

```yaml
# test/smoke/fixtures/configs/basic.yaml
description: 'Smoke test - basic eval'
providers:
  - echo
prompts:
  - 'Hello {{name}}'
tests:
  - vars:
      name: World
    assert:
      - type: contains
        value: Hello
      - type: contains
        value: World
```

Key Findings

Important discoveries made during smoke test implementation that may inform future development:

Provider Named Function Syntax

The :functionName syntax (e.g., file://provider.py:custom_fn) is only supported for:

  • Python providers - file://provider.py:custom_fn works
  • Ruby providers - file://provider.rb:custom_fn works
  • Go providers - file://main.go:CustomFn works

It is NOT supported for JavaScript/TypeScript providers. JS/TS providers must export a class:

```javascript
// Correct - class export
module.exports = class MyProvider {
  async callApi(prompt) {
    return { output: prompt };
  }
};

// NOT supported - named function export for providers
// module.exports.callApi = async (prompt) => { ... };
```

The :functionName syntax IS supported for JavaScript in other contexts:

  • Assertions: type: javascript, value: file://assert.js:checkFn
  • Transforms: transform: file://transform.js:transformFn
  • Test generators: tests: file://tests.js:generateTests

Assertion Behavior

contains-json Assertion

The contains-json assertion behaves differently based on whether a value is provided:

  • Without value: Checks that the output contains valid JSON somewhere
  • With value: Validates extracted JSON against a JSON Schema (not a subset match)

```yaml
# Just check for valid JSON presence
- type: contains-json

# Validate against JSON Schema
- type: contains-json
  value:
    type: object
    required: [status, code]
    properties:
      status: { type: string }
      code: { type: number }
```

Scenarios Configuration

The scenarios[].config field expects an array of variable configurations, not a plain object:

```yaml
# Correct
scenarios:
  - config:
      - vars:
          region: US
      - vars:
          region: EU
    tests:
      - vars: { name: Alice }

# Incorrect - will fail validation
scenarios:
  - config:
      region: US  # This is wrong
    tests:
      - vars: { name: Alice }
```

Exit Codes

The CLI uses specific exit codes:

| Exit Code | Meaning |
|-----------|---------|
| 0 | Success - all tests passed |
| 100 | Test failures - one or more assertions failed |
| 1 | Error - configuration error, provider error, or other runtime error |

Output JSON Structure

When exporting results to JSON (-o output.json), key paths:

  • results.prompts[].provider - Provider label/ID for each prompt
  • results.results[].success - Boolean indicating if all assertions passed
  • results.results[].response.output - The LLM output text
  • results.results[].provider.label - Provider label if configured
  • results.results[].gradingResult.componentResults[] - Individual assertion results
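
A test that reads the exported file can type just these paths. A minimal sketch, assuming the field names listed above plus `pass` and `reason` on each component result (an assumption about the grading result shape); `EvalOutput`, `summarize`, and `readEvalOutput` are hypothetical names.

```typescript
import * as fs from 'node:fs';

// Hypothetical typed view of the key paths listed above. Only fields the smoke
// tests read are declared; `pass`/`reason` on component results are assumed.
export interface ComponentResult {
  pass: boolean;
  reason?: string;
}

export interface EvalRow {
  success: boolean;
  response?: { output?: unknown };
  provider?: { id?: string; label?: string };
  gradingResult?: { componentResults?: ComponentResult[] };
}

export interface EvalOutput {
  results: {
    prompts: Array<{ provider: string }>;
    results: EvalRow[];
  };
}

export function readEvalOutput(file: string): EvalOutput {
  return JSON.parse(fs.readFileSync(file, 'utf8')) as EvalOutput;
}

// Pass/fail counts, raw outputs, provider labels, and the assertions that failed.
export function summarize(output: EvalOutput) {
  const rows = output.results.results;
  return {
    passed: rows.filter((r) => r.success).length,
    failed: rows.filter((r) => !r.success).length,
    outputs: rows.map((r) => r.response?.output),
    providerLabels: rows.map((r) => r.provider?.label),
    failedAssertions: rows.flatMap((r) =>
      (r.gradingResult?.componentResults ?? []).filter((c) => !c.pass),
    ),
  };
}
```

For example, after running the basic.yaml fixture with `-o out.json`, `summarize(readEvalOutput('out.json')).failed` would be expected to be `0`, and any entries in `failedAssertions` carry a reason worth printing when a test fails.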

Echo Provider Behavior

The built-in echo provider returns the prompt exactly as-is. This is useful for:

  • Testing assertion logic without API calls
  • Verifying prompt template rendering
  • Testing data loading and variable substitution

To test JSON-related assertions, include JSON in the prompt itself:

```yaml
prompts:
  - 'Response: {"status": "ok", "count": 42}'
tests:
  - assert:
      - type: contains-json
```

Test Isolation

Each smoke test file creates its own temporary output directory and cleans it up in afterAll. This ensures tests don't interfere with each other when run in parallel.

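```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Use a directory name unique to this test file; beforeAll/afterAll come from the test runner.
const OUTPUT_DIR = path.resolve(__dirname, '.temp-output-unique-name');

beforeAll(() => {
  fs.mkdirSync(OUTPUT_DIR, { recursive: true });
});

afterAll(() => {
  fs.rmSync(OUTPUT_DIR, { recursive: true, force: true });
});
```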
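
Hard-coded directory names rely on each file picking a unique name. An alternative sketch lets the OS generate the name with `fs.mkdtempSync`, which also keeps two concurrent runs of the same file apart; `makeTempOutputDir` and `removeTempOutputDir` are hypothetical helpers.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Creates a fresh directory under the OS temp dir. mkdtempSync appends random
// characters to the prefix, so concurrent runs cannot collide.
export function makeTempOutputDir(prefix = 'promptfoo-smoke-'): string {
  return fs.mkdtempSync(path.join(os.tmpdir(), prefix));
}

export function removeTempOutputDir(dir: string): void {
  fs.rmSync(dir, { recursive: true, force: true });
}

// Usage inside a test file (beforeAll/afterAll come from the test runner):
//   let OUTPUT_DIR: string;
//   beforeAll(() => { OUTPUT_DIR = makeTempOutputDir(); });
//   afterAll(() => removeTempOutputDir(OUTPUT_DIR));
```

The trade-off is that outputs land under the OS temp directory instead of next to the test file, which makes them slightly harder to find after a failed run.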

Available Assertion Types

There is no ends-with assertion type. To check string suffixes, use the regex assertion with a $ anchor:

```yaml
# Check if output ends with "42."
- type: regex
  value: '42\.$'
```

Available assertion types include: contains, contains-all, contains-any, icontains, icontains-all, icontains-any, equals, starts-with, regex, is-json, contains-json, is-html, is-xml, is-sql, javascript, python, and model-graded assertions.

Glob Patterns for Prompts

Glob patterns in prompts (e.g., prompts/*.txt) have path resolution issues when used with file:// prefix. Use explicit file references instead:

```yaml
# Works - explicit file references
prompts:
  - file://../prompts/greeting.txt
  - file://../prompts/farewell.txt

# Has issues - glob pattern
prompts:
  - file://prompts/*.txt
```

Multiple Config Files Behavior

Using multiple -c flags doesn't deeply merge configs. Each config is processed separately with defaults for missing properties. The second config will get a default {{prompt}} prompt if prompts aren't specified.

To compose configs, use explicit file:// references within a single config or specify all required properties in each config file.

HTML Output Format

The HTML output uses lowercase <!doctype html> (valid HTML5) rather than uppercase <!DOCTYPE html>.


Bug Regression Tests (0.120.x)

Critical bugs identified in versions 0.120.0-0.120.3 that should have smoke test coverage. These bugs represent real issues users encountered after the major ESM migration.

10. ESM Migration Bugs (0.120.0)

The 0.120.0 release migrated from CommonJS to ESM, causing several regressions:

10.1 Module Loading

| # | Bug | Issue | Description | Proposed Test |
|---|-----|-------|-------------|---------------|
| 10.1.1 | CJS fallback | #6501 | `.js` files with CJS syntax failed to load | Load `.js` provider with `module.exports` |
| 10.1.2 | require() resolution | #6468 | `require()` calls in custom code failed | Provider that uses `require()` internally |
| 10.1.3 | process.mainModule | #6606 | Inline transforms using `process.mainModule.require` broke | Inline JS assertion with `process.mainModule` |
| 10.1.4 | ESM import resolution | #6509 | Various import paths failed in ESM context | TS provider with complex imports |

10.2 Provider Path Resolution

| # | Bug | Issue | Description | Proposed Test |
|---|-----|-------|-------------|---------------|
| 10.2.1 | Relative path from CWD | #6503 | Provider paths resolved from CWD instead of config directory | Config in subdir with `./provider.js` path |
| 10.2.2 | Python wrapper path | #6500 | Python `wrapper.py` path resolution failed | Python provider from different working directory |
| 10.2.3 | Python provider path | #6465 | Python provider module path resolution issues | Python provider with relative imports |

10.3 Cache & Config

| # | Bug | Issue | Description | Proposed Test |
|---|-----|-------|-------------|---------------|
| 10.3.1 | Cache init failure | #6467 | Cache failed to initialize: "KeyvFile is not a constructor" | Run eval with cache enabled (default) |
| 10.3.2 | maxConcurrency ignored | #6526 | `maxConcurrency` in config.yaml was ignored, only CLI worked | Config with `defaultTest.options.maxConcurrency` |

10.4 CLI Issues

| # | Bug | Issue | Description | Proposed Test |
|---|-----|-------|-------------|---------------|
| 10.4.1 | Eval hanging | #6460 | Eval command hung indefinitely, never completing | Basic eval completes in reasonable time |
| 10.4.2 | View premature exit | #6460 | `promptfoo view` exited immediately after starting | (Not testable in smoke tests - requires server) |
| 10.4.3 | Logger write-after-end | #6511 | Winston "write after end" errors during shutdown | Multiple evals in sequence don't cause errors |

10.5 Language Providers

| # | Bug | Issue | Description | Proposed Test |
|---|-----|-------|-------------|---------------|
| 10.5.1 | Go provider broken | #6506 | Go provider wrapper failed after ESM migration | Go provider basic functionality |
| 10.5.2 | Ruby provider broken | #6506 | Ruby provider wrapper failed after ESM migration | Ruby provider basic functionality |

11. Version 0.120.1-0.120.2 Bugs

11.1 Database & Migrations

| # | Bug | Issue | Description | Proposed Test |
|---|-----|-------|-------------|---------------|
| 11.1.1 | Drizzle migrations path | #6573 | DB migrations not found when using npm/npx | (Tested implicitly - eval writes to DB) |

11.2 Parsing Issues

| # | Bug | Issue | Description | Proposed Test |
|---|-----|-------|-------------|---------------|
| 11.2.1 | JSON chat parsing | #6568 | Incorrect parsing of JSON vs non-JSON chat messages | JSON array prompt parses correctly |
| 11.2.2 | Gemini empty contents | #6580 | Gemini provider crashed on empty content responses | (Requires Gemini - not for smoke tests) |
| 11.2.3 | Context-recall preamble | #6566 | Preamble text in context-recall parser caused failures | (Requires grading provider - not for smoke tests) |

11.3 Assertion Improvements

| # | Bug | Issue | Description | Proposed Test |
|---|-----|-------|-------------|---------------|
| 11.3.1 | is-sql error messages | #6565 | Unhelpful error messages for is-sql whitelist violations | `is-sql` with whitelist shows clear error |

11.4 HTTP Provider

| # | Bug | Issue | Description | Proposed Test |
|---|-----|-------|-------------|---------------|
| 11.4.1 | Body parsing | #6484 | HTTP provider body parsing had edge cases | HTTP provider with complex body template |

12. Key Regression Test Implementations

Priority smoke tests to add based on critical 0.120.x bugs:

12.1 Module Loading Regression Tests

```yaml
# Test 10.1.1: CJS provider with module.exports still works
# Fixture: test/smoke/fixtures/providers/cjs-module-exports.js
providers:
  - file://providers/cjs-module-exports.js

# Test 10.1.3: Inline JS with process.mainModule (requires Node.js CJS compat)
tests:
  - assert:
      - type: javascript
        value: |
          // Regression for #6606: process.mainModule is undefined under ESM,
          // so this only passes if inline JS gets a CJS-compatible shim
          const path = process.mainModule.require('path');
          return typeof path.join === 'function' && output.includes('test');
```

12.2 Provider Path Resolution Tests

```yaml
# Test 10.2.1: Provider path relative to config file, NOT cwd
# Config at: test/smoke/fixtures/subdir/config-relative-provider.yaml
# Provider at: test/smoke/fixtures/subdir/local-provider.js
# Run from: test/smoke/ (different directory than config)
providers:
  - file://./local-provider.js # Should resolve relative to config, not cwd
```
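
The expectation in 12.2 can be stated as a path computation. A minimal sketch of the intended behavior, not promptfoo's resolver: strip the `file://` scheme and resolve the remainder against the config file's directory, so the process CWD never matters. `resolveProviderPath` is hypothetical, and it ignores the `:functionName` suffix used by script providers.

```typescript
import * as path from 'node:path';

// Hypothetical model of the 10.2.1 expectation: a file:// provider reference is
// resolved against the directory that contains the config file, never the CWD.
// Absolute references pass through unchanged because path.resolve keeps them.
export function resolveProviderPath(configPath: string, providerRef: string): string {
  const target = providerRef.replace(/^file:\/\//, '');
  const configDir = path.dirname(path.resolve(configPath));
  return path.resolve(configDir, target);
}
```

A 10.2.1 test would run the CLI with `cwd` set to test/smoke/ and confirm that the provider at `resolveProviderPath(configPath, 'file://./local-provider.js')` is the one that produced the output.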

12.3 Config Option Tests

```yaml
# Test 10.3.2: maxConcurrency in config.yaml is respected
defaultTest:
  options:
    maxConcurrency: 1
providers:
  - echo
prompts:
  - 'Test {{n}}'
tests:
  - vars: { n: 1 }
  - vars: { n: 2 }
  - vars: { n: 3 }
# Verify tests run sequentially (timing check)
```
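
Because the built-in echo provider responds immediately, wall-clock timing alone may not distinguish serial from parallel runs in 12.3. One option, sketched here under the assumption of a hypothetical slow fixture provider that records a start and end timestamp per call (for example by appending to a file), is to compute the peak number of overlapping calls. `peakConcurrency` and `CallInterval` are hypothetical names.

```typescript
// Hypothetical helper for 10.3.2: given one [start, end] interval per provider
// call, return the largest number of calls that were in flight at once.
export interface CallInterval {
  start: number;
  end: number;
}

export function peakConcurrency(intervals: CallInterval[]): number {
  // Sweep line: +1 at each start, -1 at each end. At equal timestamps the end
  // sorts first, so back-to-back calls are not counted as overlapping.
  const events = intervals.flatMap((iv) => [
    { t: iv.start, delta: 1 },
    { t: iv.end, delta: -1 },
  ]);
  events.sort((a, b) => a.t - b.t || a.delta - b.delta);

  let current = 0;
  let peak = 0;
  for (const e of events) {
    current += e.delta;
    peak = Math.max(peak, current);
  }
  return peak;
}
```

If the config value is honored, the recorded intervals should give `peakConcurrency(intervals) === 1`; a higher value would suggest `maxConcurrency` is being ignored again.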

Implementation Priority for Regression Tests

Phase 1: High Priority (Add Now)

  • 10.1.1: CJS module.exports provider loading
  • 10.2.1: Provider path resolution from config directory
  • 10.3.2: maxConcurrency in config file
  • 11.2.1: JSON chat message parsing

Phase 2: Medium Priority

  • 10.1.3: Inline JS with process.mainModule shim
  • 10.5.1: Go provider basic test
  • 10.5.2: Ruby provider basic test

Phase 3: Lower Priority (Complex Setup)

  • 10.2.2: Python wrapper path from different CWD
  • 11.4.1: HTTP provider complex body parsing

Implemented Tests Summary

Current smoke test coverage:

| Test File | Tests | Category |
|-----------|-------|----------|
| cli.test.ts | 18 | CLI commands, init, validate |
| eval.test.ts | 12 | Core eval pipeline |
| providers.test.ts | 14 | Provider loading (JS/TS/Python) |
| configs.test.ts | 8 | Config format parsing |
| data-loading.test.ts | 13 | Data sources (CSV, JSON, YAML) |
| filters-flags.test.ts | 22 | Filter flags and CLI options |
| advanced-features.test.ts | 10 | Advanced features (env, delay, HTML) |
| output-and-assertions.test.ts | 15 | Assertion types and output formats |
| Total | 112 | |