Ruby assertions

The ruby assertion allows you to provide a custom Ruby method to validate the LLM output.

A variable named output is injected into the context. The method should return true if the output passes the assertion, and false otherwise. If the method returns a number, it will be treated as a score.

Example:

yaml

assert:
  - type: ruby
    value: output[5..9] == 'Hello'

You may also return a number, which will be treated as a score:

yaml

assert:
  - type: ruby
    value: Math.log10(output.length) * 10

Multiline functions

Ruby assertions support multiline strings:

yaml

assert:
  - type: ruby
    value: |
      # Insert your scoring logic here...
      if output == 'Expected output'
        return {
          'pass' => true,
          'score' => 0.5,
        }
      end
      return {
        'pass' => false,
        'score' => 0,
      }

Using test context

A context object is available in the Ruby method. Here is its type definition:

ruby

# TraceSpan
{
  'spanId' => String,
  'parentSpanId' => String | nil,
  'name' => String,
  'startTime' => Integer,  # Unix timestamp in milliseconds
  'endTime' => Integer | nil,  # Unix timestamp in milliseconds
  'attributes' => Hash | nil,
  'statusCode' => Integer | nil,
  'statusMessage' => String | nil
}

# TraceData
{
  'traceId' => String,
  'spans' => Array[TraceSpan]
}

# AssertionValueFunctionContext
{
  # Raw prompt sent to LLM
  'prompt' => String | nil,

  # Test case variables
  'vars' => Hash[String, String | Object],

  # The complete test case
  'test' => Hash,  # Contains keys like "vars", "assert", "options"

  # Log probabilities from the LLM response, if available
  'logProbs' => Array[Float] | nil,

  # Configuration passed to the assertion
  'config' => Hash | nil,

  # The provider that generated the response
  'provider' => Object | nil,  # ApiProvider type

  # The complete provider response
  'providerResponse' => Object | nil,  # ProviderResponse type

  # OpenTelemetry trace data (when tracing is enabled)
  'trace' => TraceData | nil
}

For example, if the test case has a var example, access it in Ruby like this:

yaml

tests:
  - description: 'Test with context'
    vars:
      example: 'Example text'
    assert:
      - type: ruby
        value: 'context["vars"]["example"] in output'

External .rb

To reference an external file, use the file:// prefix:

yaml

assert:
  - type: ruby
    value: file://relative/path/to/script.rb
    config:
      outputLengthLimit: 10

You can specify a particular method to use by appending it after a colon:

yaml

assert:
  - type: ruby
    value: file://relative/path/to/script.rb:custom_assert

You can also specify a class method on some class or module in the file:

yaml

assert:
  - type: ruby
    value: file://relative/path/to/script.rb:Validators::Format.check_length

If no method is specified, it defaults to get_assert.

This file will be called with an output string and an AssertionValueFunctionContext object (see above). It expects that either a bool (pass/fail), float (score), or GradingResult will be returned.

Here's an example assert.rb:

ruby

require 'json'

# Default function name
def get_assert(output, context)
  puts 'Prompt:', context['prompt']
  puts 'Vars', context['vars']['topic']

  # This return is an example GradingResult hash
  {
    'pass' => true,
    'score' => 0.6,
    'reason' => 'Looks good to me',
  }
end

# Custom function name
def custom_assert(output, context)
  output.length > 10
end

This is an example of an assertion that uses data from a configuration defined in the assertion's YML file:

ruby

def get_assert(output, context)
  output.length <= context.fetch('config', {}).fetch('outputLengthLimit', 0)
end

You can also return nested metrics and assertions via a GradingResult object:

ruby

{
  'pass' => true,
  'score' => 0.75,
  'reason' => 'Looks good to me',
  'componentResults' => [{
    'pass' => output.downcase.include?('bananas'),
    'score' => 0.5,
    'reason' => 'Contains banana',
  }, {
    'pass' => output.downcase.include?('yellow'),
    'score' => 0.5,
    'reason' => 'Contains yellow',
  }]
}

GradingResult types

Here's a Ruby type definition you can use for the GradingResult object:

ruby

# GradingResult
{
  'pass' => Boolean,  # Can also use 'pass_' if 'pass' conflicts with Ruby keywords
  'score' => Float,
  'reason' => String,
  'componentResults' => Array[GradingResult] | nil,  # Component results (optional)
  'namedScores' => Hash[String, Float] | nil  # Appear as metrics in the UI (optional)
}

:::tip Snake case support Ruby snake_case fields are automatically mapped to camelCase:

pass_ → pass (or just use "pass" as a hash key)
named_scores → namedScores
component_results → componentResults
tokens_used → tokensUsed :::

Using trace data

When tracing is enabled, OpenTelemetry trace data is available in the context['trace'] object. This allows you to write assertions based on the execution flow:

ruby

def get_assert(output, context)
  # Check if trace data is available
  unless context['trace']
    # Tracing not enabled, skip trace-based checks
    return true
  end

  # Access trace spans
  spans = context['trace']['spans']

  # Example: Check for errors in any span
  error_spans = spans.select { |s| s.fetch('statusCode', 0) >= 400 }
  if error_spans.any?
    return {
      'pass' => false,
      'score' => 0,
      'reason' => "Found #{error_spans.length} error spans"
    }
  end

  # Example: Calculate total trace duration
  if spans.any?
    duration = spans.map { |s| s.fetch('endTime', 0) }.max - spans.map { |s| s['startTime'] }.min
    if duration > 5000  # 5 seconds
      return {
        'pass' => false,
        'score' => 0,
        'reason' => "Trace took too long: #{duration}ms"
      }
    end
  end

  # Example: Check for specific operations
  api_calls = spans.select { |s| s['name'].downcase.include?('http') }
  if api_calls.length > 10
    return {
      'pass' => false,
      'score' => 0,
      'reason' => "Too many API calls: #{api_calls.length}"
    }
  end

  true
end

Example YAML configuration:

yaml

tests:
  - vars:
      query: "What's the weather?"
    assert:
      - type: ruby
        value: |
          # Ensure retrieval happened before response generation
          if context['trace']
            spans = context['trace']['spans']
            retrieval_span = spans.find { |s| s['name'].include?('retrieval') }
            generation_span = spans.find { |s| s['name'].include?('generation') }

            if retrieval_span && generation_span
              return retrieval_span['startTime'] < generation_span['startTime']
            end
          end
          true

Overriding the Ruby binary

By default, promptfoo will run ruby in your shell. Make sure ruby points to the appropriate executable.

If a ruby binary is not present, you will see a "ruby: command not found" error.

To override the Ruby binary, set the PROMPTFOO_RUBY environment variable. You may set it to a path (such as /path/to/ruby) or just an executable in your PATH (such as ruby).

Negation

Use not-ruby to invert the final pass/fail result while preserving the returned score. Numeric scores are still compared against threshold before the result is inverted:

yaml

assert:
  - type: not-ruby
    value: output.include?('error')

Other assertion types

For more info on assertions, see Test assertions.