Getting an LLM to output valid JSON can be a difficult task. The output may fail to parse, omit fields you require, or contain values that don't match your specification.
This guide explains some eval techniques for testing the quality of your model's JSON output by checking that specific fields are present in the output object and hold the values you expect. It's useful for tweaking your prompt and model until the output is valid JSON that conforms to your desired specification.
Before proceeding, ensure you have a basic understanding of how to set up test cases and assertions. Find more information in the Getting Started guide and the Assertions & Metrics documentation.
Let's say your language model outputs a JSON object like the following:
{
  "color": "Yellow",
  "location": "Guatemala"
}
You want to create assertions that specifically target the values of color and location. Here's how you can do it.
To ensure that your language model's output is valid JSON, you can use the is-json assertion type. This assertion will check that the output is a valid JSON string and optionally validate it against a JSON schema if provided.
Here's an example of how to use the is-json assertion without a schema:
assert:
- type: is-json
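As a rough mental model, the schema-less check amounts to asking whether the output parses as JSON. The sketch below is plain JavaScript for experimenting outside an eval, not promptfoo's implementation; `isValidJson` is a hypothetical helper name.

```javascript
// Hypothetical helper: returns true when `text` parses as JSON.
// It only illustrates the idea behind a schema-less JSON check.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

const good = '{"color": "Yellow", "location": "Guatemala"}';
const bad = "Sure! Here is your JSON: {color: Yellow}";

console.log(isValidJson(good)); // true
console.log(isValidJson(bad)); // false
```

The second string shows a common failure: the model wraps the object in conversational text and leaves keys unquoted, so parsing fails.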
If you want to validate the structure of the JSON output, you can define a JSON schema. Here's an example of using the is-json assertion with a schema that requires color to be a string and countries to be a list of strings:
prompts:
  - "Output a JSON object that contains the keys `color` and `countries`, describing the following object: {{item}}"
tests:
  - vars:
      item: Banana
    assert:
      - type: is-json
        value:
          required: ["color", "countries"]
          type: object
          properties:
            color:
              type: string
            countries:
              type: array
              items:
                type: string
This will ensure that the output is valid JSON that contains the required fields with the correct data types.
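To make the schema's meaning concrete, here is a hand-written JavaScript check that mirrors the same three rules (an object, a string `color`, and a `countries` array of strings). It is an illustration only: `matchesBananaSchema` is a hypothetical name, and a real JSON Schema validator covers far more than this.

```javascript
// Hand-written equivalent of the schema above, for illustration only.
function matchesBananaSchema(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (err) {
    return false; // not JSON at all
  }
  // type: object (arrays and null are excluded)
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return false;
  }
  // required "color", which must be a string
  if (typeof data.color !== "string") return false;
  // required "countries", which must be an array of strings
  if (!Array.isArray(data.countries)) return false;
  return data.countries.every((c) => typeof c === "string");
}

console.log(matchesBananaSchema('{"color": "yellow", "countries": ["Ecuador"]}')); // true
console.log(matchesBananaSchema('{"color": "yellow"}')); // false, countries is missing
console.log(matchesBananaSchema('{"color": 3, "countries": []}')); // false, color is not a string
```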
To assert on specific fields of a JSON output, use the javascript assertion type. This allows you to write custom JavaScript code to perform logical checks on the JSON fields.
Here's an example configuration that demonstrates how to assert that color equals "yellow" (ignoring case, so "Yellow" also passes) and countries contains "Ecuador":
prompts:
  - "Output a JSON object that contains the keys `color` and `countries`, describing the following object: {{item}}"
tests:
  - vars:
      item: Banana
    assert:
      - type: is-json
        # ...
      # Parse the JSON and test the contents
      - type: javascript
        value: JSON.parse(output).color.toLowerCase() === 'yellow' && JSON.parse(output).countries.includes('Ecuador')
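The one-line expression above parses the output once per field and throws if `countries` is missing. When developing a check like this, it can help to prototype it as an ordinary function first. The sketch below parses once and guards each field; `checkBanana` is a hypothetical name, and ignoring case when comparing `color` is an assumption about the desired behavior.

```javascript
// Hypothetical standalone version of the check, for experimenting outside promptfoo.
function checkBanana(output) {
  let data;
  try {
    data = JSON.parse(output); // parse once instead of once per field
  } catch (err) {
    return false;
  }
  if (data === null || typeof data !== "object") return false;
  const colorOk =
    typeof data.color === "string" && data.color.toLowerCase() === "yellow";
  const countriesOk =
    Array.isArray(data.countries) && data.countries.includes("Ecuador");
  return colorOk && countriesOk;
}

console.log(checkBanana('{"color": "Yellow", "countries": ["Ecuador", "India"]}')); // true
console.log(checkBanana('{"color": "Yellow"}')); // false
```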
If you don't want to add JSON.parse to every assertion, you can add a transform under test.options that parses the JSON before the result is passed to the assertions:
tests:
  - vars:
      item: Banana
    options:
      transform: JSON.parse(output)
    assert:
      - type: is-json
        # ...
      - type: javascript
        # `output` is now a parsed object
        value: output.color.toLowerCase() === 'yellow' && output.countries.includes('Ecuador')
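The flow described above can be pictured with a toy model in plain JavaScript: the transform runs once, and each assertion receives its result instead of the raw string. This is a simplified sketch, not promptfoo's internals; `transform`, `assertions`, and `runTestCase` are hypothetical names.

```javascript
// Toy model of "transform first, then assert", for illustration only.
const transform = (output) => JSON.parse(output);

const assertions = [
  // The transformed output is an object, not a string.
  (output) => typeof output === "object" && output !== null,
  // Field checks no longer need JSON.parse.
  (output) =>
    output.color.toLowerCase() === "yellow" &&
    output.countries.includes("Ecuador"),
];

function runTestCase(rawOutput) {
  const transformed = transform(rawOutput);
  return assertions.map((check) => check(transformed));
}

console.log(runTestCase('{"color": "yellow", "countries": ["Ecuador"]}'));
// [ true, true ]
```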
For model-graded assertions such as similarity and rubric-based evaluations, preprocess the output to extract the desired field before running the check. The transform directive can be used for this purpose, and it applies to the entire test case.
Here's how you can use transform to extract the countries field so that the assertions run against that field alone:
tests:
  - vars:
      item: banana
    options:
      transform: JSON.parse(output).countries
    assert:
      - type: contains-any
        value:
          - Guatemala
          - Costa Rica
          - India
          - Indonesia
      - type: llm-rubric
        value: is someplace likely to find {{item}}
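To see what the assertions receive after this transform, you can run the same expression on a sample output. In the sketch below, `containsAny` is a hypothetical stand-in for a "contains any of these values" check. It assumes the extracted value is compared as text, which may differ from how promptfoo's `contains-any` handles non-string output.

```javascript
const rawOutput =
  '{"color": "yellow", "countries": ["Ecuador", "Costa Rica", "Philippines"]}';

// Same expression as the transform above: keep only the `countries` field.
const extracted = JSON.parse(rawOutput).countries;

// Hypothetical stand-in for a "contains any" check.
// Assumption: non-string values are serialized before matching.
function containsAny(value, candidates) {
  const text = typeof value === "string" ? value : JSON.stringify(value);
  return candidates.some((candidate) => text.includes(candidate));
}

console.log(extracted); // [ 'Ecuador', 'Costa Rica', 'Philippines' ]
console.log(
  containsAny(extracted, ["Guatemala", "Costa Rica", "India", "Indonesia"])
); // true
```

Because only `countries` survives the transform, a stray mention of a country elsewhere in the JSON cannot satisfy the check by accident.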
See the full example on GitHub.
By using JavaScript within your assertions, you can perform complex checks on JSON outputs, including targeting specific fields. The transform can be used to tailor the output for similarity checks.
promptfoo is free and open-source software. To install promptfoo and get started, see the getting started guide.
For more on the different assertion types available, see the assertions documentation. You might also be interested in the Evaluating RAG pipelines guide, which provides insights into evaluating retrieval-augmented generation applications.