cookbook/data_labeling/_17_llm_as_judge/README.md
Score a generated output against criteria. The same machinery as labeling - input is the (prompt, response) pair, output is a structured score - but applied to evaluating models rather than producing training labels.
basic.py — single 1-5 score on overall quality.single_rubric.py — explicit multi-criterion rubric, per-criterion
scores plus an overall.with_rationale.py — score plus a one-sentence rationale._05_text_pairwise_preference/).python cookbook/data_labeling/_17_llm_as_judge/basic.py
python cookbook/data_labeling/_17_llm_as_judge/single_rubric.py
python cookbook/data_labeling/_17_llm_as_judge/with_rationale.py
Requires OPENAI_API_KEY.