Structured Compare Calibration Summary

docs/workspace/compare-evaluation-analysis/structured-compare-calibration/latest/summary.md

  • generatedAt: 2026-03-22T10:44:18.102Z
  • outputRoot: D:\Dev\myProject\prompt-optimizer\docs\workspace\compare-evaluation-analysis\structured-compare-calibration\latest

| Case | Kind | Score | targetVsBaseline | targetVsReferenceGap | stopRecommendation | Expectation Match |
| --- | --- | --- | --- | --- | --- | --- |
| live-basic-system-boundary-control | live | 75 | improved | minor | continue | exploratory |
| synthetic-medical-latent-trigger-overfit | synthetic | 35 | regressed | major | review | 3/5 |
| synthetic-ecommerce-schema-no-model-worship | synthetic | 40 | regressed | minor | review | 6/6 |
| synthetic-legal-flat-not-unclear | synthetic | 50 | flat | none | continue | 3/3 |
| synthetic-teaching-overfit-regression | synthetic | 30 | regressed | major | review | 6/6 |
| synthetic-hiring-replica-semantic-instability | synthetic | 65 | improved | none | review | 4/4 |
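
As a reading aid, the sketch below flags cases whose Expectation Match is less than complete. It assumes that a value such as `3/5` means 3 of 5 expectations were met, and that a non-numeric label such as `exploratory` means no expectations were scored; the report does not define either. The helper names `parse_match` and `incomplete_matches` are hypothetical and not part of the prompt-optimizer codebase. The rows are copied from the results listed above.

```python
# Rows copied from the summary: (case, kind, score, expectation match).
ROWS = [
    ("live-basic-system-boundary-control", "live", 75, "exploratory"),
    ("synthetic-medical-latent-trigger-overfit", "synthetic", 35, "3/5"),
    ("synthetic-ecommerce-schema-no-model-worship", "synthetic", 40, "6/6"),
    ("synthetic-legal-flat-not-unclear", "synthetic", 50, "3/3"),
    ("synthetic-teaching-overfit-regression", "synthetic", 30, "6/6"),
    ("synthetic-hiring-replica-semantic-instability", "synthetic", 65, "4/4"),
]


def parse_match(value):
    """Return (matched, total) for 'm/n' strings, or None for plain labels.

    Assumption: 'm/n' means m of n expectations were met.
    """
    parts = value.split("/")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None
    return int(parts[0]), int(parts[1])


def incomplete_matches(rows):
    """List the cases whose expectation match is below m == n."""
    flagged = []
    for case, _kind, _score, match in rows:
        parsed = parse_match(match)
        if parsed is not None and parsed[0] < parsed[1]:
            flagged.append(case)
    return flagged


if __name__ == "__main__":
    for case in incomplete_matches(ROWS):
        print(case)
```

Under these assumptions, only `synthetic-medical-latent-trigger-overfit` is flagged; the live case is skipped because its value is a label and not a ratio.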

Notes

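  • Synthetic cases test the prompt boundaries of the judge / synthesis steps.
  • The live case checks whether real target/teacher execution results converge to a reasonable conclusion under structured compare.
  • Each case subdirectory stores the compare request, the compare result, the rewrite input / output, and the complete LLM call logs.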