bot/docs/vikingbot-phase3-outcome-validation-with-openviking-server.md
Author: OpenViking Team
Status: Draft
Date: 2026-04-30
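This document validates the response outcome pipeline that has landed so far in Phase 3 of Vikingbot feedback observability. It focuses on:

- the response_outcome_evaluated event
- the explicit-feedback outcomes positive_feedback or negative_feedback
- the follow-up outcome reasked
- the persisted response_outcomes
- response_outcome_evaluated as an analytics-only event that does not leak into OpenAPI external responses or user-visible channels

This phase is still a minimal rule-based outcome evaluator, not a full offline judge or model-review pipeline.

This document applies to the following startup mode: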
OPENVIKING_CLI_CONFIG_FILE=ov_conf/ovcli.conf openviking-server --with-bot --config ov_conf/ov.conf
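In this mode:

- the bot API is served under the /bot/v1 prefix

The default validation address in this document therefore looks like:
http://127.0.0.1:30300/bot/v1
This document prioritizes validating the real proxy path over accessing http://127.0.0.1:18790 directly.

As of 2026-04-30, the validation completed in this round falls into three categories.

The following targeted tests have passed: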
./.venv/bin/python -m pytest -o addopts='' bot/tests/test_outcome_evaluator.py
./.venv/bin/python -m pytest -o addopts='' bot/tests/test_agent_loop_outcome.py
./.venv/bin/python -m pytest -o addopts='' bot/tests/test_langfuse_outcome_metadata.py
./.venv/bin/python -m pytest -o addopts='' bot/tests/test_openapi_auth.py -k feedback
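Corresponding checkpoints:

- resolved
- thumb_up produces positive_feedback
- thumb_down produces negative_feedback
- reasked
- session.metadata["response_outcomes"]
- OutboundEventType.RESPONSE_OUTCOME_EVALUATED
- the response_outcome_evaluated event and the response_outcome_label score

The current implementation explicitly guarantees:

- response_outcome_evaluated is an analytics-only event
- the external /chat and /feedback responses do not expose a response_outcome_evaluated event stream

For this phase, we recommend completing at least the following two closed loops over the real proxy path:

- /chat -> /feedback, verifying that the outcome is positive_feedback or negative_feedback
- /chat -> follow-up user turn, verifying that the outcome of the previous assistant response is reasked

Together, these two loops are enough to validate the core "explicit + implicit" outcome evaluation pipeline of the current Phase 3.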
Phase 3 currently uses a minimal rule-based evaluator. The rules draw on:

- session.metadata["feedback_events"]

The current rules are:

- thumb_down -> negative_feedback
- thumb_up -> positive_feedback
- reasked
- resolved
- follow_up_without_feedback
- follow_up

The outcome payload currently written contains at least:

- response_id
- resolved_in_one_turn
- reask_within_10m
- clarification_turns
- follow_up_without_feedback
- outcome_label
- evaluated_at
- evidence

Start the service:
OPENVIKING_CLI_CONFIG_FILE=ov_conf/ovcli.conf openviking-server --with-bot --config ov_conf/ov.conf
The log is expected to contain at least something like:
Bot API proxy enabled, forwarding to http://127.0.0.1:18790
Starting vikingbot gateway...
Vikingbot gateway started (PID: ...)
OpenViking HTTP Server is running on 127.0.0.1:30300
curl -sS http://127.0.0.1:30300/bot/v1/health
Expected: HTTP 200.

Explicit feedback: positive_feedback
First, send a real chat request:
curl -sS -X POST "http://127.0.0.1:30300/bot/v1/chat" \
-H "Content-Type: application/json" \
-d '{
"session_id": "phase3-outcome-feedback-session",
"user_id": "phase3-outcome-user",
"message": "请简单回复一句:用于验证 phase3 positive outcome"
}'
The response is expected to contain at least:
{
"session_id": "phase3-outcome-feedback-session",
"response_id": "<response_id>",
"message": "..."
}
Once you have the response_id, submit explicit feedback:
curl -sS -X POST "http://127.0.0.1:30300/bot/v1/feedback" \
-H "Content-Type: application/json" \
-d '{
"session_id": "phase3-outcome-feedback-session",
"response_id": "<response_id>",
"feedback_type": "thumb_up",
"feedback_text": "helpful"
}'
The /feedback response is expected to look like:
{
"accepted": true,
"response_id": "<response_id>",
"session_id": "phase3-outcome-feedback-session",
"feedback_type": "thumb_up",
"feedback_delay_sec": 1.234,
"timestamp": "..."
}
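Because response_outcome_evaluated is meant to stay analytics-only, it can help to scan each captured response body for that name before moving on. The following is a minimal sketch, not part of the Vikingbot code base: the helper name `contains_marker` is hypothetical, and it assumes the response has already been parsed into a Python dict.

```python
from typing import Any

ANALYTICS_ONLY_EVENT = "response_outcome_evaluated"


def contains_marker(payload: Any, marker: str = ANALYTICS_ONLY_EVENT) -> bool:
    """Return True if marker appears in any key or string value of payload."""
    if isinstance(payload, dict):
        return any(
            contains_marker(key, marker) or contains_marker(value, marker)
            for key, value in payload.items()
        )
    if isinstance(payload, (list, tuple)):
        return any(contains_marker(item, marker) for item in payload)
    if isinstance(payload, str):
        return marker in payload
    return False


# The feedback ack shown above, parsed into a dict.
feedback_ack = {
    "accepted": True,
    "response_id": "<response_id>",
    "session_id": "phase3-outcome-feedback-session",
    "feedback_type": "thumb_up",
    "feedback_delay_sec": 1.234,
    "timestamp": "...",
}
print(contains_marker(feedback_ack))  # False
```

The scan is deliberately broad, so a chat reply whose text happens to mention the event name would also be flagged; treat a hit as a prompt to inspect the body by hand.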
Then inspect the metadata on the first line of the corresponding session JSONL file. It is expected to contain at least:
{
"metadata": {
"feedback_events": [
{
"response_id": "<response_id>",
"feedback_type": "thumb_up",
"feedback_text": "helpful"
}
],
"response_outcomes": {
"<response_id>": {
"response_id": "<response_id>",
"outcome_label": "positive_feedback",
"resolved_in_one_turn": true,
"reask_within_10m": false
}
}
}
}
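Reading the first JSONL line by hand gets tedious when the check is repeated, so it can be scripted. This is a sketch under assumptions: the helper name `load_outcome` is hypothetical, and the location of the session file is not stated in this document, so it is passed in as a parameter.

```python
import json
from pathlib import Path


def load_outcome(session_file: Path, response_id: str) -> dict:
    """Return the outcome payload for response_id from the first JSONL line.

    An empty dict means no outcome has been persisted for that response.
    """
    with session_file.open(encoding="utf-8") as handle:
        first_line = handle.readline()
    metadata = json.loads(first_line).get("metadata", {})
    return metadata.get("response_outcomes", {}).get(response_id, {})
```

For the explicit positive path, `load_outcome(path, "<response_id>").get("outcome_label")` should then equal `"positive_feedback"`, where `path` points at the session file in your deployment.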
To validate the negative path, submit the following for another, new response:
curl -sS -X POST "http://127.0.0.1:30300/bot/v1/feedback" \
-H "Content-Type: application/json" \
-d '{
"session_id": "phase3-outcome-feedback-session-neg",
"response_id": "<another_response_id>",
"feedback_type": "thumb_down",
"feedback_text": "not helpful"
}'
Expected: response_outcomes["<another_response_id>"].outcome_label == "negative_feedback".
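The chat-then-feedback loop above can also be driven from a script, which avoids copying response_id by hand. The sketch below uses only the Python standard library and the request fields shown in the curl examples; the function names are hypothetical, and it assumes both endpoints return JSON bodies like the ones shown.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:30300/bot/v1"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request against the proxied bot API."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as response:
        return json.load(response)


def chat_then_feedback(session_id: str, user_id: str, message: str, feedback_type: str) -> dict:
    """Run one /chat call, then submit feedback for the returned response_id."""
    chat = post_json("/chat", {"session_id": session_id, "user_id": user_id, "message": message})
    return post_json(
        "/feedback",
        {
            "session_id": session_id,
            "response_id": chat["response_id"],
            "feedback_type": feedback_type,
        },
    )
```

Calling `chat_then_feedback("phase3-outcome-feedback-session-neg", "phase3-outcome-user", "...", "thumb_down")` against a running server would exercise the negative path; the session metadata still has to be inspected separately.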
Follow-up: reasked
First, create a new chat response:
curl -sS -X POST "http://127.0.0.1:30300/bot/v1/chat" \
-H "Content-Type: application/json" \
-d '{
"session_id": "phase3-outcome-reask-session",
"user_id": "phase3-outcome-user",
"message": "请简单回复一句:用于验证 phase3 reasked"
}'
Record response_id=<response_id_1> from the response.
Within 10 minutes, send a follow-up user message with the same session_id:
curl -sS -X POST "http://127.0.0.1:30300/bot/v1/chat" \
-H "Content-Type: application/json" \
-d '{
"session_id": "phase3-outcome-reask-session",
"user_id": "phase3-outcome-user",
"message": "我还是没明白,请再解释一下"
}'
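Before processing this new user turn, the current implementation first evaluates the outcome of the previous assistant response.
Then inspect the metadata on the first line of the session JSONL file. It is expected to contain at least: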
{
"metadata": {
"response_outcomes": {
"<response_id_1>": {
"response_id": "<response_id_1>",
"outcome_label": "reasked",
"reask_within_10m": true,
"resolved_in_one_turn": false,
"clarification_turns": 1,
"follow_up_without_feedback": false
}
}
}
}
The key point here is not what the second /chat call returns, but whether the outcome of the previous response has been persisted as reasked.
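To make the expected labels easier to reason about, the rule order can be sketched in a few lines. This is an illustration under stated assumptions, not the actual evaluator: the function name and signature are hypothetical, explicit feedback is assumed to outrank the follow-up signal, thumb_down is assumed to be checked before thumb_up, and falling through to resolved when there is no feedback and no follow-up inside the window is an assumption. The follow_up_without_feedback and clarification_turns fields are not modeled.

```python
from datetime import datetime, timedelta
from typing import Optional

REASK_WINDOW = timedelta(minutes=10)


def evaluate_outcome(
    response_id: str,
    responded_at: datetime,
    feedback_events: list,
    follow_up_at: Optional[datetime] = None,
) -> dict:
    """Derive a minimal outcome payload for one assistant response."""
    feedback_types = {
        event.get("feedback_type")
        for event in feedback_events
        if event.get("response_id") == response_id
    }
    # A follow-up user turn inside the 10 minute window counts as a re-ask.
    reask = (
        follow_up_at is not None
        and timedelta(0) <= follow_up_at - responded_at <= REASK_WINDOW
    )
    # Assumed precedence: explicit feedback first, then the follow-up signal.
    if "thumb_down" in feedback_types:
        label = "negative_feedback"
    elif "thumb_up" in feedback_types:
        label = "positive_feedback"
    elif reask:
        label = "reasked"
    else:
        label = "resolved"
    return {
        "response_id": response_id,
        "outcome_label": label,
        "reask_within_10m": reask,
        "resolved_in_one_turn": not reask,
    }
```

With a follow-up three minutes after the response and no feedback events, this sketch yields reasked with reask_within_10m set to true, matching the payload shown above.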
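This step requires Langfuse to be enabled.
After completing 5.3 or 5.4, find the corresponding trace / generation in the Langfuse UI and check in particular:

- response_outcome_evaluated
- response_outcome_label

Rationale:

- the outcome is evaluated after the /chat request has ended

Expected to contain at least:

- a response_outcome_evaluated event
- a response_outcome_label score
- a label that is one of positive_feedback, negative_feedback, reasked

For the explicit positive feedback path, a typical event is expected to look like: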
{
"name": "response_outcome_evaluated",
"metadata": {
"response_id": "<response_id>",
"outcome_label": "positive_feedback",
"response_outcome_evaluated": {
"response_id": "<response_id>",
"outcome_label": "positive_feedback"
}
}
}
The corresponding score is expected to look like:
{
"name": "response_outcome_label",
"value": "positive_feedback",
"data_type": "CATEGORICAL"
}
For the follow-up path, a typical event is expected to look like:
{
"name": "response_outcome_evaluated",
"metadata": {
"response_id": "<response_id_1>",
"outcome_label": "reasked",
"response_outcome_evaluated": {
"response_id": "<response_id_1>",
"outcome_label": "reasked",
"reask_within_10m": true
}
}
}
The corresponding score is expected to look like:
{
"name": "response_outcome_label",
"value": "reasked",
"data_type": "CATEGORICAL"
}
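When comparing several event and score pairs, a small consistency check saves re-reading each JSON body. This sketch assumes the event and score have already been obtained as Python dicts in the shapes shown above (for example, copied out of the Langfuse UI); how they are exported is not covered here, and the function name is hypothetical.

```python
def score_matches_event(event: dict, score: dict) -> bool:
    """Check that a response_outcome_label score agrees with its event."""
    metadata = event.get("metadata", {})
    nested = metadata.get("response_outcome_evaluated", {})
    return (
        event.get("name") == "response_outcome_evaluated"
        and score.get("name") == "response_outcome_label"
        and score.get("data_type") == "CATEGORICAL"
        # The label must agree at the top level, in the nested payload, and in the score.
        and score.get("value") == metadata.get("outcome_label") == nested.get("outcome_label")
    )
```

A pair that returns False usually points at a mismatched response_id, so it is worth confirming that both objects belong to the same trace.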
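The current implementation requires:

- /bot/v1/chat returns the user-visible reply as usual
- /bot/v1/feedback returns only the feedback ack
- response_outcome_evaluated must not appear in OpenAPI external response bodies

Therefore, confirm that:

- the /chat response contains no newly added response_outcome_evaluated event object
- the /feedback response contains only feedback acknowledgement fields such as accepted, response_id, session_id, feedback_type, feedback_delay_sec, and timestamp

If all of the following conditions hold, the current Phase 3 minimal implementation can be considered validated: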
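- /bot/v1/chat still reliably returns response_id
- after thumb_up, session JSONL metadata.response_outcomes[response_id].outcome_label == "positive_feedback"
- after thumb_down, session JSONL metadata.response_outcomes[response_id].outcome_label == "negative_feedback"
- reasked
- a response_outcome_evaluated event is present, and the response_outcome_label score is visible on the observation
- response_outcome_evaluated

/bot/v1/health is unreachable
Check first:

- openviking-server --with-bot
- consistency with the configuration in ov_conf/ov.conf
- whether 18790 is occupied by a stale vikingbot process

/feedback succeeds, but the session has no response_outcomes
Check first:

- whether response_id really comes from the same session_id
- feedback_events

reasked
Check first:

- session_id
- response_id
- whether thumb_up or thumb_down has overridden it as a stronger signal

Check first:

- bot-langfuse
- whether the Langfuse configuration in ov_conf/ov.conf has taken effect
- response_id
- metadata.response_outcomes

As of now, Phase 3 should be described as follows:

- the minimal rule-based implementation of response_outcome_evaluated has landed
- metadata.response_outcomes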