docs/memory/next_actions.md
Updated: 2025-10-17
Priority: Testing & Validation → Metrics Collection
Purpose: Set up the environment for running the test suite
Dependencies: None
Owner: PM Agent + DevOps
Steps:
# Option 1: Set up in the Docker environment (recommended)
docker compose exec workspace sh
pip install pytest pytest-cov scipy
# Option 2: Set up in a virtual environment
python -m venv .venv
source .venv/bin/activate
pip install pytest pytest-cov scipy
Success Criteria:
Estimated Time: 30 minutes
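
Since no success criteria are listed for this step, one way to confirm that either setup option worked is to check that the three installed packages can be found by the interpreter. The following is a minimal sketch, not part of the existing scripts; `missing_modules` is a hypothetical helper name, and `pytest_cov` is the import name of the `pytest-cov` package.

```python
import importlib.util

# Import names for the packages installed above; pytest-cov is
# imported as "pytest_cov".
REQUIRED_MODULES = ["pytest", "pytest_cov", "scipy"]


def missing_modules(names):
    """Return the module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the container or the activated `.venv` so that it inspects the same interpreter that pytest will use.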
Purpose: Verify that the quality assurance layer works in practice
Dependencies: pytest environment setup complete
Owner: Quality Engineer + PM Agent
Commands:
# Run all tests
pytest tests/pm_agent/ -v
# Run by marker
pytest tests/pm_agent/ -m unit # Unit tests
pytest tests/pm_agent/ -m integration # Integration tests
pytest tests/pm_agent/ -m hallucination # Hallucination detection
pytest tests/pm_agent/ -m performance # Performance tests
# Coverage report
pytest tests/pm_agent/ --cov=. --cov-report=html
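
The `-m` selections above rely on the four markers being registered with pytest; otherwise pytest emits unknown-mark warnings. The project's `pytest.ini` may already do this. If it does not, a `conftest.py` hook along the following lines is one option. This is a sketch: the marker names come from the commands above, while the descriptions are assumptions.

```python
# conftest.py (sketch)

# Marker names are taken from the commands above; the descriptions
# are illustrative assumptions.
MARKERS = {
    "unit": "unit tests",
    "integration": "integration tests",
    "hallucination": "hallucination detection tests",
    "performance": "performance tests",
}


def pytest_configure(config):
    """Register custom markers so that pytest recognizes them."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

A test then opts in with a decorator such as `@pytest.mark.unit`, and `pytest -m unit` selects it.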
Expected Results:
Hallucination Detection: ≥94%
Token Budget Compliance: 100%
Confidence Accuracy: >85%
Error Recurrence: <10%
All Tests: PASS
Estimated Time: 1 hour
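
The expected results above can be turned into an automated check. The sketch below assumes the measured values are available as fractions in a dictionary; the key names and the `failed_metrics` helper are hypothetical and do not reflect how the existing test suite reports results.

```python
import operator

# Thresholds copied from the expected results above, as fractions.
# Each entry maps a (hypothetical) metric key to (comparison, threshold).
THRESHOLDS = {
    "hallucination_detection": (">=", 0.94),
    "token_budget_compliance": (">=", 1.00),
    "confidence_accuracy": (">", 0.85),
    "error_recurrence": ("<", 0.10),
}

_OPS = {">=": operator.ge, ">": operator.gt, "<": operator.lt}


def failed_metrics(measured):
    """Return names of metrics that are missing or miss their threshold."""
    failures = []
    for name, (op, threshold) in THRESHOLDS.items():
        value = measured.get(name)
        if value is None or not _OPS[op](value, threshold):
            failures.append(name)
    return failures
```

An empty return value means every threshold is met; a missing metric is treated as a failure rather than silently skipped.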
Purpose: Accumulate data from real workflows
Steps:
Initial data collection:
First weekly analysis:
python scripts/analyze_workflow_metrics.py --period week
Review results:
Success Criteria:
Estimated Time: 1 week (automatic recording)
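
To illustrate what a weekly analysis over the JSONL metrics file might compute, here is a minimal sketch. It is not the logic of `scripts/analyze_workflow_metrics.py`. The record fields `timestamp` (ISO 8601) and `tokens_used` are assumptions; the actual schema is defined in `docs/memory/WORKFLOW_METRICS_SCHEMA.md`.

```python
import json
from datetime import datetime, timedelta
from statistics import mean


def load_records(lines):
    """Parse JSON Lines text, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def weekly_summary(records, now):
    """Summarize records whose timestamp falls in the 7 days before `now`.

    Assumes each record has "timestamp" and "tokens_used" fields
    (hypothetical field names).
    """
    cutoff = now - timedelta(days=7)
    recent = [
        r for r in records
        if cutoff <= datetime.fromisoformat(r["timestamp"]) <= now
    ]
    tokens = [r["tokens_used"] for r in recent]
    return {
        "tasks": len(recent),
        "avg_tokens_used": mean(tokens) if tokens else None,
    }
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the function, keeps the summary reproducible when re-running the analysis for a past week.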
Purpose: Validate an experimental workflow
Steps:
Design the experimental variant:
experimental_eager_layer3 (always use Layer 3 for Medium tasks)
Implement the 80/20 allocation:
Allocation:
progressive_v3_layer2: 80% # Current best
experimental_eager_layer3: 20% # New variant
Statistical analysis after 20 trials:
python scripts/ab_test_workflows.py \
--variant-a progressive_v3_layer2 \
--variant-b experimental_eager_layer3 \
--metric tokens_used
Decision:
Success Criteria:
Estimated Time: 2 weeks
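
The 80/20 allocation in the steps above can be sketched as a weighted random choice per task. This is an illustrative sketch only; `choose_variant` is a hypothetical helper, and the real assignment mechanism may differ (for example, it could be deterministic per task ID).

```python
import random

# Allocation weights from the plan above.
ALLOCATION = {
    "progressive_v3_layer2": 0.8,      # current best
    "experimental_eager_layer3": 0.2,  # new variant
}


def choose_variant(rng=random):
    """Pick a workflow variant at random, weighted by ALLOCATION."""
    names = list(ALLOCATION)
    weights = [ALLOCATION[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the assignment sequence reproducible. Note that with a 20% share, reaching 20 trials of the experimental variant takes about 100 tasks in total on average.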
Multi-agent Confidence Aggregation:
Predictive Error Detection:
Adaptive Budget Allocation:
Cross-session Learning Patterns:
mindbase Vector Search Optimization:
Reflexion Pattern Refinement:
Evidence Requirement Automation:
Continuous Learning Loop:
Goal: Establish the quality assurance layer
Metrics:
- All tests pass: 100%
- Hallucination detection: ≥94%
- Token efficiency: 60% avg
- Error recurrence: <10%
Goal: Begin data accumulation
Metrics:
- Tasks recorded: ≥20
- Data quality: Clean (no null errors)
- Weekly report: Generated
- Insights: ≥3 actionable findings
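
The first two metrics above lend themselves to an automated check over the JSONL file. The sketch below interprets "no null errors" as "no record contains a null value", which is an assumption, and `quality_report` is a hypothetical helper name.

```python
import json


def quality_report(lines, min_tasks=20):
    """Count parsed records and list any fields that hold null values."""
    records = [json.loads(line) for line in lines if line.strip()]
    null_fields = sorted(
        {key for r in records for key, value in r.items() if value is None}
    )
    return {
        "tasks": len(records),
        "enough_tasks": len(records) >= min_tasks,
        "null_fields": null_fields,
        "clean": not null_fields,
    }
```

It could be applied to the metrics file with `quality_report(open("docs/memory/workflow_metrics.jsonl", encoding="utf-8"))`, since iterating over a file object yields its lines.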
Goal: Improve workflows scientifically
Metrics:
- Trials per variant: ≥20
- Statistical significance: p < 0.05
- Winner identified: Yes
- Implementation: Promoted or deprecated
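
To make the p < 0.05 criterion concrete, the following sketch computes a p-value for the difference in mean `tokens_used` between two variants using a two-sided permutation test. This technique is chosen here because it needs only the standard library; it is not necessarily the test that `scripts/ab_test_workflows.py` applies, and `permutation_p_value` is a hypothetical name.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_resamples=10000, seed=0):
    """Two-sided permutation test for a difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:len(a)]) - _mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)
```

With at least 20 trials per variant, a result below 0.05 would support declaring a winner; a larger value suggests collecting more trials before promoting or deprecating the experimental variant.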
Testing:
tests/pm_agent/ (2,760 lines)
pytest.ini (configuration)
conftest.py (fixtures)
Metrics:
docs/memory/workflow_metrics.jsonl (initialized)
docs/memory/WORKFLOW_METRICS_SCHEMA.md (spec)
Analysis:
scripts/analyze_workflow_metrics.py (weekly analysis)
scripts/ab_test_workflows.py (A/B testing)
Week 1 (Oct 17-23):
- Day 1-2: pytest environment setup
- Day 3-4: Test execution & validation
- Day 5-7: Fix issues (if any)
Week 2-3 (Oct 24 - Nov 6):
- Continuous: Automatic metrics recording
- Week end: First weekly analysis
Week 4-5 (Nov 7 - Nov 20):
- Start: Launch the experimental variant
- Continuous: 80/20 A/B testing
- End: Statistical analysis & decision
Month 2-3 (Dec - Jan):
- Advanced features implementation
- Integration enhancements
Technical Blockers:
Risks:
Mitigation:
External Dependencies:
Internal Dependencies:
None blocking: everything is ready ✅
Next Session Priority: pytest environment setup → test execution
Status: Ready to proceed ✅