docs/plans/ink-cli-ui-pr-split-audit.md
PR #6611 is a 199-file, +45,925 line PR introducing an Ink-based interactive CLI UI. Although the code is well-architected, landing it as a single PR creates significant review and rollback risk. This document provides a critical analysis and an actionable PR split strategy.
| Category | Files | Lines Added |
|---|---|---|
| src/ui/ (new) | 123 | ~35,000 |
| test/ui/ (new) | 39 | ~8,000 |
| docs/design/ (new) | 11 | ~6,000 |
| Commands (modified) | 7 | ~1,100 |
| Other (modified) | 19 | ~1,800 |
| Total | 199 | ~45,900 |
The current structure has a problem: you cannot land ANY feature UI without first landing ALL infrastructure.
Dependency Chain:
Feature UIs (auth, cache, list, menu, share)
└── render.ts
└── interactiveCheck.ts
└── constants.ts
└── components/shared/*
└── hooks/*
└── utils/*
Impact: The "Phase 1: Infrastructure" PR in the existing landing strategy is a prerequisite for everything else, but it's ~3,000 lines of code that does nothing user-visible by itself.
Each command modification imports from src/ui/*:
| Command | Lines Changed | UI Imports |
|---|---|---|
| eval.ts | +396/-27 | evalRunner, shouldUseInkUI |
| list.ts | +220/-122 | listRunner, shouldUseInkUI |
| auth.ts | +201/-108 | authRunner, shouldUseInkUI |
| cache.ts | +107/-5 | cacheRunner, shouldUseInkUI |
| init.ts | +29 | initRunner |
| share.ts | +44 | shareRunner |
| menu.ts | +116 (new) | menuRunner |
Problem: You cannot split "just the auth UI" without also landing the infrastructure it depends on.
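The coupling can be pictured with a minimal sketch. It assumes `shouldUseInkUI` takes no arguments and that each runner is an async function; this audit does not confirm either signature, and `UiDeps`, `selectRunner`, and `textRunner` are hypothetical names used only for illustration.

```typescript
// Hypothetical shapes: the real signatures in PR #6611 may differ.
type Runner = () => Promise<void>;

interface UiDeps {
  // Assumed to live in src/ui/interactiveCheck.ts (infrastructure).
  shouldUseInkUI: () => boolean;
  // Stands in for a feature runner such as authRunner or listRunner.
  inkRunner: Runner;
  // Stands in for the command's existing non-interactive code path.
  textRunner: Runner;
}

// Picks which code path a command should execute.
// Even this tiny decision needs the infrastructure check to exist first.
function selectRunner(deps: UiDeps): Runner {
  return deps.shouldUseInkUI() ? deps.inkRunner : deps.textRunner;
}
```

Under these assumptions, a command such as `auth.ts` would call `selectRunner(...)` and await the result, so the command change cannot land before the module that provides `shouldUseInkUI`.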
Tests are organized 1:1 with source files:
Implication: Each PR should include corresponding tests, which inflates PR sizes.
src/ui/init/ - 39 files (~15,000 lines)
test/ui/init/ - 7 files (~2,500 lines)
This is the largest single feature and is self-contained, making it a good candidate for deferral.
The eval command changes (+396 lines) and EvalApp/EvalScreen components are the most important user-facing feature. However, they depend on:
Total: ~60 files just for the eval UI.
Instead of "infrastructure first", land complete vertical slices with feature flags.
Size: ~2,500 lines | Review Time: 1-2 hours
Files:
src/ui/constants.ts
src/ui/interactiveCheck.ts
src/ui/render.ts
src/ui/noninteractive/progress.ts
src/ui/noninteractive/textOutput.ts
src/ui/noninteractive/index.ts
src/envars.ts (add PROMPTFOO_ENABLE_INTERACTIVE_UI)
test/ui/noninteractive.test.ts
test/ui/render.test.ts
Why First:
Environment Variable:
PROMPTFOO_ENABLE_INTERACTIVE_UI=true # Enable Ink UI (opt-in)
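One way the opt-in check could work is sketched below. This is an illustration, not the contents of `interactiveCheck.ts`: the helper names `parseFlag` and `isInkUiEnabled`, the set of accepted truthy strings, and the TTY requirement are all assumptions.

```typescript
// Hypothetical helper: interprets an environment variable as a boolean.
// The accepted values are an assumption; the plan only shows `true`.
function parseFlag(raw: string | undefined): boolean {
  if (raw === undefined) {
    return false;
  }
  const value = raw.trim().toLowerCase();
  return value === '1' || value === 'true' || value === 'yes' || value === 'on';
}

// Hypothetical check: the Ink UI is used only when the user opted in
// AND output goes to a terminal. Otherwise the existing output is kept.
function isInkUiEnabled(
  env: Record<string, string | undefined>,
  stdoutIsTTY: boolean,
): boolean {
  const optedIn = parseFlag(env['PROMPTFOO_ENABLE_INTERACTIVE_UI']);
  return optedIn && stdoutIsTTY;
}
```

In a Node.js process this could be called as `isInkUiEnabled(process.env, Boolean(process.stdout.isTTY))`. Keeping the default off means PR 1 changes nothing for users who do not set the variable.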
Size: ~5,000 lines | Review Time: 2-3 hours
Files:
src/ui/hooks/*
src/ui/utils/*
src/ui/components/shared/*
test/ui/hooks/*
test/ui/utils/*
test/ui/components/shared/*
Why Second:
Size: ~12,000 lines | Review Time: 4-6 hours
Files:
src/ui/contexts/*
src/ui/machines/evalMachine.ts
src/ui/evalBridge.ts
src/ui/EvalApp.tsx
src/ui/evalRunner.tsx
src/ui/components/eval/*
src/ui/components/table/*
src/commands/eval.ts (modified)
src/evaluator.ts (modified)
test/ui/eval/*
test/ui/machines/*
test/ui/components/table/*
test/ui/integration/*
Why Third:
Feature Flag:
promptfoo eval --interactive # Opt-in
PROMPTFOO_EXPERIMENTAL_INK=1 promptfoo eval # Env-based opt-in
Size: ~6,000 lines | Review Time: 2-3 hours
Files:
src/ui/auth/*
src/ui/cache/*
src/ui/list/*
src/ui/menu/*
src/ui/share/*
src/commands/auth.ts (modified)
src/commands/cache.ts (modified)
src/commands/list.ts (modified)
src/commands/menu.ts (new)
src/commands/share.ts (modified)
test/ui/auth/*
test/ui/cache/*
test/ui/list/*
test/ui/menu/*
test/ui/share/*
Why Fourth:
Size: ~18,000 lines | Review Time: 6-8 hours
Files:
src/ui/init/*
src/commands/init.ts (modified)
src/redteam/commands/init.ts (modified)
test/ui/init/*
Why Last:
Feature Flag:
PROMPTFOO_EXPERIMENTAL_INIT=1 promptfoo init
Size: ~1,500 lines | Review Time: 1 hour
Files:
src/ui/redteamGenerate/*
src/redteam/commands/generate.ts (modified)
test/ui/redteamGenerate/*
Why Separate:
If the team prefers fewer PRs:
Combines PRs 1-3 above.
Pros:
Cons:
Combines PRs 4-6 above.
Pros:
Cons:
                ┌─────────────────────┐
                │  PR 1: Foundation   │
                │ (interactiveCheck,  │
                │  render, constants) │
                └──────────┬──────────┘
                           │
                ┌──────────▼──────────┐
                │  PR 2: Utils/Hooks  │
                │ (shared components, │
                │  formatting, etc.)  │
                └──────────┬──────────┘
                           │
       ┌───────────────────┼───────────────────┐
       │                   │                   │
┌──────▼──────┐    ┌───────▼───────┐    ┌──────▼──────┐
│ PR 3: Eval  │    │ PR 4: Aux UIs │    │ PR 6: RT Gen│
│ (core eval  │    │ (auth, cache, │    │ (small,     │
│  experience)│    │  list, menu)  │    │  isolated)  │
└──────┬──────┘    └───────────────┘    └─────────────┘
       │
┌──────▼──────┐
│ PR 5: Init  │
│ (wizard UIs,│
│  state mach)│
└─────────────┘
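The diagram's ordering constraints can be checked mechanically. The sketch below encodes the edges exactly as drawn and derives a valid landing order with a simple layered topological sort; `dependsOn` and `landingOrder` are illustrative names, not part of the PR.

```typescript
// Edges copied from the diagram: each key depends on every PR in its array.
const dependsOn: Record<string, string[]> = {
  'PR 1': [],
  'PR 2': ['PR 1'],
  'PR 3': ['PR 2'],
  'PR 4': ['PR 2'],
  'PR 5': ['PR 3'],
  'PR 6': ['PR 2'],
};

// Layered topological sort: in each round, land every PR whose
// prerequisites have all landed already.
function landingOrder(graph: Record<string, string[]>): string[] {
  const landed: string[] = [];
  let pending = Object.keys(graph);
  while (pending.length > 0) {
    const ready = pending.filter((pr) =>
      graph[pr].every((dep) => landed.indexOf(dep) !== -1),
    );
    if (ready.length === 0) {
      // Nothing can land: the graph has a cycle or names an unknown PR.
      throw new Error('Cycle or unknown dependency among: ' + pending.join(', '));
    }
    // PRs that become ready in the same round can be reviewed in parallel.
    for (const pr of ready) {
      landed.push(pr);
    }
    pending = pending.filter((pr) => ready.indexOf(pr) === -1);
  }
  return landed;
}

// → ['PR 1', 'PR 2', 'PR 3', 'PR 4', 'PR 6', 'PR 5']
const order = landingOrder(dependsOn);
```

PRs 3, 4, and 6 become ready in the same round, which matches the three parallel branches under PR 2 in the diagram.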
| PR | Size | Risk | Review Complexity | Rollback Difficulty |
|---|---|---|---|---|
| 1 | 2.5k | Low | Easy | Trivial (no user impact) |
| 2 | 5k | Low | Medium | Trivial (no user impact) |
| 3 | 12k | Medium | High | Moderate (feature-flagged) |
| 4 | 6k | Low | Medium | Easy (feature-flagged) |
| 5 | 18k | Medium | High | Moderate (feature-flagged) |
| 6 | 1.5k | Low | Easy | Trivial |
For each PR:
PROMPTFOO_EXPERIMENTAL_*)
Go with the 6-PR strategy. Here's why:
Timeline: