Back to Fresh

E2E Test Migration — Composable Scenarios

docs/internal/e2e-test-migration-design.md

0.3.530.7 KB
Original Source

E2E Test Migration — Composable Scenarios

Scope: every file under crates/fresh-editor/tests/e2e/* (227 files). Owner: TBD.

1. Goal

Replace the imperative e2e harness with a single composable scenario framework whose tests are values rather than scripts. Every e2e file migrates to one of twelve scenario types. The framework covers rendering, LSP, filesystem, mouse, and animations — not just pure state — by layering observables, not by carving them out.

A scenario is:

rust
Scenario { description, context, actions, expectation }

The same value drives three consumers without any extra wiring:

   ┌─────────────────────────────────────────────────────────┐
   │   Scenario  (ScenarioContext + Vec<InputEvent> + Obs)   │
   └─────────────────────────────────────────────────────────┘
        │                    │                     │
        ▼                    ▼                     ▼
   regression test     proptest generator     shadow-model check
   (one example)       (sample / shrink)      (editor ≡ reference)

That triple-leverage is the whole reason to do this work.

2. Why this beats the existing e2e regime

A migrated test produces three artifacts to today's one:

ArtifactToday (imperative e2e)After migration
Regression checkyesyes
Proptest seed (corpus-guided generation)nofree
Shadow-model differential checknofree
Shrinkable counterexample on failurenofree (proptest shrinks Vec<InputEvent>)
Serializable for regression file / CI artifactnofree
Replayable across editor versions / branchesnofree
Mutation-test targetnofree
Cross-feature property checknofree (the corpus is the domain)
CI dashboard signalpanic-string parsetyped JSON

The write cost stays roughly constant; the read count multiplies. This is what makes migrating the previously-deferred categories (Class B viewport, modal UI, LSP, filesystem, rendering) worth doing — each one joins three drivers simultaneously.

Structural wins beyond leverage:

  • No keymap coupling. Tests reference Action::ToUpperCase, not KeyCode::Char('u') + ALT. Shortcut moves stop breaking tests.
  • No render coupling for state tests. Zero harness.render() calls in scenarios that don't assert on layout.
  • No screen-scraping for logic. Assertions go through typed observables; theme/render churn stops breaking logic tests.
  • Refactor freedom. Production internals can rename freely; only EditorTestApi and the shadow step change.
  • Less flakiness. No terminal IO, no async race, no render timing in scenarios that don't model them. The flake surface shrinks with the imperative surface.

3. What's already in place

The framework foundation has shipped. See docs/internal/scenario-migration-status.md for the forward-looking plan and scenario-migration-findings.md for the running log of behavioral asymmetries surfaced during migration.

  • crates/fresh-editor/src/test_api.rsEditorTestApi trait with the full per-phase observable surface (dispatch/dispatch_seq, buffer + cursor + selection, viewport top byte / hardware cursor / gutter / visible byte range, modal snapshot, workspace observables, mouse dispatch, is_modified).
  • crates/fresh-editor/tests/common/scenario/ — runners for BufferScenario, LayoutScenario, ModalScenario, WorkspaceScenario, PersistenceScenario, TemporalScenario, InputScenario, TerminalIoScenario. LspScenario and StyleScenario exist as honest skeletons whose runners panic with the precise prerequisite (see §10.1).
  • A growing body of tests/semantic/** migrations spanning buffer / cursor / selection, multi-cursor, undo (incl. bulk-edit + save-point boundary), auto-indent (incl. bracket expansion), case conversion, indent / dedent, sort lines, duplicate line, toggle comment, unicode cursor + grapheme handling, paste (round-trip Copy/Paste), search-modal flows, workspace buffer-count, dabbrev word completion, and the issue-numbered regressions for #210, #629, #1147, #1258, #1305, #1566, #1574, #1577, #1697, plus the bulk-edit-after-save undo corruption.
  • Action is the production input alphabet (Serialize).
  • ScenarioFailure is Serialize + Deserialize.
  • The tests/semantic/** lint forbids reaching into fresh::app::*, fresh::model::*, or fresh::view::*; only fresh::test_api and crate::common::harness::EditorTestHarness are reachable.

Production bugs and behavioral asymmetries have been found and fixed (or pinned) along the way — see docs/internal/scenario-migration-findings.md for the catalogue.

4. Data-model lockdown (Phase 1)

Prerequisite for everything else. One PR, mostly mechanical.

  1. Derive Serialize/Deserialize on BufferTheorem, TraceTheorem, LayoutTheorem.
  2. Replace &'static str with String (or Cow<'static, str>) on the same.
  3. Lift BehaviorFlags, filename (= language), TerminalSize into the struct as fields. Delete the runner overloads (assert_buffer_theorem_with_*).
  4. Rename *Theorem*Scenario (the Lean ambition is dropped; "scenario" is more accurate for what these are).
  5. Promote EvaluatedState (property.rs:23) to the canonical BufferState observable type.
  6. Add a ShadowModel trait skeleton + a BufferShadow impl that delegates to the live editor (no-op differential — proves the trait + corpus loop work end-to-end before any real shadow ships).
  7. CI job: dump the corpus to JSON; fail on schema-breaking diffs with no version bump.

Acceptance: every existing semantic test continues to pass; corpus JSON exists; BufferShadow runs the corpus and reports zero disagreements.

5. Composable scenario architecture

rust
pub struct Scenario<Obs: Observable> {
    pub description: String,
    pub context:     ScenarioContext,
    pub actions:     Vec<InputEvent>,
    pub expectation: Obs,
}

pub struct ScenarioContext {
    pub buffer:    BufferContext,         // initial_text, behavior, language, terminal
    pub workspace: Option<WorkspaceContext>,
    pub fs:        Option<VirtualFs>,
    pub lsp:       Option<LspScript>,
    pub plugins:   Option<PluginScript>,
    pub theme:     Option<Theme>,
    pub clock:     Option<MockClock>,
}

/// Anything the runner can extract from a live editor and assert on.
pub trait Observable: Serialize + DeserializeOwned + PartialEq {
    fn extract(api: &mut dyn EditorTestApi) -> Self;
}

pub fn check_scenario<Obs: Observable>(s: Scenario<Obs>)
    -> Result<(), ScenarioFailure>;

InputEvent is the top-level alphabet. It's a superset of Action:

rust
pub enum InputEvent {
    Action(Action),                  // existing 600-variant editor alphabet
    Mouse(MouseEvent),               // Click(x,y), Drag(start,end), Wheel(dx,dy)
    Compose(ComposeSeq),             // dead keys / IME
    OpenPrompt(PromptKind),          // for ModalScenario
    FilterPrompt(String),
    ConfirmPrompt,
    CancelPrompt,
    AdvanceClock(Duration),          // for TemporalScenario
    LspMessage(LspIncoming),         // server → client injection
    FsExternalEdit(PathBuf, String), // for auto_revert tests
    Wait(WaitCondition),             // semantic wait, never wall-clock sleep
}

The seven non-Action variants are the price of full coverage. Each is a typed event the runner dispatches deterministically. No variant is a KeyCode — even mouse events project through the current RenderSnapshot, not through crossterm.

Each scenario type from §6 is a type alias / specialization:

rust
pub type BufferScenario       = Scenario<BufferState>;
pub type LayoutScenario       = Scenario<RenderSnapshot>;
pub type ModalScenario        = Scenario<(BufferState, ModalState)>;
pub type WorkspaceScenario    = Scenario<(BufferState, WorkspaceState)>;
pub type PersistenceScenario  = Scenario<(BufferState, FsState)>;
pub type LspScenario          = Scenario<(BufferState, LspTraffic)>;
pub type StyleScenario        = Scenario<StyledFrame>;
pub type InputScenario        = Scenario<RenderSnapshot>;
pub type TemporalScenario     = Scenario<Vec<RenderSnapshot>>;
pub type TerminalIoScenario   = Scenario<RoundTripGrid>;

The runner is a single entry point parameterized by Obs; the specializations exist for ergonomic constructors and for proptest-strategy specialization, not because the runner branches.

Observable is the same interface shadow models implement (§8).

6. Scenario taxonomy — covering every e2e

6.1 The ten scenario types

TypePrimary observableFiles (~)
BufferScenariotext + cursors + selection50
LayoutScenarioRenderSnapshot (viewport, gutter, hw cursor)32
ModalScenarioprompt/palette/picker/menu state43
WorkspaceScenariosplits, tabs, dock layout, buffer list19
PersistenceScenarioVirtualFs + session/recovery state23
LspScenarioscripted LSP exchange + buffer29
StyleScenarioStyledFrame (cell role × theme)12
InputScenariomouse/composition events as data7
TemporalScenariotimed sequence of frames (MockClock)3
TerminalIoScenarioANSI bytes via vt100 round-trip7

Total ≈ 225; some files are dual-category. Unique e2e file count is 227.

6.2 Per-category file mapping

Representative, not exhaustive. Full file lists belong in the per-phase tickets.

BufferScenario (~50) — pure text/cursor/selection. Done: case_conversion, sort_lines, indent_dedent, smart_home, duplicate_line, toggle_comment, unicode_cursor, undo_redo, selection, auto_pairs, save_state, emacs_actions. Pending: basic, movement, paste, shift_backspace, triple_click, block_selection, multibyte_characters, smart_editing, tab_indent_selection, select_to_paragraph, document_model, goto_matching_bracket, multicursor, undo_redo_marker_roundtrip, undo_bulk_edit_after_save, issue_1288_word_select_whitespace, issue_1566_arrow_selection, issue_1697_ctrl_d_after_search, search_selection_on_punctuation, overlay_extend_to_line_end, search_navigation_after_move.

LayoutScenario (~32) — viewport scroll, soft-wrap, gutter, hardware cursor row/col. Unblocked by RenderSnapshot (§7.1). Files: issue_1147_wrapped_line_nav, scroll_clearing, scroll_wrapped_reach_last_line, scrolling, line_wrap_* (5 files), line_number_bugs, search_center_on_scroll, search_*_stall_after_wrap, hanging_wrap_indent, horizontal_scrollbar, issue_1502_word_wrap_squished, issue_1574_*_scroll, virtual_line*, popup_wrap_indent, margin, vertical_rulers, memory_scroll_leak, side_by_side_diff_*, markdown_compose*, redraw_screen, tab_scrolling, folding, issue_1571_fold_indicator_lag, issue_1568_session_fold_restore, issue_779_after_eof_shade, issue_1790_compose_wrap_highlight, test_scrollbar_keybinds_cursor.

ModalScenario (~43) — adds ModalState to the observable and OpenPrompt/FilterPrompt/ConfirmPrompt/CancelPrompt/ MenuSelect to InputEvent. Files: command_palette, file_browser, file_explorer, action_popup_global, prompt, prompt_editing, popup_selection, menu_bar, menu_*_bleed, explorer_*, live_grep, search, search_replace, lsp_code_action_modal, lsp_completion_*, dabbrev_completion, status_bar_message_click, update_notification, sudo_save_prompt, save_nonexistent_directory, settings, settings_* (multiple), keybinding_editor, unicode_prompt_bugs, issue_1718_settings_search_utf8_panic, preview_lsp_popup_focus, cursor_under_popup, toggle_bars.

WorkspaceScenario (~19) — adds WorkspaceState to context and observable. Splits and tabs are addressable as SplitId/TabId. Files: buffer_groups, buffer_lifecycle, buffer_settings_commands, multi_file_opening, preview_tabs, split_focus_tab_click, split_tabs, split_view, split_view_expectations, split_view_markdown_compose, tab_config, tab_drag, copy_buffer_path, issue_1540_tab_click_focus, position_history* (4 files).

PersistenceScenario (~23) — adds VirtualFs to context (an in-memory FS the editor reads/writes through a fake adapter) and FsState as observable. Files: auto_revert, encoding, external_file_save_as_tab, file_permissions, hot_exit_*, large_file_*, on_save_actions, recovery, save_as_language_detection, server_session_lifecycle, session_hot_exit, slow_filesystem, stdin_input, symlinks, unnamed_buffer_persistence, workspace, open_folder, lifecycle, bash_profile_editing, binary_file, save_nonexistent_directory (dual with Modal), undo_bulk_edit_after_save (dual with Buffer).

LspScenario (~29) — adds LspScript: an ordered list of expected client-to-server messages and pre-written server-to-client responses. The fake server matches messages by shape, replies on cue, and records traffic for assertion. Files: lsp and 26 lsp_* files; language_features_e2e; universal_lsp; inline_diagnostics; issue_1572_inlay_hint_drift; issue_1573_format_buffer. hot_exit_recovery_lsp_sync is dual (Persistence + LSP).

StyleScenario (~12) — pulls a StyledFrame via the §7 RenderSnapshot → StyledFrame projection (theme + role table) and asserts on cell roles + colors via Inspect::{Cell, Row, Column, Region, FullFrame}. Subsumes today's theme_screenshots byte snapshots with a diffable JSON form. Files: theme, theme_screenshots, blog_showcases, cursor_style_rendering, crlf_rendering, syntax_highlighting_coverage, syntax_highlighting_embedded_offset, syntax_language_case, glob_language_detection, config_language_selector, csharp_language_coherence, warning_indicators, issue_1554_scrollbar_theme_color, issue_1577_unicode_width, issue_1598_shebang_detection, issue_779_after_eof_shade.

InputScenario (~7) — extends InputEvent with Mouse, Compose, KeyChord. Mouse coordinates project to (line, byte) via the current RenderSnapshot, not through crossterm. Files: mouse, capslock_shortcuts, altgr_shift, csi_u_session_input, issue_1620_split_terminal_click_panic, locale, tab_drag (dual with Workspace).

TemporalScenario (~3) — adds MockClock to context and AdvanceClock(Duration) to InputEvent. Expectation is a Vec<RenderSnapshot> taken after each clock tick. Files: animation, flash, status_bar_config.

TerminalIoScenario (~7) — projects StyledFrame through the real escape-sequence emitter, then through vt100 back to a typed grid; asserts on the round-trip grid. Catches escape emission bugs without committing to specific byte sequences. Files: ansi_cursor, terminal, terminal_close, terminal_resize, terminal_split_focus_live, rendering, redraw_screen (dual). The harness already does most of this through render_real / render_real_incremental; the migration formalizes it into a scenario type.

Dropped scenario types — PluginScenario, GuiScenario. The plugin scenarios (~5 files) and gui.rs (1 file) were originally in scope but have been removed: their production hooks (plugin runtime test entry, wgpu/winit test API) are heavy and the test volumes don't justify the cost. Plugin test claims that reduce to buffer state can fold into BufferScenario with an optional context field if/when needed; gui.rs stays imperative — its editor-level claims migrate to BufferScenario / LayoutScenario, and the GUI-specific bits (font fallback, IME) are properly imperative.

6.3 Cross-cutting observables

Some files exercise more than one subsystem. Composition is direct — the scenario carries both context fields and the expectation type pairs both observables:

FileCategoriesHow it composes
lsp_code_action_modal.rsLSP + Modalcontext carries LspScript; expectation includes ModalState
hot_exit_recovery_lsp_sync.rsPersistence + LSPcontext carries VirtualFs + LspScript
tab_drag.rsWorkspace + Inputcontext carries WorkspaceState; actions include Mouse::Drag
issue_1554_scrollbar_theme_color.rsLayout + Styleobservable is (RenderSnapshot, StyledFrame)

The runner does not branch on type; the scenario's Observable implementation knows how to extract everything it cares about.

7. Rendering inside the framework

Rendering is not a separate test regime. The pipeline factors into four pure-ish layers; each layer has a scenario type; tests target the highest layer they care about and stop there.

7.1 The four rendering layers

        EditorState
             │  layout(width, height)
             ▼
       RenderSnapshot       ← LayoutScenario  (theme-free)
             │  style(Theme, RoleTable)
             ▼
        StyledFrame         ← StyleScenario   (role-tagged cells)
             │  emit(Capabilities, EmitState)
             ▼
        AnsiStream          ← (rarely tested directly)
             │  vt100 round-trip
             ▼
       RoundTripGrid        ← TerminalIoScenario

Each arrow is a function. None of these layers exists as a named public type today; building them is the bulk of the rendering-side work.

TypeWhereApprox LOC
RenderSnapshotcrates/fresh-editor/src/test_api.rs300
StyledFramesame80
RoundTripGridsame60
Layer functionssrc/view/render_layers.rs (refactored from existing)~500 net

The refactor does not rewrite the renderer. It splits the existing render() body into three named functions:

rust
fn layout(state: &EditorState, dim: TerminalDim) -> RenderSnapshot;
fn style(snapshot: &RenderSnapshot, theme: &Theme, roles: &RoleTable) -> StyledFrame;
fn emit(frame: &StyledFrame, caps: &Capabilities) -> AnsiStream;

Today's render() is the composition. Production stays unchanged (it composes them in one call); tests call them individually.

7.2 What each rendering scenario type catches

TypeCatchesDoesn't catch
LayoutScenarioviewport reconciliation, wrap math, gutter widths, hw cursor row/col, popup placement, scrollbar geometrycolors, glyph choice, escape correctness
StyleScenariotheme contrast, role-to-color mapping, modifier flags, syntax-highlight color regressionsterminal-emulator quirks
TerminalIoScenarioescape emission bugs, optimization regressions (e.g., redundant SGR resets), incremental redraw correctnessterminal-side bugs (xterm vs kitty)
TemporalScenarioanimation frame correctness, fade/flash duration, blink phase, scroll smoothingwall-clock drift

Together these cover everything except actual terminal-emulator and GPU-driver behavior, which are correctly outside the editor's responsibility. (gui.rs, the wgpu/winit front-end, stays imperative — see §6.2.)

7.3 Visual regression as a StyleScenario

A StyleScenario with Inspect::FullFrame and expected: StyledFrame loaded from a JSON snapshot file. Diffs are structural (cell (x,y) changed role from Selection to Normal, fg #abc to #def). Snapshot regeneration is a CLI flag on the test runner. Today's theme_screenshots.rs byte-for-byte pipeline is deleted.

7.4 Animations as TemporalScenario

rust
TemporalScenario {
    description: "Flash banner fades over 250ms".into(),
    context: ScenarioContext {
        clock: Some(MockClock::epoch()),
        ..Default::default()
    },
    actions: vec![
        InputEvent::Action(Action::ShowFlash("saved".into())),
        InputEvent::AdvanceClock(Duration::from_millis(50)),
        InputEvent::AdvanceClock(Duration::from_millis(50)),
        InputEvent::AdvanceClock(Duration::from_millis(150)),
    ],
    expectation: vec![
        snapshot_t0_with_banner,
        snapshot_t50_partially_faded,
        snapshot_t100_more_faded,
        snapshot_t250_no_banner,
    ],
}

Requires one production hook: Editor reads time through a Clock trait, default-impl uses the system clock, test-impl uses MockClock. ~30 LOC of production change, gated behind the existing #[cfg(any(test, feature = "test-api"))].

7.5 Layered shadows

Each layer admits its own shadow:

LayerShadowCatches
stepreference editorlogic bugs
layoutnaive wrap algorithmwrap regressions, viewport drift
stylerole-table-driven projectiontheme regressions, role-to-color mismatches
emitminimal escape emitterredundant escapes, incorrect cursor positioning

Each shadow runs on every applicable scenario in the corpus. The naive wrap shadow alone would have caught issue_1502 and several line_wrap_* regressions before they shipped — uniform proptest never finds them because the failing inputs are specific (double-width chars at exactly column width-1); the shadow finds them on the first scenario that hits the case.

8. Shadow model framework

One trait, multiple impls, every applicable scenario auto-checked.

rust
pub trait ShadowModel {
    /// Subset of `EditorTestApi` this shadow can simulate. The
    /// runner skips scenarios whose context references subsystems
    /// the shadow doesn't claim to handle.
    fn supports(&self) -> ShadowCapabilities;

    fn dispatch(&mut self, event: &InputEvent);

    fn extract<O: Observable>(&self) -> O;
}

pub struct ShadowCapabilities {
    pub buffer:    bool,
    pub workspace: bool,
    pub fs:        bool,
    pub lsp:       bool,
    pub layout:    bool,   // can produce RenderSnapshot
    pub style:     bool,   // can produce StyledFrame
}

The differential test:

rust
#[test]
fn corpus_agrees_with_buffer_shadow() {
    let shadow = BufferShadow::new();
    for scenario in corpus::iter().filter(|s| BufferShadow::handles(s)) {
        check_scenario_against_shadow(&scenario, &shadow)
            .expect("shadow disagreement");
    }
}

Adding a new shadow:

  1. Implement ShadowModel for the alternate semantics or alternate algorithm.
  2. Declare which scenario types it supports via ShadowCapabilities.
  3. The corpus-wide differential test picks it up automatically.

Shadows live in tests/common/shadows/:

ShadowSupportsPurpose
BufferShadowbufferreference editor; catches actions.rs / state.rs class bugs
LayoutShadowbuffer, layoutnaive wrap algorithm; catches wrap-table regressions
StyleShadowlayout, stylerole-driven projection from RenderSnapshot to StyledFrame
RopeShadowbuffertext in Vec<u8> not the production rope; catches rope bugs
MultiCursorShadowbuffernaive cursor merge; cross-checks production merge
UndoShadowbuffersnapshot-stack undo; cross-checks action-trace undo

Today's tests/shadow_model_*.rs files become ShadowModel impls and are deleted from the bespoke test files (the corpus loop subsumes them).

9. Corpus-guided proptest

Once Phase 1 lands and the corpus exists, build a proptest strategy that samples scenario prefixes from the corpus and generates random tails. Run as a soak job in CI.

Counterexamples write tests/semantic/regressions/bug_NNNN.json files automatically. The next CI run loads them as permanent scenarios. No source change per regression.

This thread runs in parallel with the migration phases and starts paying off immediately — it does not block any phase.

10. Implementation roadmap

Phase 1 is on the critical path. Phases 2–12 are independent and parallelizable; ordering below is by ROI.

10.1 Status (live)

#PhaseStatusProduction hook
1Data-model lockdownlandednone beyond EditorTestApi extensions
2RenderSnapshot + LayoutScenariolanded (minimal RenderSnapshot: viewport, hw cursor, gutter); LayoutShadow differential liveEditorTestApi-only
3ModalScenariolanded (real ModalState from PopupManager)EditorTestApi::modal_snapshot
4StyleScenarioskeletonneeds style() extracted from render() ‹real production refactor›
5LspScenarioskeletonneeds fake LSP transport adapter at LspManager boundary ‹real production refactor›
6PersistenceScenariolanded (real-FS via harness temp dir; routes through the existing model::filesystem::FileSystem trait + StdFileSystem impl)none beyond EditorTestHarness::temp_dir_path; an in-memory FileSystem impl would replace temp dir for speed but adds zero coverage
7WorkspaceScenariolanded (real WorkspaceState from buffer map)EditorTestApi::buffer_count/active_buffer_path/buffer_paths
8TerminalIoScenariolanded (real RoundTripGrid via existing vt100 parser)EditorTestHarness::vt100_cursor_position
9InputScenariolanded minimal (mouse Click(Left) routes through Editor::handle_mouse)EditorTestApi::dispatch_mouse_click
10TemporalScenariolanded (AdvanceClock(d) calls harness.advance_time(d) which advances the existing services::time_source::TestTimeSource the editor already reads)none — the TimeSource trait + TestTimeSource already exist
11PluginScenariodropped (low test volume vs heavy production hook); fold into BufferScenario if neededn/a
12GuiScenariodropped (gui.rs stays imperative; editor-level claims covered by BufferScenario/LayoutScenario)n/a

10.2 What "landed" means here

For phases 2/3/6/7/8/9: the runner is real, the data model is real, JSON round-trips through a gating test, and an end-to-end proof in tests/semantic/phase_proofs.rs exercises the runner against the live editor. Migrations of existing e2e files proceed file-by-file from this baseline.

Each test-side accessor added to EditorTestApi is gated behind #[cfg(any(test, feature = "test-api"))] per the original §2.1 contract — additive, never removed, never reachable from production binaries.

10.3 What's blocked on production refactors

Two phases (4, 5) and one phase's optional form (11) need real changes to production code, not just additive test-side accessors:

  • Phase 4 (StyleScenario) needs render() factored into named layout / style / emit functions so the cell-role × theme projection is invocable from tests in isolation.
  • Phase 5 (LspScenario) needs a transport-level seam in LspManager so a fake adapter can intercept JSON-RPC. The existing tests/common/fake_lsp.rs is a Bash subprocess that the real LspManager connects to over stdin/stdout — usable but flaky and not script-aware.

Three phases that appeared blocked turned out not to be — production already had the right traits and only needed the scenario runner wired through:

  • Phase 6 (PersistenceScenario) uses the existing model::filesystem::FileSystem trait. Today's runner uses the real-FS impl (StdFileSystem) under a per-harness tempdir; the trait was already abstracted, so swapping in an in-memory MemoryFileSystem is a pure additive impl whenever the speed matters.
  • Phase 10 (TemporalScenario) uses the existing services::time_source::TimeSource trait + TestTimeSource. AdvanceClock(d) routes through harness.advance_time(d), which is the same path animations / debounces / auto-save already consult.
  • Phase 8 (TerminalIoScenario) uses the existing vt100 parser the harness already wires through render_real().

Until the still-blocked phases land, the corresponding scenario types ship as honest skeletons: data shape + JSON round-trip work, the runner panics with the precise blocker message naming the production hook still needed.

10.4 Sequencing

Estimated effort per non-trivial phase: 2–4 weeks for one engineer. Total ≈ 6 person-months for the framework + 3 for migrations, parallelizable.

11. Risks

  • ScenarioContext becomes a god object. Mitigation: fields are Option<...>; a buffer-only scenario carries only BufferContext. JSON schema enforces presence iff the runner needs it.
  • Fake LSP / VirtualFs drift from real subsystems. Mitigation: the imperative e2e files for those subsystems stay for one release after each phase ships. Differential testing between fake and real catches drift before retirement.
  • InputEvent enum grows unmaintainable. Mitigation: keep Action separate; only add new variants when a scenario type legitimately needs them. The seven non-Action variants are believed to be the ceiling, not a starting point.
  • Snapshot churn on RenderSnapshot schema changes. Mitigation: snapshot files use serde_json with #[serde(default)] on additive fields; schema changes are reviewed as data-model changes, not as test churn.
  • Corpus-guided proptest finds bugs that aren't in any migrated scenario but block CI. Mitigation: the soak job is non-blocking; found bugs become regression JSON files plus a separate gating test if the underlying bug is to be fixed.

12. Non-goals

  • Theorem-prover export. Considered and rejected. The data form is for proptest + shadow leverage, not Lean. Removing this constraint drops several requirements (formal step semantics, encoded unicode tables, refinement proofs).
  • Replacing the rope buffer with a verified one. Out of scope; the rope is the production subject.
  • GPU/driver-level GUI tests. wgpu rendering quality is wgpu's problem.
  • Terminal-emulator-level tests (xterm vs kitty vs alacritty). We test the editor's output, not its consumers.
  • 100% migration of gui.rs. GUI-specific assertions may remain imperative if the single file doesn't justify a scenario type.

13. Acceptance criteria

The migration is "done" when:

  • tests/e2e/ either contains zero files or only the small set kept as redundant terminal-side proofs and the GUI-specific subset (per §6.1 and §12).
  • tests/semantic/ contains all twelve scenario types with at least one example per type.
  • The corpus dumps to a JSON directory in CI artifacts on every run.
  • At least three shadow models are wired into the corpus-differential CI job.
  • Corpus-guided proptest runs as a soak job; counterexamples produce regression JSON files automatically.
  • theme_screenshots.rs byte-snapshot pipeline is deleted.
  • The split between tests/e2e/ (renders) and tests/semantic/ (doesn't) no longer exists; rendering is tested via §7's layered scenarios within the same framework.
  • CONTRIBUTING.md is updated to describe the scenario-type taxonomy as the primary test idiom; terminal-side e2es are documented as redundant proofs allowed where useful.