shotcut/agent-harness/HARNESS.md
This harness provides a standard operating procedure (SOP) and toolkit for coding agents (Claude Code, Codex, etc.) to build powerful, stateful CLI interfaces for open-source GUI applications. The goal: let AI agents operate software that was designed for humans, without needing a display or mouse.
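The end product is a grouped, scriptable command surface. The repo's CLI is built with Click, but the shape (one subcommand group per domain, plus a global --json flag) can be sketched with stdlib argparse; all names here are illustrative:

```python
import argparse
import json

def build_parser() -> argparse.ArgumentParser:
    """One subcommand group per application domain, plus a global --json flag."""
    parser = argparse.ArgumentParser(prog="shotcut-cli")
    parser.add_argument("--json", action="store_true",
                        help="emit machine-readable output")
    groups = parser.add_subparsers(dest="group", required=True)
    for name in ("project", "timeline", "filters", "media", "export"):
        sub = groups.add_parser(name)
        sub.add_argument("action")           # e.g. info, list, add
        sub.add_argument("args", nargs="*")  # action-specific arguments
    return parser

def main(argv=None):
    ns = build_parser().parse_args(argv)
    # A real implementation dispatches to a core module here; this sketch
    # just echoes the parsed invocation in the requested format.
    result = {"group": ns.group, "action": ns.action, "args": ns.args}
    print(json.dumps(result) if ns.json else result)
    return result
```

Running `main(["--json", "timeline", "list"])` prints `{"group": "timeline", "action": "list", "args": []}`.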
Work through this design checklist before implementing:

- Inventory the command-line tools the application already ships or wraps (e.g., melt, ffmpeg, convert). These are the building blocks.
- Choose the interaction model: one-shot commands, a stateful REPL, or both.
- Define command groups matching the app's logical domains (e.g., project, timeline, filters, media, export).
- Design the state model: what persists between commands, and where it lives.
- Plan the output format: human-readable by default, with a --json flag for machine parsing.

The rendering gap is the #1 pitfall. Most GUI apps apply effects at render time via their engine. When you build a CLI that manipulates project files directly, you must also handle rendering — and naive approaches will silently drop effects.
The problem: your CLI writes filters/effects into the project file, but if rendering goes through a simple tool (e.g., the ffmpeg concat demuxer), that tool reads the raw media files and ignores all project-level effects. The output looks identical to the input, and users can't tell anything happened.
The solution — a filter translation layer:

- Prefer the app's native rendering engine (e.g., melt for MLT projects). It reads the project file and applies everything.
- Failing that, translate each project-level effect into an equivalent ffmpeg filtergraph (-filter_complex).
- As a last resort, generate a script that reproduces the edit.

Priority order for rendering: native engine → translated filtergraph → script.
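A minimal sketch of such a layer, assuming a simplified two-effect filter model; the 0.4 brightness remap factor is an assumed calibration constant, not a documented ffmpeg/MLT relationship:

```python
def mlt_brightness_to_eq(level: float) -> float:
    # MLT brightness is multiplicative (1.0 = neutral, 1.15 = +15%);
    # ffmpeg eq= brightness is additive on a -1..1 scale (0 = neutral).
    # The 0.4 factor is a placeholder: calibrate and document your own mapping.
    return round((level - 1.0) * 0.4, 4)

def translate_filters(filters):
    """Translate (name, params) project filters into one ffmpeg filter chain,
    merging everything that targets eq= into a single eq= instance."""
    eq_args = {}
    passthrough = []
    for name, params in filters:
        if name == "brightness":
            eq_args["brightness"] = mlt_brightness_to_eq(params["level"])
        elif name == "saturation":
            eq_args["saturation"] = params["level"]  # both scales ~1.0-neutral
        else:
            passthrough.append(name)  # real code: translate or fail loudly
    chain = []
    if eq_args:
        chain.append("eq=" + ":".join(f"{k}={v}" for k, v in sorted(eq_args.items())))
    return ",".join(chain + passthrough)

def concat_inputs(n: int) -> str:
    # concat needs interleaved [v][a] pads per input, never grouped streams.
    pads = "".join(f"[{i}:v][{i}:a]" for i in range(n))
    return f"{pads}concat=n={n}:v=1:a=1[outv][outa]"
```

`translate_filters([("brightness", {"level": 1.15}), ("saturation", {"level": 1.2})])` yields `"eq=brightness=0.06:saturation=1.2"`, i.e. the merge rule described below.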
When translating effects between formats (e.g., MLT → ffmpeg), watch for:

- Filter merging: if a clip has both brightness and saturation filters, and both map to ffmpeg's eq=, you must merge them into a single eq=brightness=X:saturation=Y.
- Stream ordering: the concat filter requires interleaved stream ordering: [v0][a0][v1][a1][v2][a2], NOT grouped [v0][v1][v2][a0][a1][a2]. The error message ("media type mismatch") is cryptic if you don't know this.
- Parameter scales: MLT brightness 1.15 means +15% on a multiplicative scale (1.0 = neutral), but the equivalent ffmpeg eq=brightness=0.06 sits on an additive -1..1 scale (0 = neutral). Document every mapping explicitly.

Non-integer frame rates (29.97fps = 30000/1001) cause cumulative rounding errors:
- Use round(), not int(), for float-to-frame conversion. int() truncates toward zero, so any product that floating-point arithmetic places a hair below the true frame count silently loses a frame; round() absorbs the error.
- For frame → timecode, compute total milliseconds as round(frames * fps_den * 1000 / fps_num), then decompose with integer division. Avoid intermediate floats that drift over long durations.

Never assume an export is correct just because it ran without errors. Verify:
- Video: probe specific frames with ffmpeg:
  - frame 0 for the fade-in (should be near-black);
  - middle frames for color effects (compare brightness/saturation against the source);
  - the last frame for the fade-out (should be near-black).
- When comparing pixel values between different resolutions, exclude letterboxing/pillarboxing (black padding bars). A vertical video in a horizontal frame will have ~40% black pixels.
- Audio: check RMS levels at the start and end for fades, and compare spectral characteristics against the source.
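The round()-based frame↔timecode conversions described above can be sketched as follows (29.97 fps assumed; helper names are illustrative):

```python
FPS_NUM, FPS_DEN = 30000, 1001   # 29.97 fps as an exact rational

def seconds_to_frame(seconds: float) -> int:
    # round(), never int(): truncation can silently drop a frame.
    return round(seconds * FPS_NUM / FPS_DEN)

def frame_to_timecode(frame: int) -> str:
    # Go through integer milliseconds; no floating intermediates to drift
    # over long durations.
    ms = round(frame * FPS_DEN * 1000 / FPS_NUM)
    secs, ms = divmod(ms, 1000)
    mins, secs = divmod(secs, 60)
    hours, mins = divmod(mins, 60)
    return f"{hours:02d}:{mins:02d}:{secs:02d}.{ms:03d}"
```

For example, `seconds_to_frame(9000)` gives frame 269730, and `frame_to_timecode(269730)` gives `"02:29:59.991"`.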
Two test suites with complementary purposes:

- Unit tests (test_core.py): synthetic data, no external dependencies. Tests every function in isolation. Fast, deterministic, good for CI.
- End-to-end tests (test_full_e2e.py): real media files. Tests the full pipeline including format parsing, codec handling, and actual rendering. Catches the real-world issues that unit tests can't.

The E2E suite should also exercise real-world workflow scenarios end to end, not just individual commands.
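A sketch of the two-suite split using stdlib unittest (the repo's actual suites are far larger; names here are illustrative), gating the E2E side on toolchain availability:

```python
import shutil
import unittest

HAVE_FFMPEG = shutil.which("ffmpeg") is not None

class TestCore(unittest.TestCase):
    """Synthetic, dependency-free checks: runs anywhere, good for CI."""

    def test_seconds_to_frame_rounds(self):
        fps_num, fps_den = 30000, 1001
        self.assertEqual(round(9000 * fps_num / fps_den), 269730)

@unittest.skipUnless(HAVE_FFMPEG, "ffmpeg not installed")
class TestFullE2E(unittest.TestCase):
    """Real-media pipeline checks: skipped when the toolchain is absent."""

    def test_render_smoke(self):
        # A real test would render a tiny project and probe the output.
        self.assertTrue(HAVE_FFMPEG)

# Run with: python -m unittest <module>
```

Skipping (rather than failing) when ffmpeg is missing keeps CI green on machines that can only run the unit half.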
General principles:

- Shell out to melt, ffmpeg, and ffprobe as subprocesses when available. Don't reinvent rendering.
- info, list, and status commands are critical for agents to understand current state before acting.
- Support --json for machine parsing.
- The cli/ directory MUST contain a README.md that explains how to install dependencies, run the CLI, and run tests, and that shows basic usage examples. This is the first thing a user or agent reads. Without it, the CLI is unusable.

Repository layout:

agent-harness/
├── HARNESS.md # This file — general SOP
├── SHOTCUT.md # Project-specific analysis and SOP
├── cli/ # The actual CLI implementation
│ ├── README.md # HOW TO RUN — required
│ ├── __init__.py
│ ├── __main__.py # python3 -m cli.shotcut_cli
│ ├── shotcut_cli.py # Main CLI entry point (Click + REPL)
│ ├── core/ # Core modules (one per domain)
│ │ ├── __init__.py
│ │ ├── project.py # Project create/open/save/info
│ │ ├── timeline.py # Tracks, clips, trim, split, move
│ │ ├── filters.py # Filter registry + add/remove/set
│ │ ├── media.py # ffprobe wrapper, media inventory
│ │ ├── export.py # Render pipeline + filter translation
│ │ └── session.py # Stateful session, undo/redo
│ ├── utils/ # Shared utilities
│ │ ├── __init__.py
│ │ ├── mlt_xml.py # MLT XML parsing/generation (lxml)
│ │ └── time.py # Timecode ↔ frame conversion
│ └── tests/ # Test suites
│ ├── test_core.py # Unit tests (65 tests, synthetic)
│ └── test_full_e2e.py # E2E tests (79 tests, real media)
├── examples/ # Example scripts and workflows
│ └── workflow_basic.sh
└── workflow_demo.py # Full demo: 3-segment highlight reel
This same SOP applies to any GUI application:
| Software | Backend | Native Format | Existing CLI | Rendering Gap Risk |
|---|---|---|---|---|
| Shotcut | MLT | .mlt (XML) | melt, ffmpeg | High — must translate filters |
| GIMP | GEGL | .xcf | gimp -i (script-fu) | Medium — GEGL has CLI |
| Blender | bpy | .blend | blender --python | Low — bpy renders natively |
| Inkscape | Cairo | .svg (XML) | inkscape --actions | Low — SVG is the format |
| Audacity | PortAudio | .aup3 (SQLite) | — | High — no CLI renderer |
| LibreOffice | UNO | .odt (XML+ZIP) | soffice --macro | Low — UNO API works |
| OBS Studio | libobs | scene.json | obs-websocket | Medium — live only |
| Kdenlive | MLT | .kdenlive (XML) | melt | High — same as Shotcut |
The "Rendering Gap Risk" column indicates how likely it is that a naive export approach will silently drop effects. High risk means you almost certainly need a filter translation layer.
The pattern is always the same: find the data format, find the engine, build a CLI that manipulates one and drives the other — and verify the output.