cli-anything-plugin/HARNESS.md
This harness provides a standard operating procedure (SOP) and toolkit for coding agents (Claude Code, Codex, etc.) to build powerful, stateful CLI interfaces for open-source GUI applications. The goal: let AI agents operate software that was designed for humans, without needing a display or mouse.
melt, ffmpeg,
convert). These are building blocks.
Choose the interaction model:
Define command groups matching the app's logical domains:
Design the state model:
Plan the output format:
--json flag
Start with the data layer: XML/JSON manipulation of project files
Add probe/info commands — Let agents inspect before they modify
Add mutation commands — One command per logical operation
Add the backend integration — A utils/<software>_backend.py module that
wraps the real software's CLI. This module handles:
shutil.which())
subprocess.run())
# utils/lo_backend.py
def convert_odf_to(odf_path, output_format, output_path=None, overwrite=False):
    lo = find_libreoffice()  # raises RuntimeError with install instructions
    subprocess.run([lo, "--headless", "--convert-to", output_format, ...])
    return {"output": final_path, "format": output_format, "method": "libreoffice-headless"}
Add rendering/export — The export pipeline calls the backend module. Generate valid intermediate files, then invoke the real software for conversion.
Add session management — State persistence, undo/redo
Session file locking — When saving session JSON, use exclusive file locking
to prevent concurrent writes from corrupting data. Never use bare
open("w") + json.dump() — open("w") truncates the file before any lock
can be acquired. Instead, open with "r+", lock, then truncate inside the lock:
import json
import os

def _locked_save_json(path, data, **dump_kwargs) -> None:
    """Write JSON under an exclusive file lock, truncating only once the lock is held."""
    try:
        f = open(path, "r+")  # no truncation on open
    except FileNotFoundError:
        os.makedirs(os.path.dirname(os.path.abspath(path)), exist_ok=True)
        f = open(path, "w")  # first save: the file doesn't exist yet
    with f:
        _locked = False
        try:
            import fcntl
            fcntl.flock(f.fileno(), fcntl.LOCK_EX)
            _locked = True
        except (ImportError, OSError):
            pass  # Windows / unsupported FS: proceed unlocked
        try:
            f.seek(0)
            f.truncate()  # truncate INSIDE the lock
            json.dump(data, f, **dump_kwargs)
            f.flush()
        finally:
            if _locked:
                fcntl.flock(f.fileno(), fcntl.LOCK_UN)
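The session management step above (state persistence with undo/redo) could be sketched with snapshot stacks. This is a minimal illustration, not the harness's actual session.py: the `Session` class, its method names, and the `max_history` limit are all assumptions made for the example.

```python
import copy


class Session:
    """Hypothetical snapshot-based session with undo/redo (illustrative only)."""

    def __init__(self, project: dict, max_history: int = 50):
        self.project = project
        self.max_history = max_history
        self._undo: list[dict] = []
        self._redo: list[dict] = []

    def apply(self, mutate) -> None:
        """Snapshot the current state, then run a mutation callback on it."""
        self._undo.append(copy.deepcopy(self.project))
        if len(self._undo) > self.max_history:
            self._undo.pop(0)  # drop the oldest snapshot
        self._redo.clear()  # a new edit invalidates the redo branch
        mutate(self.project)

    def undo(self) -> bool:
        """Restore the previous snapshot; return False if there is none."""
        if not self._undo:
            return False
        self._redo.append(self.project)
        self.project = self._undo.pop()
        return True

    def redo(self) -> bool:
        """Re-apply the most recently undone state; return False if there is none."""
        if not self._redo:
            return False
        self._undo.append(self.project)
        self.project = self._redo.pop()
        return True


session = Session({"name": "demo", "clips": []})
session.apply(lambda p: p["clips"].append("intro.mp4"))
session.undo()  # project is back to zero clips
session.redo()  # the clip is restored
```

Deep-copy snapshots are simple but memory-hungry for large projects; storing diffs or inverse operations is a common alternative when project files grow.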
Add the REPL with unified skin — Interactive mode wrapping the subcommands.
Copy repl_skin.py from the plugin (cli-anything-plugin/repl_skin.py) into
utils/repl_skin.py in your CLI package.
Use ReplSkin for the REPL interface:
from cli_anything.<software>.utils.repl_skin import ReplSkin
skin = ReplSkin("<software>", version="1.0.0")
skin.print_banner() # Branded startup box (auto-detects skills/SKILL.md)
pt_session = skin.create_prompt_session() # prompt_toolkit with history + styling
line = skin.get_input(pt_session, project_name="my_project", modified=True)
skin.help(commands_dict) # Formatted help listing
skin.success("Saved") # ✓ green message
skin.error("Not found") # ✗ red message
skin.warning("Unsaved") # ⚠ yellow message
skin.info("Processing...") # ● blue message
skin.status("Key", "value") # Key-value status line
skin.table(headers, rows) # Formatted table
skin.progress(3, 10, "...") # Progress bar
skin.print_goodbye() # Styled exit message
ReplSkin auto-detects skills/SKILL.md inside the package directory and displays
it in the banner. AI agents can read the skill file at the displayed absolute path.
Set invoke_without_command=True on the main
Click group, and invoke the repl command when no subcommand is given:
@click.group(invoke_without_command=True)
@click.pass_context
def cli(ctx, ...):
    ...
    if ctx.invoked_subcommand is None:
        ctx.invoke(repl, project_path=None)
cli-anything-<software> with no arguments enters the REPL.
BEFORE writing any test code, create a TEST.md file in the
agent-harness/cli_anything/<software>/tests/ directory. This file serves as your test plan and
MUST contain:
Test Inventory Plan: List planned test files and estimated test counts:
test_core.py: XX unit tests planned
test_full_e2e.py: XX E2E tests planned
Unit Test Plan: For each core module, describe what will be tested:
project.py)
E2E Test Plan: Describe the real-world scenarios to test:
Realistic Workflow Scenarios: Detail each multi-step workflow:
This planning document ensures comprehensive test coverage before writing code.
Now write the actual test code based on the TEST.md plan:
Unit tests (test_core.py) — Every core function tested in isolation with
synthetic data. No external dependencies.
E2E tests — intermediate files (test_full_e2e.py) — Verify the project files
your CLI generates are structurally correct (valid XML, correct ZIP structure, etc.)
E2E tests — true backend (test_full_e2e.py) — MUST invoke the real software.
Create a project, export via the actual software backend, and verify the output:
%PDF-, DOCX/XLSX/PPTX is valid ZIP/OOXML, etc.)
print(f"\n PDF: {path} ({size:,} bytes)")
Output verification: Don't trust that export works just because it exits successfully. Verify outputs programmatically:
CLI subprocess tests — Test the installed CLI command as a real user/agent would.
The subprocess tests MUST also produce real final output (not just ODF intermediate).
Use the _resolve_cli helper to run the installed cli-anything-<software> command:
import json
import os
import shutil
import subprocess
import sys

def _resolve_cli(name):
    """Resolve installed CLI command; falls back to python -m for dev.

    Set env CLI_ANYTHING_FORCE_INSTALLED=1 to require the installed command.
    """
    force = os.environ.get("CLI_ANYTHING_FORCE_INSTALLED", "").strip() == "1"
    path = shutil.which(name)
    if path:
        print(f"[_resolve_cli] Using installed command: {path}")
        return [path]
    if force:
        raise RuntimeError(f"{name} not found in PATH. Install with: pip install -e .")
    # Derive e.g. cli_anything.gimp.gimp_cli from cli-anything-gimp
    module = name.replace("cli-anything-", "cli_anything.") + "." + name.split("-")[-1] + "_cli"
    print(f"[_resolve_cli] Falling back to: {sys.executable} -m {module}")
    return [sys.executable, "-m", module]

class TestCLISubprocess:
    CLI_BASE = _resolve_cli("cli-anything-<software>")

    def _run(self, args, check=True):
        return subprocess.run(
            self.CLI_BASE + args,
            capture_output=True, text=True,
            check=check,
        )

    def test_help(self):
        result = self._run(["--help"])
        assert result.returncode == 0

    def test_project_new_json(self, tmp_dir):
        out = os.path.join(tmp_dir, "test.json")
        result = self._run(["--json", "project", "new", "-o", out])
        assert result.returncode == 0
        data = json.loads(result.stdout)
        # ... verify structure
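The fallback branch of _resolve_cli derives a module path from the command name, and that derivation can be checked in isolation. The sketch below copies the expression into a standalone function (`fallback_module` is a name invented here) and shows that it yields an unimportable path when the software name itself contains a hyphen. The `fallback_module_safe` variant is only a suggestion and assumes such a package directory would use underscores.

```python
def fallback_module(name: str) -> str:
    """Same expression as the fallback in _resolve_cli, extracted for inspection."""
    return name.replace("cli-anything-", "cli_anything.") + "." + name.split("-")[-1] + "_cli"


def fallback_module_safe(name: str, prefix: str = "cli-anything-") -> str:
    """Hypothetical variant: assumes hyphens map to underscores in the package name."""
    software = name.removeprefix(prefix).replace("-", "_")
    return f"cli_anything.{software}.{software}_cli"


print(fallback_module("cli-anything-gimp"))            # cli_anything.gimp.gimp_cli
print(fallback_module("cli-anything-obs-studio"))      # cli_anything.obs-studio.studio_cli
print(fallback_module_safe("cli-anything-obs-studio")) # cli_anything.obs_studio.obs_studio_cli
```

For single-word software names both functions agree, so the original expression is fine for the examples used in this document.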
Key rules for subprocess tests:
Always use _resolve_cli("cli-anything-<software>"): never hardcode
sys.executable or module paths directly
Do NOT set cwd: installed commands must work from any directory
Use CLI_ANYTHING_FORCE_INSTALLED=1 in CI/release testing to ensure the
installed command (not a fallback) is being tested
Test --help, --json, project creation, key commands, and full workflows
Round-trip test: Create project via CLI, open in GUI, verify correctness
Agent test — Have an AI agent complete a real task using only the CLI
After running all tests successfully, append to the existing TEST.md:
pytest -v --tb=no output showing all tests
passing with their names and status
The TEST.md now serves as both the test plan (written before implementation) and the test results documentation (appended after execution), providing a complete record of the testing process.
Generate a SKILL.md file that makes the CLI discoverable and usable by AI agents through the skill-creator methodology. This file serves as a self-contained skill definition that can be loaded by Claude Code or other AI assistants.
Purpose: SKILL.md files follow a standard format that enables AI agents to:
SKILL.md Structure:
YAML Frontmatter — Triggering metadata for skill discovery:
---
name: "cli-anything-<software>"
description: "Brief description of what the CLI does"
---
Markdown Body — Usage instructions including:
Generation Process:
Extract CLI metadata using skill_generator.py:
from skill_generator import generate_skill_file
skill_path = generate_skill_file(
    harness_path="/path/to/agent-harness"
)
# Default output: cli_anything/<software>/skills/SKILL.md
The generator automatically extracts:
Customize the template (optional):
templates/SKILL.md.template
Output Location:
SKILL.md is generated inside the Python package so it is installed with pip install:
<software>/
└── agent-harness/
    └── cli_anything/
        └── <software>/
            └── skills/
                └── SKILL.md
Manual Generation:
cd cli-anything-plugin
python skill_generator.py /path/to/software/agent-harness
Integration with CLI Build:
The SKILL.md generation should be run after Phase 6 (Test Documentation) completes successfully, ensuring the CLI is fully documented and tested before creating the skill definition.
Key Principles:
--json flag usage for machine-readable output
Skill Path in CLI Banner:
ReplSkin auto-detects skills/SKILL.md inside the package and displays the absolute
path in the startup banner. AI agents can read the file at the displayed path:
# In the REPL initialization (e.g., shotcut_cli.py)
from cli_anything.<software>.utils.repl_skin import ReplSkin
skin = ReplSkin("<software>", version="1.0.0")
skin.print_banner() # Auto-detects and displays: ◇ Skill: /path/to/cli_anything/<software>/skills/SKILL.md
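How ReplSkin performs the auto-detection is not shown in this document. One plausible approach, sketched purely as an assumption, is to look for skills/SKILL.md relative to the package directory. The function name `find_skill_file` is hypothetical and is not part of repl_skin.py.

```python
from pathlib import Path
from typing import Optional


def find_skill_file(package_dir) -> Optional[Path]:
    """Return the absolute path to skills/SKILL.md under package_dir, or None."""
    candidate = Path(package_dir) / "skills" / "SKILL.md"
    return candidate.resolve() if candidate.is_file() else None


# From inside utils/repl_skin.py the package directory would be two levels up:
#   package_dir = Path(__file__).resolve().parent.parent
```

Returning None rather than raising lets the banner simply omit the skill line when the file was not packaged.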
Package Data:
Ensure setup.py includes the skill file as package data so it is installed with pip:
package_data={
    "cli_anything.<software>": ["skills/*.md"],
},
This is the #1 rule. The CLI MUST call the actual software for rendering and export — not reimplement the software's functionality in Python.
The anti-pattern: Building a Pillow-based image compositor to replace GIMP, or generating bpy scripts without ever calling Blender. This produces a toy that can't handle real workloads and diverges from the actual software's behavior.
The correct approach:
Use the software's CLI/scripting interface as the backend:
LibreOffice: libreoffice --headless --convert-to pdf/docx/xlsx/pptx
Blender: blender --background --python script.py
GIMP: gimp -i -b '(script-fu-console-eval ...)'
Inkscape: inkscape --actions="..." --export-filename=...
Shotcut/Kdenlive: melt project.mlt -consumer avformat:output.mp4
Audacity: sox for effects processing
OBS Studio: obs-websocket protocol
The software is a required dependency, not optional. Add it to installation instructions. The CLI is useless without the actual software.
Generate valid project/intermediate files (ODF, MLT XML, .blend, SVG, etc.) then hand them to the real software for rendering. Your CLI is a structured command-line interface to the software, not a replacement for it.
Example — LibreOffice CLI export pipeline:
# 1. Build the document as a valid ODF file (our XML builder)
odf_path = write_odf(tmp_path, doc_type, project)
# 2. Convert via the REAL LibreOffice (not a reimplementation)
subprocess.run([
    "libreoffice", "--headless",
    "--convert-to", "pdf",
    "--outdir", output_dir,
    odf_path,
], check=True)  # raise if LibreOffice exits with an error
# Result: a real PDF rendered by LibreOffice's full engine
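The pipeline above can be wrapped so that a missing output file is treated as a failure even when the process exits cleanly. This is a sketch under assumptions: the helper names (`build_convert_command`, `expected_output`, `convert`) and the 120-second timeout are invented for illustration. The output naming relies on LibreOffice writing `<source stem>.<format>` into the --outdir directory.

```python
import subprocess
from pathlib import Path


def build_convert_command(lo_path: str, src: str, fmt: str, outdir: str) -> list[str]:
    """Assemble the headless conversion command without running it."""
    return [lo_path, "--headless", "--convert-to", fmt, "--outdir", outdir, src]


def expected_output(src: str, fmt: str, outdir: str) -> Path:
    """Predict the output path: source stem plus the target extension."""
    extension = fmt.split(":")[0]  # "pdf:writer_pdf_Export" -> "pdf"
    return Path(outdir) / f"{Path(src).stem}.{extension}"


def convert(lo_path: str, src: str, fmt: str, outdir: str, timeout: int = 120) -> Path:
    """Run the conversion and fail loudly if no file was produced."""
    subprocess.run(
        build_convert_command(lo_path, src, fmt, outdir),
        check=True, capture_output=True, timeout=timeout,
    )
    out = expected_output(src, fmt, outdir)
    if not out.is_file():
        raise RuntimeError(f"conversion reported success but {out} is missing")
    return out
```

Separating command construction from execution keeps the argument list unit-testable without the real software, while `convert` itself still belongs in the true-backend E2E tests.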
This is the #2 pitfall. Most GUI apps apply effects at render time via their engine. When you build a CLI that manipulates project files directly, you must also handle rendering — and naive approaches will silently drop effects.
The problem: Your CLI adds filters/effects to the project file format. But when rendering, if you use a simple tool (e.g., ffmpeg concat demuxer), it reads raw media files and ignores all project-level effects. The output looks identical to the input. Users can't tell anything happened.
The solution — a filter translation layer:
melt for MLT projects). It reads
the project file and applies everything.
-filter_complex).
Priority order for rendering: native engine → translated filtergraph → script.
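A translation layer for the second option (a translated ffmpeg filtergraph) might look like the sketch below. It is illustrative only: `merge_eq` and `concat_graph` are hypothetical helpers, and the input values are assumed to be already converted to ffmpeg's scales. The eq option names and the concat syntax are ffmpeg's own.

```python
EQ_OPTIONS = ("brightness", "saturation", "contrast", "gamma")


def merge_eq(filters: list[tuple[str, float]]) -> str:
    """Merge all eq-backed filters into ONE eq= clause instead of chaining several."""
    params: dict[str, float] = {}
    for name, value in filters:
        if name not in EQ_OPTIONS:
            raise ValueError(f"no eq mapping for filter: {name}")
        params[name] = value  # a repeated filter overrides the earlier value
    return "eq=" + ":".join(f"{key}={val}" for key, val in params.items())


def concat_graph(n_clips: int) -> str:
    """Build a concat filtergraph with interleaved [video][audio] input pairs."""
    inputs = "".join(f"[{i}:v][{i}:a]" for i in range(n_clips))
    return f"{inputs}concat=n={n_clips}:v=1:a=1[outv][outa]"


print(merge_eq([("brightness", 0.06), ("saturation", 1.2)]))
# eq=brightness=0.06:saturation=1.2
print(concat_graph(3))
# [0:v][0:a][1:v][1:a][2:v][2:a]concat=n=3:v=1:a=1[outv][outa]
```

The resulting strings would be passed to ffmpeg via -filter_complex; converting project-level values to ffmpeg's scales is deliberately left out because that mapping is specific to each source format.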
When translating effects between formats (e.g., MLT → ffmpeg), watch for:
If a project has both brightness and saturation filters, and
both map to ffmpeg's eq=, you must merge them into a single eq=brightness=X:saturation=Y.
The concat filter requires interleaved stream
ordering: [v0][a0][v1][a1][v2][a2], NOT grouped [v0][v1][v2][a0][a1][a2].
The error message ("media type mismatch") is cryptic if you don't know this.
1.15 = +15%, but ffmpeg eq=brightness=0.06 on a -1..1 scale.
Document every mapping explicitly.
Non-integer frame rates (29.97fps = 30000/1001) cause cumulative rounding errors:
Use round(), not int() for float-to-frame conversion. int(9000 * 29.97)
truncates and loses frames; round() gets the right answer.
Convert frames to total milliseconds with round(frames * fps_den * 1000 / fps_num), then decompose with integer
division. Avoid intermediate floats that drift over long durations.
Never assume an export is correct just because it ran without errors. Verify:
# Video: probe specific frames with ffmpeg
# Frame 0 for fade-in (should be near-black)
# Middle frames for color effects (compare brightness/saturation vs source)
# Last frame for fade-out (should be near-black)
# When comparing pixel values between different resolutions,
# exclude letterboxing/pillarboxing (black padding bars).
# A vertical video in a horizontal frame will have ~40% black pixels.
# Audio: check RMS levels at start/end for fades
# Compare spectral characteristics against source
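Returning to the frame-rate guidance above, the rounding and timecode rules can be made concrete. The function names are invented for this sketch; the millisecond formula is the one given in the text, and the frame rate is passed as an integer numerator and denominator (30000/1001 for 29.97 fps).

```python
def seconds_to_frames(seconds: float, fps_num: int, fps_den: int) -> int:
    """Convert a duration to a frame count using round(), not int()."""
    return round(seconds * fps_num / fps_den)


def frames_to_timecode(frames: int, fps_num: int, fps_den: int) -> str:
    """Format a frame count as HH:MM:SS.mmm via total milliseconds."""
    total_ms = round(frames * fps_den * 1000 / fps_num)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{ms:03d}"


# 10 seconds at 30000/1001 fps is 299.7 frames: int() drops to 299, round() gives 300.
print(int(10.0 * 30000 / 1001), seconds_to_frames(10.0, 30000, 1001))  # 299 300
print(frames_to_timecode(300, 30000, 1001))    # 00:00:10.010
print(frames_to_timecode(30000, 30000, 1001))  # 00:16:41.000
```

Keeping the frame rate as two integers means the only floating-point step is the single division inside round(), so nothing accumulates across long timelines.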
Four test layers with complementary purposes:
Unit tests (test_core.py): Synthetic data, no external dependencies. Tests
every function in isolation. Fast, deterministic, good for CI.
E2E tests, intermediate files (test_full_e2e.py): Tests the project file generation
pipeline (ODF structure, XML content, format validation). Verifies the
intermediate files your CLI produces are correct.
E2E tests, true backend (test_full_e2e.py): Invokes the real software
(LibreOffice, Blender, melt, etc.) to produce final output files (PDF, DOCX,
rendered images, videos). Verifies the output files:
CLI subprocess tests (test_full_e2e.py): Invokes the installed
cli-anything-<software> command via subprocess.run to run the full workflow
end-to-end: create project → add content → export via real software → verify output.
No graceful degradation. The real software MUST be installed. Tests must NOT skip or fake results when the software is missing, because the CLI is useless without it. The software is a hard dependency, not optional.
Example — true E2E test for LibreOffice:
class TestWriterToPDF:
    def test_rich_writer_to_pdf(self, tmp_dir):
        proj = create_document(doc_type="writer", name="Report")
        add_heading(proj, text="Quarterly Report", level=1)
        add_table(proj, rows=3, cols=3, data=[...])
        pdf_path = os.path.join(tmp_dir, "report.pdf")
        result = export(proj, pdf_path, preset="pdf", overwrite=True)
        # Verify the REAL output file
        assert os.path.exists(result["output"])
        assert result["file_size"] > 1000  # Not suspiciously small
        with open(result["output"], "rb") as f:
            assert f.read(5) == b"%PDF-"  # Validate format magic bytes
        print(f"\n PDF: {result['output']} ({result['file_size']:,} bytes)")

class TestCLISubprocessE2E:
    CLI_BASE = _resolve_cli("cli-anything-libreoffice")

    def test_full_writer_pdf_workflow(self, tmp_dir):
        proj_path = os.path.join(tmp_dir, "test.json")
        pdf_path = os.path.join(tmp_dir, "output.pdf")
        self._run(["document", "new", "-o", proj_path, "--type", "writer"])
        self._run(["--project", proj_path, "writer", "add-heading", "-t", "Title"])
        self._run(["--project", proj_path, "export", "render", pdf_path, "-p", "pdf", "--overwrite"])
        assert os.path.exists(pdf_path)
        with open(pdf_path, "rb") as f:
            assert f.read(5) == b"%PDF-"
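The existence, size and magic-byte checks used in the tests above could be factored into one helper. This is a sketch rather than part of the harness: `verify_output` and its return shape are assumptions, and the 1000-byte default simply mirrors the "not suspiciously small" threshold in the example. The OOXML branch relies on every DOCX/XLSX/PPTX package being a ZIP archive with a `[Content_Types].xml` entry.

```python
import zipfile
from pathlib import Path


def verify_output(path, min_size: int = 1000) -> dict:
    """Check that an exported file exists, is not tiny, and has the right container format."""
    p = Path(path)
    if not p.is_file():
        raise AssertionError(f"missing output: {p}")
    size = p.stat().st_size
    if size < min_size:
        raise AssertionError(f"{p} is suspiciously small ({size} bytes)")
    suffix = p.suffix.lower()
    if suffix == ".pdf":
        with p.open("rb") as f:
            if f.read(5) != b"%PDF-":
                raise AssertionError(f"{p} does not start with %PDF-")
    elif suffix in {".docx", ".xlsx", ".pptx"}:
        if not zipfile.is_zipfile(p):
            raise AssertionError(f"{p} is not a ZIP container")
        with zipfile.ZipFile(p) as zf:
            if "[Content_Types].xml" not in zf.namelist():
                raise AssertionError(f"{p} lacks [Content_Types].xml")
    return {"output": str(p), "file_size": size, "format": suffix.lstrip(".")}
```

A test would then call `info = verify_output(pdf_path)` and print `info["output"]` and `info["file_size"]` for manual inspection.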
Run tests in force-installed mode to guarantee the real command is used:
CLI_ANYTHING_FORCE_INSTALLED=1 python3 -m pytest cli_anything/<software>/tests/ -v -s
The -s flag shows the [_resolve_cli] print output confirming which backend
is being used and prints artifact paths for manual inspection.
Real-world workflow test scenarios should include:
cli-anything-libreoffice must error clearly, not
silently produce inferior output with a fallback library.
Use libreoffice --headless, blender --background,
melt, ffmpeg, inkscape --actions, sox as subprocesses for rendering.
info, list, status commands are critical for agents
to understand current state before acting.
--json for machine parsing.
The cli_anything/<software>/ directory MUST contain a README.md that explains how to
install the software dependency, install the CLI, run tests, and shows basic usage.
Test the installed cli-anything-<software> command via _resolve_cli(). Tests must work against
the actual installed package, not just source imports.
The cli_anything/<software>/tests/ directory MUST contain a TEST.md documenting what the tests
cover, what realistic workflows are tested, and the full test results output.
Use the unified REPL skin (repl_skin.py) for the interactive mode.
Copy cli-anything-plugin/repl_skin.py to utils/repl_skin.py and use ReplSkin
for the banner, prompt, help, messages, and goodbye. REPL MUST be the default behavior
when the CLI is invoked without a subcommand (invoke_without_command=True).
<software>/
└── agent-harness/
    ├── <SOFTWARE>.md                      # Project-specific analysis and SOP
    ├── setup.py                           # PyPI package configuration (Phase 7)
    ├── cli_anything/                      # Namespace package (NO __init__.py here)
    │   └── <software>/                    # Sub-package for this CLI
    │       ├── __init__.py
    │       ├── __main__.py                # python3 -m cli_anything.<software>
    │       ├── README.md                  # HOW TO RUN (required)
    │       ├── <software>_cli.py          # Main CLI entry point (Click + REPL)
    │       ├── core/                      # Core modules (one per domain)
    │       │   ├── __init__.py
    │       │   ├── project.py             # Project create/open/save/info
    │       │   ├── ...                    # Domain-specific modules
    │       │   ├── export.py              # Render pipeline + filter translation
    │       │   └── session.py             # Stateful session, undo/redo
    │       ├── utils/                     # Shared utilities
    │       │   ├── __init__.py
    │       │   ├── <software>_backend.py  # Backend: invokes the real software
    │       │   └── repl_skin.py           # Unified REPL skin (copy from plugin)
    │       └── tests/                     # Test suites
    │           ├── TEST.md                # Test documentation and results (required)
    │           ├── test_core.py           # Unit tests (synthetic data)
    │           └── test_full_e2e.py       # E2E tests (real files)
    └── examples/                          # Example scripts and workflows
Critical: The cli_anything/ directory must NOT contain an __init__.py.
This is what makes it a PEP 420 namespace package — multiple separately-installed
PyPI packages can each contribute a sub-package under cli_anything/ without
conflicting. For example, cli-anything-gimp adds cli_anything/gimp/ and
cli-anything-blender adds cli_anything/blender/, and both coexist in the
same Python environment.
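The namespace-package behaviour described above can be demonstrated with the standard library alone. The sketch builds two throwaway distributions in a temporary directory; the names `demo_ns`, `alpha` and `beta` are placeholders standing in for cli_anything, gimp and blender, chosen so the demo cannot collide with a real installation.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_dist(root: Path, namespace: str, sub: str) -> Path:
    """Create <root>/<sub>-dist/<namespace>/<sub>/__init__.py with NO namespace-level __init__.py."""
    dist = root / f"{sub}-dist"
    pkg = dist / namespace / sub
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"NAME = {sub!r}\n")
    return dist


with tempfile.TemporaryDirectory() as tmp:
    # Two separate "installed" locations, each contributing one sub-package
    added = [str(make_dist(Path(tmp), "demo_ns", sub)) for sub in ("alpha", "beta")]
    sys.path[:0] = added
    try:
        importlib.invalidate_caches()
        alpha = importlib.import_module("demo_ns.alpha")
        beta = importlib.import_module("demo_ns.beta")
        portions = list(importlib.import_module("demo_ns").__path__)
    finally:
        for entry in added:
            sys.path.remove(entry)

print(alpha.NAME, beta.NAME)  # alpha beta
print(len(portions))          # 2: one portion per distribution
```

Adding an `__init__.py` to either `demo_ns/` directory would turn it into a regular package that shadows the other portion, which is why the rule above is strict.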
Note: This HARNESS.md is part of the cli-anything-plugin. Individual software directories reference this file — do NOT duplicate it.
This same SOP applies to any GUI application:
| Software | Backend CLI | Native Format | System Package | How the CLI Uses It |
|---|---|---|---|---|
| LibreOffice | libreoffice --headless | .odt/.ods/.odp (ODF ZIP) | apt install libreoffice | Generate ODF → convert to PDF/DOCX/XLSX/PPTX |
| Blender | blender --background --python | .blend-cli.json | apt install blender | Generate bpy script → Blender renders to PNG/MP4 |
| GIMP | gimp -i -b '(script-fu ...)' | .xcf | apt install gimp | Script-Fu commands → GIMP processes & exports |
| Inkscape | inkscape --actions="..." | .svg (XML) | apt install inkscape | Manipulate SVG → Inkscape exports to PNG/PDF |
| Shotcut/Kdenlive | melt or ffmpeg | .mlt (XML) | apt install melt ffmpeg | Build MLT XML → melt/ffmpeg renders video |
| Audacity | sox | .aup3 | apt install sox | Generate sox commands → sox processes audio |
| OBS Studio | obs-websocket | scene.json | apt install obs-studio | WebSocket API → OBS captures/records |
The software is a required dependency, not optional. The CLI generates valid intermediate files (ODF, MLT XML, bpy scripts, SVG) and hands them to the real software for rendering. This is what makes the CLI actually useful — it's a command-line interface TO the software, not a replacement for it.
The pattern is always the same: build the data → call the real software → verify the output.
After building and testing the CLI, make it installable and discoverable.
All cli-anything CLIs use PEP 420 namespace packages under the shared
cli_anything namespace. This allows multiple CLI packages to be installed
side-by-side in the same Python environment without conflicts.
Structure the package as a namespace package:
agent-harness/
├── setup.py
└── cli_anything/              # NO __init__.py here (namespace package)
    └── <software>/            # e.g., gimp, blender, audacity
        ├── __init__.py        # HAS __init__.py (regular sub-package)
        ├── <software>_cli.py
        ├── core/
        ├── utils/
        └── tests/
The key rule: cli_anything/ has no __init__.py. Each sub-package
(gimp/, blender/, etc.) does have __init__.py. This is what
enables multiple packages to contribute to the same namespace.
Create setup.py in the agent-harness/ directory:
from setuptools import setup, find_namespace_packages

setup(
    name="cli-anything-<software>",
    version="1.0.0",
    packages=find_namespace_packages(include=["cli_anything.*"]),
    install_requires=[
        "click>=8.0.0",
        "prompt-toolkit>=3.0.0",
        # Add Python library dependencies here
    ],
    entry_points={
        "console_scripts": [
            "cli-anything-<software>=cli_anything.<software>.<software>_cli:main",
        ],
    },
    python_requires=">=3.10",
)
Important details:
Use find_namespace_packages, NOT find_packages
Use include=["cli_anything.*"] to scope discovery
Entry point: cli_anything.<software>.<software>_cli:main
The real software is a system package, so it cannot be declared in install_requires. Document it in README.md and
have the backend module raise a clear error with install instructions:
# In utils/<software>_backend.py
def find_<software>():
    path = shutil.which("<software>")
    if path:
        return path
    raise RuntimeError(
        "<Software> is not installed. Install it with:\n"
        "  apt install <software>  # Debian/Ubuntu\n"
        "  brew install <software>  # macOS"
    )
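Because the template above uses placeholders, a concrete version may help. The sketch below generalizes it to try several executable names in order. `find_executable` is a hypothetical helper, and the assumption that LibreOffice may be exposed as either `libreoffice` or `soffice` should be checked against the target platform.

```python
import shutil


def find_executable(candidates, install_hint: str) -> str:
    """Return the first candidate found on PATH, or raise with install instructions."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    raise RuntimeError(
        f"None of {list(candidates)} found on PATH. Install with:\n{install_hint}"
    )


def find_libreoffice() -> str:
    """Concrete instance for LibreOffice (candidate names are an assumption)."""
    return find_executable(
        ("libreoffice", "soffice"),
        "  apt install libreoffice  # Debian/Ubuntu",
    )
```

Raising instead of returning None keeps the "no graceful degradation" rule: callers cannot accidentally continue without the real software.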
All imports use the cli_anything.<software> prefix:
from cli_anything.gimp.core.project import create_project
from cli_anything.gimp.core.session import Session
from cli_anything.blender.core.scene import create_scene
Test local installation:
cd /root/cli-anything/<software>/agent-harness
pip install -e .
Verify PATH installation:
which cli-anything-<software>
cli-anything-<software> --help
Run tests against the installed command:
cd /root/cli-anything/<software>/agent-harness
CLI_ANYTHING_FORCE_INSTALLED=1 python3 -m pytest cli_anything/<software>/tests/ -v -s
The output must show [_resolve_cli] Using installed command: /path/to/cli-anything-<software>
confirming subprocess tests ran against the real installed binary, not a module fallback.
Verify namespace works across packages (when multiple CLIs installed):
import cli_anything.gimp
import cli_anything.blender
# Both resolve to their respective source directories
Why namespace packages:
cli_anything namespace
cli_anything.*