website/docs/developer-guide/video-gen-provider-plugin.md
Video-gen provider plugins register a backend that services every video_generate tool call. Built-in providers (xAI, FAL) ship as plugins. Add a new one, or override a bundled one, by dropping a directory into plugins/video_gen/<name>/.
:::tip
Video-gen mirrors Image Generation Provider Plugins almost line-for-line — if you've built an image-gen backend, you already know the shape. The main differences: a capabilities() method advertising modalities/aspect-ratios/durations, and a routing convention (pass image_url to use image-to-video, omit it to use text-to-video — the provider picks the right endpoint internally).
:::
The video_generate tool exposes two modalities through one parameter:
prompt only. The provider routes to its text-to-video endpoint.prompt + image_url. The provider routes to its image-to-video endpoint.Edit and extend are intentionally out of scope. Most backends don't support them and the inconsistency would force per-backend prose into the agent's tool description.
Hermes scans for video-gen backends in three places:
<repo>/plugins/video_gen/<name>/ (auto-loaded with kind: backend)~/.hermes/plugins/video_gen/<name>/ (opt-in via plugins.enabled)hermes_agent.plugins entry pointEach plugin's register(ctx) function calls ctx.register_video_gen_provider(...). The active provider is picked by video_gen.provider in config.yaml; hermes tools → Video Generation walks users through selection. Unlike image_generate, there is no in-tree legacy backend — every provider is a plugin.
plugins/video_gen/my-backend/
├── __init__.py # VideoGenProvider subclass + register()
└── plugin.yaml # Manifest with kind: backend
Subclass agent.video_gen_provider.VideoGenProvider. Required: name property and generate() method.
# plugins/video_gen/my-backend/__init__.py
from typing import Any, Dict, List, Optional
import os
from agent.video_gen_provider import (
VideoGenProvider,
error_response,
success_response,
)
class MyVideoGenProvider(VideoGenProvider):
@property
def name(self) -> str:
return "my-backend"
@property
def display_name(self) -> str:
return "My Backend"
def is_available(self) -> bool:
return bool(os.environ.get("MY_API_KEY"))
def list_models(self) -> List[Dict[str, Any]]:
# Each entry is a model FAMILY — a name the user picks once.
# Your provider's generate() routes within the family based on
# whether image_url was passed.
return [
{
"id": "fast",
"display": "Fast",
"speed": "~30s",
"strengths": "Cheapest tier",
"price": "$0.05/s",
"modalities": ["text", "image"], # advisory
},
]
def default_model(self) -> Optional[str]:
return "fast"
def capabilities(self) -> Dict[str, Any]:
return {
"modalities": ["text", "image"],
"aspect_ratios": ["16:9", "9:16"],
"resolutions": ["720p", "1080p"],
"min_duration": 1,
"max_duration": 10,
"supports_audio": False,
"supports_negative_prompt": True,
"max_reference_images": 0,
}
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "My Backend",
"badge": "paid",
"tag": "Short description shown in `hermes tools`",
"env_vars": [
{
"key": "MY_API_KEY",
"prompt": "My Backend API key",
"url": "https://mybackend.example.com/keys",
},
],
}
def generate(
self,
prompt: str,
*,
model: Optional[str] = None,
image_url: Optional[str] = None,
reference_image_urls: Optional[List[str]] = None,
duration: Optional[int] = None,
aspect_ratio: str = "16:9",
resolution: str = "720p",
negative_prompt: Optional[str] = None,
audio: Optional[bool] = None,
seed: Optional[int] = None,
**kwargs: Any, # always ignore unknown kwargs for forward-compat
) -> Dict[str, Any]:
# ROUTE: image_url presence picks the endpoint.
if image_url:
endpoint = "my-backend/image-to-video"
modality_used = "image"
else:
endpoint = "my-backend/text-to-video"
modality_used = "text"
# ... call your API ...
return success_response(
video="https://your-cdn/output.mp4",
model=model or "fast",
prompt=prompt,
modality=modality_used,
aspect_ratio=aspect_ratio,
duration=duration or 5,
provider=self.name,
)
def register(ctx) -> None:
ctx.register_video_gen_provider(MyVideoGenProvider())
# plugins/video_gen/my-backend/plugin.yaml
name: my-backend
version: 1.0.0
description: "My video generation backend"
author: Your Name
kind: backend
requires_env:
- MY_API_KEY
video_generate schemaThe tool exposes one schema across every backend. Providers ignore parameters they don't support.
| Parameter | What it does |
|---|---|
prompt | Text instruction (required) |
image_url | When set → image-to-video; when omitted → text-to-video |
reference_image_urls | Style/character refs (provider-dependent) |
duration | Seconds — provider clamps |
aspect_ratio | "16:9", "9:16", "1:1", ... — provider clamps |
resolution | "480p" / "540p" / "720p" / "1080p" — provider clamps |
negative_prompt | Content to avoid (Pixverse/Kling only) |
audio | Native audio (Veo3 / Pixverse pricing tier) |
seed | Reproducibility |
model | Override the active model/family |
The provider's capabilities() advertises which of these are honored. The agent sees the active backend's capabilities in the tool description, dynamically rebuilt when the user changes backend via hermes tools.
When your backend has multiple endpoints per "model" — like FAL, where every family (Veo 3.1, Pixverse v6, Kling O3) has both a /text-to-video and an /image-to-video URL — represent each family as one catalog entry. Your generate() picks the right endpoint based on whether image_url was passed:
FAMILIES = {
"veo3.1": {
"text_endpoint": "fal-ai/veo3.1",
"image_endpoint": "fal-ai/veo3.1/image-to-video",
# ... family-specific capability flags ...
},
}
def generate(self, prompt, *, image_url=None, model=None, **kwargs):
family_id, family = _resolve_family(model)
endpoint = family["image_endpoint"] if image_url else family["text_endpoint"]
# ... build payload from family's declared capability flags, call endpoint ...
The user picks veo3.1 once in hermes tools. The agent never thinks about endpoints — it just passes (or doesn't pass) image_url.
For per-instance model knobs (see plugins/video_gen/fal/__init__.py):
model= keyword from the tool call<PROVIDER>_VIDEO_MODEL env varvideo_gen.<provider>.model in config.yamlvideo_gen.model in config.yaml (when it's one of your IDs)default_model()success_response() and error_response() produce the dict shape every backend returns. Use them — don't hand-roll the dict.
Success keys: success, video (URL or absolute path), model, prompt, modality ("text" or "image"), aspect_ratio, duration, provider, plus extra.
Error keys: success, video (None), error, error_type, model, prompt, aspect_ratio, provider.
If your backend returns base64, use save_b64_video() to write under $HERMES_HOME/cache/videos/. For raw bytes from a follow-up HTTP fetch, use save_bytes_video(). Otherwise return the upstream URL directly — the gateway resolves remote URLs on delivery.
Drop a smoke test under tests/plugins/video_gen/test_<name>_plugin.py. The xAI and FAL tests show the pattern — register, verify catalog, exercise routing both with and without image_url, assert clean error responses on missing auth.