skills/videodb/reference/generative.md
VideoDB provides AI-powered generation of images, videos, music, sound effects, voice, and text content. All generation methods are on the Collection object.
You need a connection and a collection reference before calling any generation method:
```python
import videodb

conn = videodb.connect()
coll = conn.get_collection()
```
Generate images from text prompts:
```python
image = coll.generate_image(
    prompt="a futuristic cityscape at sunset with flying cars",
    aspect_ratio="16:9",
)

# Access the generated image
print(image.id)
print(image.generate_url())  # returns a signed download URL
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | Text description of the image to generate |
| aspect_ratio | str | "1:1" | Aspect ratio: "1:1", "9:16", "16:9", "4:3", or "3:4" |
| callback_url | str \| None | None | URL to receive async callback |
Returns an Image object with .id, .name, and .collection_id. The .url property may be None for generated images — always use image.generate_url() to get a reliable signed download URL.
Note: Unlike Video objects (which use .generate_stream()), Image objects use .generate_url() to retrieve the image URL. The .url property is only populated for some image types (e.g. thumbnails).
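If you receive an Image and are unsure which field is populated, a small fallback helper (a sketch, not part of the SDK) covers both cases:

```python
def reliable_image_url(image):
    """Prefer the .url attribute when populated; otherwise request a signed URL.

    Works for thumbnails (where .url may be set) and generated images
    (where it is typically None).
    """
    url = getattr(image, "url", None)
    return url if url else image.generate_url()
```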
Generate short video clips from text prompts:
```python
video = coll.generate_video(
    prompt="a timelapse of a flower blooming in a garden",
    duration=5,
)

stream_url = video.generate_stream()
video.play()
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | Text description of the video to generate |
| duration | int | 5 | Duration in seconds (must be an integer from 5 to 8) |
| callback_url | str \| None | None | URL to receive async callback |
Returns a Video object. Generated videos are automatically added to the collection and can be used in timelines, searches, and compilations like any uploaded video.
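Because duration must be an integer between 5 and 8, a small input guard (a sketch; the range is taken from the table above) can normalize arbitrary user input before calling the API:

```python
def clamp_video_duration(seconds):
    """Clamp a requested duration to the integer 5-8 range generate_video accepts."""
    return max(5, min(8, int(round(seconds))))
```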
VideoDB provides three separate methods for different audio types.
Generate background music from text descriptions:
```python
music = coll.generate_music(
    prompt="upbeat electronic music with a driving beat, suitable for a tech demo",
    duration=30,
)

print(music.id)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | Text description of the music |
| duration | int | 5 | Duration in seconds |
| callback_url | str \| None | None | URL to receive async callback |
Generate specific sound effects:
```python
sfx = coll.generate_sound_effect(
    prompt="thunderstorm with heavy rain and distant thunder",
    duration=10,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | Text description of the sound effect |
| duration | int | 2 | Duration in seconds |
| config | dict | {} | Additional configuration |
| callback_url | str \| None | None | URL to receive async callback |
Generate speech from text:
```python
voice = coll.generate_voice(
    text="Welcome to our product demo. Today we'll walk through the key features.",
    voice_name="Default",
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | required | Text to convert to speech |
| voice_name | str | "Default" | Voice to use |
| config | dict | {} | Additional configuration |
| callback_url | str \| None | None | URL to receive async callback |
All three audio methods return an Audio object with .id, .name, .length, and .collection_id.
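When logging or listing generated audio assets, a one-line formatter (a sketch; it assumes only the attributes listed above) keeps output consistent across music, SFX, and voice:

```python
def describe_audio(audio):
    """Return a one-line summary of an Audio object's key attributes."""
    return f"{audio.name} ({audio.id}): {audio.length}s in collection {audio.collection_id}"
```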
Use coll.generate_text() to run LLM analysis. This is a Collection-level method -- pass any context (transcripts, descriptions) directly in the prompt string.
```python
# Get transcript from a video first
transcript_text = video.get_transcript_text()

# Generate analysis using collection LLM
result = coll.generate_text(
    prompt=f"Summarize the key points discussed in this video:\n{transcript_text}",
    model_name="pro",
)

print(result["output"])
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | Prompt with context for the LLM |
| model_name | str | "basic" | Model tier: "basic", "pro", or "ultra" |
| response_type | str | "text" | Response format: "text" or "json" |
Returns a dict with an output key. When response_type="text", output is a str. When response_type="json", output is a dict.
```python
result = coll.generate_text(prompt="Summarize this", model_name="pro")
print(result["output"])  # access the actual text/dict
```
Combine scene extraction with text generation:
```python
from videodb import SceneExtractionType

# First index scenes
scenes = video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10},
    prompt="Describe the visual content in this scene.",
)

# Get transcript for spoken context
transcript_text = video.get_transcript_text()

scene_descriptions = []
for scene in scenes:
    if isinstance(scene, dict):
        description = scene.get("description") or scene.get("summary")
    else:
        description = getattr(scene, "description", None) or getattr(scene, "summary", None)
    scene_descriptions.append(description or str(scene))

scenes_text = "\n".join(scene_descriptions)

# Analyze with collection LLM
result = coll.generate_text(
    prompt=(
        f"Given this video transcript:\n{transcript_text}\n\n"
        f"And these visual scene descriptions:\n{scenes_text}\n\n"
        "Based on the spoken and visual content, describe the main topics covered."
    ),
    model_name="pro",
)
print(result["output"])
```
Dub a video into another language using the collection method:
```python
dubbed_video = coll.dub_video(
    video_id=video.id,
    language_code="es",  # Spanish
)

dubbed_video.play()
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| video_id | str | required | ID of the video to dub |
| language_code | str | required | Target language code (e.g., "es", "fr", "de") |
| callback_url | str \| None | None | URL to receive async callback |
Returns a Video object with the dubbed content.
Translate a video's transcript without dubbing:
```python
translated = video.translate_transcript(
    language="Spanish",
    additional_notes="Use formal tone",
)

for entry in translated:
    print(entry)
```
Supported languages include: en, es, fr, de, it, pt, ja, ko, zh, hi, ar, and more.
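The exact shape of each translated entry is not specified here, so the helper below (a sketch) hedges: it reads a `text` key when an entry is a dict and falls back to `str()` otherwise:

```python
def translated_transcript_text(entries):
    """Join translated transcript entries into a single string."""
    parts = []
    for entry in entries:
        if isinstance(entry, dict):
            parts.append(entry.get("text", ""))
        else:
            parts.append(str(entry))
    return " ".join(p for p in parts if p)
```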
```python
import videodb

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")

# Get transcript
transcript_text = video.get_transcript_text()

# Generate narration script using collection LLM
result = coll.generate_text(
    prompt=(
        f"Write a professional narration script for this video content:\n"
        f"{transcript_text[:2000]}"
    ),
    model_name="pro",
)
script = result["output"]

# Convert script to speech
narration = coll.generate_voice(text=script)
print(f"Narration audio: {narration.id}")
```
```python
thumbnail = coll.generate_image(
    prompt="professional video thumbnail showing data analytics dashboard, modern design",
    aspect_ratio="16:9",
)

print(f"Thumbnail URL: {thumbnail.generate_url()}")
```
```python
import videodb
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, AudioAsset

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")

# Generate background music
music = coll.generate_music(
    prompt="calm ambient background music for a tutorial video",
    duration=60,
)

# Build timeline with video + music overlay
timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id))
timeline.add_overlay(0, AudioAsset(asset_id=music.id, disable_other_tracks=False))

stream_url = timeline.generate_stream()
print(f"Video with music: {stream_url}")
```
```python
transcript_text = video.get_transcript_text()

result = coll.generate_text(
    prompt=(
        f"Given this transcript:\n{transcript_text}\n\n"
        "Return a JSON object with keys: summary, topics (array), action_items (array)."
    ),
    model_name="pro",
    response_type="json",
)

# result["output"] is a dict when response_type="json"
print(result["output"]["summary"])
print(result["output"]["topics"])
```
- generate_music() for background music, generate_sound_effect() for SFX, and generate_voice() for text-to-speech. There is no unified generate_audio() method.
- coll.generate_text() does not have access to video content automatically. Fetch the transcript with video.get_transcript_text() and pass it in the prompt.
- Model tiers: "basic" is fastest, "pro" is balanced, "ultra" is highest quality. Use "pro" for most analysis tasks.
- Supported aspect ratios for image generation: "1:1", "9:16", "16:9", "4:3", or "3:4".