skills/videodb/reference/generative.md
VideoDB provides AI-powered generation of images, videos, music, sound effects, voice, and text content. All generation methods are on the Collection object.
You need a connection and a collection reference before calling any generation method:
```python
import videodb

conn = videodb.connect()
coll = conn.get_collection()
```
Generate images from text prompts:
```python
image = coll.generate_image(
    prompt="a futuristic cityscape at sunset with flying cars",
    aspect_ratio="16:9",
)

# Access the generated image
print(image.id)
print(image.generate_url())  # returns a signed download URL
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | Text description of the image to generate |
| aspect_ratio | str | "1:1" | Aspect ratio: "1:1", "9:16", "16:9", "4:3", or "3:4" |
| callback_url | str \| None | None | URL to receive async callback |
Returns an Image object with .id, .name, and .collection_id. The .url property may be None for generated images — always use image.generate_url() to get a reliable signed download URL.
Note: Unlike Video objects (which use .generate_stream()), Image objects use .generate_url() to retrieve the image URL. The .url property is only populated for some image types (e.g. thumbnails).
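If you receive an Image and are unsure which field is populated, a small fallback helper (a sketch, not part of the SDK) covers both cases:

```python
def reliable_image_url(image):
    """Prefer the .url attribute when populated; otherwise request a signed URL.

    Works for thumbnails (where .url may be set) and generated images
    (where it is typically None).
    """
    url = getattr(image, "url", None)
    return url if url else image.generate_url()
```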
Generate short video clips from text prompts:
```python
video = coll.generate_video(
    prompt="a timelapse of a flower blooming in a garden",
    duration=5,
)

stream_url = video.generate_stream()
video.play()
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | Text description of the video to generate |
| duration | int | 5 | Duration in seconds (must be an integer from 5 to 8) |
| callback_url | str \| None | None | URL to receive async callback |
Returns a Video object. Generated videos are automatically added to the collection and can be used in timelines, searches, and compilations like any uploaded video.
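Because duration must be an integer between 5 and 8, a small input guard (a sketch; the range is taken from the table above) can normalize arbitrary user input before calling the API:

```python
def clamp_video_duration(seconds):
    """Clamp a requested duration to the integer 5-8 range generate_video accepts."""
    return max(5, min(8, int(round(seconds))))
```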
VideoDB provides three separate methods for different audio types.
Generate background music from text descriptions:
```python
music = coll.generate_music(
    prompt="upbeat electronic music with a driving beat, suitable for a tech demo",
    duration=30,
)

print(music.id)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | Text description of the music |
| duration | int | 5 | Duration in seconds |
| callback_url | str \| None | None | URL to receive async callback |
Generate specific sound effects:
```python
sfx = coll.generate_sound_effect(
    prompt="thunderstorm with heavy rain and distant thunder",
    duration=10,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | Text description of the sound effect |
| duration | int | 2 | Duration in seconds |
| config | dict | {} | Additional configuration |
| callback_url | str \| None | None | URL to receive async callback |
Generate speech from text:
```python
voice = coll.generate_voice(
    text="Welcome to our product demo. Today we'll walk through the key features.",
    voice_name="Default",
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | required | Text to convert to speech |
| voice_name | str | "Default" | Voice to use |
| config | dict | {} | Additional configuration |
| callback_url | str \| None | None | URL to receive async callback |
All three audio methods return an Audio object with .id, .name, .length, and .collection_id.
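When logging or listing generated audio assets, a one-line formatter (a sketch; it assumes only the attributes listed above) keeps output consistent across music, SFX, and voice:

```python
def describe_audio(audio):
    """Return a one-line summary of an Audio object's key attributes."""
    return f"{audio.name} ({audio.id}): {audio.length}s in collection {audio.collection_id}"
```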
Use coll.generate_text() to run LLM analysis. This is a Collection-level method -- pass any context (transcripts, descriptions) directly in the prompt string.
```python
# Get transcript from a video first
transcript_text = video.get_transcript_text()

# Generate analysis using collection LLM
result = coll.generate_text(
    prompt=f"Summarize the key points discussed in this video:\n{transcript_text}",
    model_name="pro",
)

print(result["output"])
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | str | required | Prompt with context for the LLM |
| model_name | str | "basic" | Model tier: "basic", "pro", or "ultra" |
| response_type | str | "text" | Response format: "text" or "json" |
Returns a dict with an output key. When response_type="text", output is a str. When response_type="json", output is a dict.
```python
result = coll.generate_text(prompt="Summarize this", model_name="pro")
print(result["output"])  # access the actual text/dict
```
Combine scene extraction with text generation:
```python
from videodb import SceneExtractionType

# First index scenes
scenes = video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 10},
    prompt="Describe the visual content in this scene.",
)

# Get transcript for spoken context
transcript_text = video.get_transcript_text()

scene_descriptions = []
for scene in scenes:
    if isinstance(scene, dict):
        description = scene.get("description") or scene.get("summary")
    else:
        description = getattr(scene, "description", None) or getattr(scene, "summary", None)
    scene_descriptions.append(description or str(scene))

scenes_text = "\n".join(scene_descriptions)

# Analyze with collection LLM
result = coll.generate_text(
    prompt=(
        f"Given this video transcript:\n{transcript_text}\n\n"
        f"And these visual scene descriptions:\n{scenes_text}\n\n"
        "Based on the spoken and visual content, describe the main topics covered."
    ),
    model_name="pro",
)
print(result["output"])
```
Dub a video into another language using the collection method:
```python
dubbed_video = coll.dub_video(
    video_id=video.id,
    language_code="es",  # Spanish
)

dubbed_video.play()
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| video_id | str | required | ID of the video to dub |
| language_code | str | required | Target language code (e.g., "es", "fr", "de") |
| callback_url | str \| None | None | URL to receive async callback |
Returns a Video object with the dubbed content.
Translate a video's transcript without dubbing:
```python
translated = video.translate_transcript(
    language="Spanish",
    additional_notes="Use formal tone",
)

for entry in translated:
    print(entry)
```
Supported languages include: en, es, fr, de, it, pt, ja, ko, zh, hi, ar, and more.
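The exact shape of each translated entry is not specified here, so the helper below (a sketch) hedges: it reads a `text` key when an entry is a dict and falls back to `str()` otherwise:

```python
def translated_transcript_text(entries):
    """Join translated transcript entries into a single string."""
    parts = []
    for entry in entries:
        if isinstance(entry, dict):
            parts.append(entry.get("text", ""))
        else:
            parts.append(str(entry))
    return " ".join(p for p in parts if p)
```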
```python
import videodb

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")

# Get transcript
transcript_text = video.get_transcript_text()

# Generate narration script using collection LLM
result = coll.generate_text(
    prompt=(
        f"Write a professional narration script for this video content:\n"
        f"{transcript_text[:2000]}"
    ),
    model_name="pro",
)
script = result["output"]

# Convert script to speech
narration = coll.generate_voice(text=script)
print(f"Narration audio: {narration.id}")
```
```python
thumbnail = coll.generate_image(
    prompt="professional video thumbnail showing data analytics dashboard, modern design",
    aspect_ratio="16:9",
)

print(f"Thumbnail URL: {thumbnail.generate_url()}")
```
```python
import videodb
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, AudioAsset

conn = videodb.connect()
coll = conn.get_collection()
video = coll.get_video("your-video-id")

# Generate background music
music = coll.generate_music(
    prompt="calm ambient background music for a tutorial video",
    duration=60,
)

# Build timeline with video + music overlay
timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id))
timeline.add_overlay(0, AudioAsset(asset_id=music.id, disable_other_tracks=False))

stream_url = timeline.generate_stream()
print(f"Video with music: {stream_url}")
```
```python
transcript_text = video.get_transcript_text()

result = coll.generate_text(
    prompt=(
        f"Given this transcript:\n{transcript_text}\n\n"
        "Return a JSON object with keys: summary, topics (array), action_items (array)."
    ),
    model_name="pro",
    response_type="json",
)

# result["output"] is a dict when response_type="json"
print(result["output"]["summary"])
print(result["output"]["topics"])
```
- generate_music() for background music, generate_sound_effect() for SFX, and generate_voice() for text-to-speech. There is no unified generate_audio() method.
- coll.generate_text() does not have access to video content automatically. Fetch the transcript with video.get_transcript_text() and pass it in the prompt.
- Model tiers: "basic" is fastest, "pro" is balanced, "ultra" is highest quality. Use "pro" for most analysis tasks.
- Supported aspect ratios for image generation: "1:1", "9:16", "16:9", "4:3", or "3:4".