skills/videodb/reference/search.md
Search allows you to find specific moments inside videos using natural language queries, exact keywords, or visual scene descriptions.
Videos must be indexed before they can be searched. Indexing is a one-time operation per video per index type.
Index the transcribed speech content of a video for semantic and keyword search:
```python
video = coll.get_video(video_id)

# force=True makes indexing idempotent (skips if already indexed)
video.index_spoken_words(force=True)
```
This transcribes the audio track and builds a searchable index over the spoken content. Required for semantic search and keyword search.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `language_code` | `str \| None` | `None` | Language code of the video |
| `segmentation_type` | `SegmentationType` | `SegmentationType.sentence` | Segmentation type (`sentence` or `llm`) |
| `force` | `bool` | `False` | Set to `True` to skip if already indexed (avoids "already exists" error) |
| `callback_url` | `str \| None` | `None` | Webhook URL for async notification |
Index visual content by generating AI descriptions of scenes. Unlike `index_spoken_words()`, `index_scenes()` has no `force` parameter, so it raises an error if a scene index already exists. Extract the existing `scene_index_id` from the error message:
```python
import re
from videodb import SceneExtractionType

try:
    scene_index_id = video.index_scenes(
        extraction_type=SceneExtractionType.shot_based,
        prompt="Describe the visual content, objects, actions, and setting in this scene.",
    )
except Exception as e:
    # Recover the id of the scene index that already exists.
    match = re.search(r"id\s+([a-f0-9]+)", str(e))
    if match:
        scene_index_id = match.group(1)
    else:
        raise
```
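The regex recovery above can be exercised on its own. The sample error text below is illustrative only, since the SDK's exact wording may differ:

```python
import re

def extract_scene_index_id(error_message: str):
    """Pull an existing scene index id out of an 'already exists' error."""
    match = re.search(r"id\s+([a-f0-9]+)", error_message)
    return match.group(1) if match else None

# Illustrative message; the real error text may be phrased differently.
msg = "Scene index already exists with id 7a3f9b2c4d"
print(extract_scene_index_id(msg))  # 7a3f9b2c4d
```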
Extraction types:
| Type | Description | Best For |
|---|---|---|
| `SceneExtractionType.shot_based` | Splits on visual shot boundaries | General purpose, action content |
| `SceneExtractionType.time_based` | Splits at fixed intervals | Uniform sampling, long static content |
| `SceneExtractionType.transcript` | Splits based on transcript segments | Speech-driven scene boundaries |
Parameters for time_based:
```python
video.index_scenes(
    extraction_type=SceneExtractionType.time_based,
    extraction_config={"time": 5, "select_frames": ["first", "last"]},
    prompt="Describe what is happening in this scene.",
)
```
Natural language queries matched against spoken content:
```python
from videodb import SearchType

results = video.search(
    query="explaining the benefits of machine learning",
    search_type=SearchType.semantic,
)
```
Returns ranked segments where the spoken content semantically matches the query.
Exact term matching in transcribed speech:
```python
results = video.search(
    query="artificial intelligence",
    search_type=SearchType.keyword,
)
```
Returns segments containing the exact keyword or phrase.
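Conceptually, keyword search is exact phrase matching over transcript segments. A simplified, self-contained illustration (not the service's actual implementation; the segment data is made up):

```python
segments = [
    {"start": 0.0, "end": 4.2, "text": "Welcome to the artificial intelligence course"},
    {"start": 4.2, "end": 9.0, "text": "Today we cover machine learning basics"},
]

def keyword_hits(segments, phrase):
    """Return segments whose text contains the exact phrase (case-insensitive)."""
    phrase = phrase.lower()
    return [s for s in segments if phrase in s["text"].lower()]

print([s["start"] for s in keyword_hits(segments, "artificial intelligence")])  # [0.0]
```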
Visual content queries matched against indexed scene descriptions. Requires a prior index_scenes() call.
index_scenes() returns a scene_index_id. Pass it to video.search() to target a specific scene index (especially important when a video has multiple scene indexes):
```python
from videodb import SearchType, IndexType
from videodb.exceptions import InvalidRequestError

# Search using semantic search against the scene index.
# Use score_threshold to filter low-relevance noise (recommended: 0.3+).
try:
    results = video.search(
        query="person writing on a whiteboard",
        search_type=SearchType.semantic,
        index_type=IndexType.scene,
        scene_index_id=scene_index_id,
        score_threshold=0.3,
    )
    shots = results.get_shots()
except InvalidRequestError as e:
    if "No results found" in str(e):
        shots = []
    else:
        raise
```
Important notes:
- `SearchType.semantic` with `index_type=IndexType.scene` is the most reliable combination and works on all plans.
- `SearchType.scene` exists but may not be available on all plans (e.g. Free tier). Prefer `SearchType.semantic` with `IndexType.scene`.
- The `scene_index_id` parameter is optional. If omitted, the search runs against all scene indexes on the video. Pass it to target a specific index.

When indexing scenes with custom metadata, you can combine semantic search with metadata filters:
```python
from videodb import SearchType, IndexType

results = video.search(
    query="a skillful chasing scene",
    search_type=SearchType.semantic,
    index_type=IndexType.scene,
    scene_index_id=scene_index_id,
    filter=[{"camera_view": "road_ahead"}, {"action_type": "chasing"}],
)
```
See the scene_level_metadata_indexing cookbook for a full example of custom metadata indexing and filtered search.
Access individual result segments:
```python
results = video.search("your query")

for shot in results.get_shots():
    print(f"Video: {shot.video_id}")
    print(f"Start: {shot.start:.2f}s")
    print(f"End: {shot.end:.2f}s")
    print(f"Text: {shot.text}")
    print("---")
```
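`shot.start` and `shot.end` are floating-point seconds. A small formatter (a hypothetical helper, not part of the SDK) makes them easier to read in logs:

```python
def fmt_ts(seconds: float) -> str:
    """Render seconds as HH:MM:SS, truncating fractional seconds."""
    whole = int(seconds)
    hours, rem = divmod(whole, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

print(fmt_ts(3725.4))  # 01:02:05
```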
Stream all matching segments as a single compiled video:
```python
results = video.search("your query")
stream_url = results.compile()
results.play()  # opens compiled stream in browser
```
Download or stream specific result segments:
```python
for shot in results.get_shots():
    stream_url = shot.generate_stream()
    print(f"Clip: {stream_url}")
```
Search across all videos in a collection:
```python
coll = conn.get_collection()

# Search across all videos in the collection
results = coll.search(
    query="product demo",
    search_type=SearchType.semantic,
)

for shot in results.get_shots():
    print(f"Video: {shot.video_id} [{shot.start:.1f}s - {shot.end:.1f}s]")
```
Note: Collection-level search only supports `SearchType.semantic`. Using `SearchType.keyword` or `SearchType.scene` with `coll.search()` will raise `NotImplementedError`. For keyword or scene search, use `video.search()` on individual videos instead.
Index, search, and compile matching segments into a single playable stream:
```python
video.index_spoken_words(force=True)
results = video.search(query="your query", search_type=SearchType.semantic)
stream_url = results.compile()
print(stream_url)
```
- `video.search()` raises `InvalidRequestError` when no results match. Always wrap search calls in try/except and treat "No results found" as an empty result set.
- Use `score_threshold=0.3` (or higher) to filter low-relevance noise.
- Use `index_spoken_words(force=True)` to safely re-index. `index_scenes()` has no `force` parameter; wrap it in try/except and extract the existing `scene_index_id` from the error message with `re.search(r"id\s+([a-f0-9]+)", str(e))`.