skills/videodb/reference/capture.md
VideoDB Capture enables real-time screen and audio recording with AI processing. Desktop capture currently supports macOS only.
For code-level details (SDK methods, event structures, AI pipelines), see capture-reference.md.
Run `python scripts/ws_listener.py --clear &` to dump all events to /tmp/videodb_events.jsonl. No webhooks or polling required: the WebSocket delivers all events, including session lifecycle.
CRITICAL: The `CaptureClient` must remain running for the entire duration of the capture. It runs the local recorder binary that streams screen/audio data to VideoDB. If the Python process that created the `CaptureClient` exits, the recorder binary is killed and capture stops silently. Always run the capture code as a long-lived background process (e.g. `nohup python capture_script.py &`) and use signal handling (`asyncio.Event` + SIGINT/SIGTERM) to keep it alive until you explicitly stop it.
Start WebSocket listener in background with --clear flag to clear old events. Wait for it to create the WebSocket ID file.
Read the WebSocket ID. This ID is required for capture session and AI pipelines.
Create a capture session and generate a client token for the desktop client.
Initialize CaptureClient with the token. Request permissions for microphone and screen capture.
List and select channels (mic, display, system_audio). Set store = True on channels you want to persist as a video.
Start the session with selected channels.
Wait for session active by reading events until you see capture_session.active. This event contains the rtstreams array. Save session info (session ID, RTStream IDs) to a file (e.g. /tmp/videodb_capture_info.json) so other scripts can read it.
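The wait-and-save step above can be sketched as follows. This is a minimal sketch, assuming each JSONL line has a top-level `type` field and that the active event carries `session_id` alongside the documented `rtstreams` array; adjust the field names to match what your listener actually writes.

```python
import json
import time
from pathlib import Path

EVENTS_FILE = Path("/tmp/videodb_events.jsonl")
INFO_FILE = Path("/tmp/videodb_capture_info.json")

def wait_for_event(event_type, events_file=EVENTS_FILE, timeout=60.0, poll=0.5):
    """Poll the listener's JSONL dump until an event of the given type appears."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if events_file.exists():
            for line in events_file.read_text().splitlines():
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip partially written lines
                if event.get("type") == event_type:
                    return event
        time.sleep(poll)
    raise TimeoutError(f"no {event_type!r} event within {timeout}s")

def save_session_info(event, info_file=INFO_FILE):
    """Persist session ID and RTStream IDs so other scripts can read them."""
    info_file.write_text(json.dumps({
        "session_id": event.get("session_id"),    # field name is an assumption
        "rtstreams": event.get("rtstreams", []),  # documented in the active event
    }, indent=2))
```

Typical use once the session starts: `save_session_info(wait_for_event("capture_session.active"))`.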
Keep the process alive. Use asyncio.Event with signal handlers for SIGINT/SIGTERM to block until explicitly stopped. Write a PID file (e.g. /tmp/videodb_capture_pid) so the process can be stopped later with kill $(cat /tmp/videodb_capture_pid). The PID file should be overwritten on every run so reruns always have the correct PID.
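A minimal keep-alive sketch for this step, assuming a Unix platform (this guide targets macOS) and that `client` is your running CaptureClient:

```python
import asyncio
import os
import signal
from pathlib import Path

PID_FILE = Path("/tmp/videodb_capture_pid")

async def run_until_stopped(on_stop=None):
    """Block until SIGINT/SIGTERM arrives, then run the cleanup callback."""
    PID_FILE.write_text(str(os.getpid()))  # overwrite on every run
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, stop.set)
    await stop.wait()  # the recorder binary stays alive while we wait here
    if on_stop is not None:
        on_stop()

# Usage, after starting the session:
#   asyncio.run(run_until_stopped(
#       on_stop=lambda: (client.stop_capture(), client.shutdown())))
```

Registering the handlers on the running loop (rather than via `signal.signal`) keeps the shutdown path inside asyncio, so the cleanup callback runs after `stop.wait()` returns instead of inside a signal handler.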
Start AI pipelines (in a separate command/script) on each RTStream for audio indexing and visual indexing. Read the RTStream IDs from the saved session info file.
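A pipeline-starter script can pick up the saved session info like this. The exact SDK calls for starting audio/visual indexing (and where the ws_connection_id goes) are in capture-reference.md, so the loop body below is only a placeholder; the file layout assumes the capture script saved `session_id` and `rtstreams` keys.

```python
import json
from pathlib import Path

def load_rtstream_ids(info_file="/tmp/videodb_capture_info.json"):
    """Read the session ID and RTStream IDs saved by the capture script."""
    info = json.loads(Path(info_file).read_text())
    return info["session_id"], info["rtstreams"]

# session_id, rtstreams = load_rtstream_ids()
# for rtstream in rtstreams:
#     ...  # start audio/visual indexing here (see capture-reference.md)
```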
Write custom event processing logic (in a separate command/script) to read real-time events based on your use case. Examples:
- Trigger an action when a visual_index event mentions "Slack"
- Process transcript text as audio_index events arrive

Stop capture when done: send SIGTERM to the capture process. It should call `client.stop_capture()` and `client.shutdown()` in its signal handler.
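A sketch of the two example reactions above. The event field names (`type`, `transcript`) are assumptions about the listener's JSONL output, not a documented schema:

```python
import json
from pathlib import Path

def read_events(path="/tmp/videodb_events.jsonl"):
    """Parse the listener's JSONL dump, skipping partial lines."""
    events = []
    for line in Path(path).read_text().splitlines():
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            continue
    return events

def process_event(event, transcript):
    """React to one event; returns an alert string or None."""
    etype = event.get("type", "")
    # Example 1: alert when the visual index mentions "Slack" on screen.
    if etype.startswith("visual_index") and "Slack" in json.dumps(event):
        return 'visual_index mentioned "Slack"'
    # Example 2: accumulate transcript text as audio_index events arrive.
    if etype.startswith("audio_index") and "transcript" in event:
        transcript.append(event["transcript"])
    return None
```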
Wait for export by reading events until you see capture_session.exported. This event contains exported_video_id, stream_url, and player_url. This may take several seconds after stopping capture.
Stop WebSocket listener after receiving the export event. Use kill $(cat /tmp/videodb_ws_pid) to cleanly terminate it.
Proper shutdown order is important to ensure all events are captured:
1. Call `client.stop_capture()`, then `client.shutdown()`.
2. Watch /tmp/videodb_events.jsonl for capture_session.exported.
3. Run `kill $(cat /tmp/videodb_ws_pid)` to stop the WebSocket listener.

Do NOT kill the WebSocket listener before receiving the export event, or you will miss the final video URLs.
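The shutdown order above can be sketched as one helper. The PID-file paths come from this guide; the export event's `type` field and the polling loop are assumptions about the listener's output:

```python
import json
import os
import signal
import time
from pathlib import Path

EVENTS_FILE = Path("/tmp/videodb_events.jsonl")

def wait_for_export(events_file=EVENTS_FILE, timeout=120.0, poll=1.0):
    """Watch the event dump until capture_session.exported shows up."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if events_file.exists():
            for line in events_file.read_text().splitlines():
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    continue
                if event.get("type") == "capture_session.exported":
                    return event
        time.sleep(poll)
    raise TimeoutError("export event never arrived; listener left running")

def shutdown_capture():
    """Stop capture, wait for the export, then stop the listener, in that order."""
    # 1. SIGTERM the capture process; its handler calls stop_capture()/shutdown().
    os.kill(int(Path("/tmp/videodb_capture_pid").read_text()), signal.SIGTERM)
    # 2. Do NOT stop the listener until the export event arrives.
    exported = wait_for_export()
    # 3. Only now terminate the WebSocket listener.
    os.kill(int(Path("/tmp/videodb_ws_pid").read_text()), signal.SIGTERM)
    return exported  # carries exported_video_id, stream_url, player_url
```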
| Script | Description |
|---|---|
| scripts/ws_listener.py | WebSocket event listener (dumps to JSONL) |
```bash
# Start listener in background (append to existing events)
python scripts/ws_listener.py &

# Start listener with clear (new session, clears old events)
python scripts/ws_listener.py --clear &

# Custom output directory
python scripts/ws_listener.py --clear /path/to/events &

# Stop the listener
kill $(cat /tmp/videodb_ws_pid)
```
Options:
- `--clear`: Clear the events file before starting. Use when starting a new capture session.

Output files:
- videodb_events.jsonl - All WebSocket events
- videodb_ws_id - WebSocket connection ID (for the ws_connection_id parameter)
- videodb_ws_pid - Process ID (for stopping the listener)

Features: