.agents/references/voice-pipeline-lifecycle.md
Use this reference for changes to VoicePipeline, AudioInput, StreamedAudioInput, STT sessions, TTS task ordering, voice lifecycle events, PCM framing, result streaming, or voice tracing. Realtime agents use a different live-session architecture; read Realtime session lifecycle for that path.
VoicePipeline owns an STT-to-workflow-to-TTS producer task and returns a StreamedAudioResult that drives its observable completion.
- `AudioInput` produces one transcription and one workflow turn. `StreamedAudioInput` creates a long-lived transcription session and runs one workflow turn for each emitted transcript until the input or session ends.
- [...] `finally` before marking output complete. Partial setup and workflow failure must not strand the STT connection or producer task.
- `workflow.on_start()` applies only to the streamed multi-turn path. Its failure is logged and skipped so the transcription session can still start; normal per-turn workflow failures are terminal and surface through the result stream.
- [...] `StreamedAudioInput`. Lifecycle events expose turn boundaries, but microphone muting, playback interruption, and barge-in policy remain application-owned.
- `_ordered_tasks` and the dispatcher must emit their audio and lifecycle events in workflow text order rather than completion order.
- `turn_started` precedes audio for that turn. `turn_ended` is emitted only after the turn's final text remainder has been synthesized and its audio dispatched. `session_ended` follows all ordered segment queues and all turns.
- `VoiceStreamEventError` terminates result streaming and the stored exception is raised after task cleanup. `session_ended` is a lifecycle marker, not proof of success; consumers must still observe the terminal exception from `stream()`.
- `StreamedAudioResult.stream()` is the public completion and error boundary. On normal `session_ended`, let the producer finish before cleanup so session close and trace end are not cancelled by result teardown.
- [...] `buffer_size` to TTS source chunks without changing sample order.
- Convert to float32 only after PCM16 framing is complete, then apply caller-provided `transform_data` to each emitted array.
- `AudioInput.to_base64()` and audio-file conversion must not mutate the caller's NumPy buffer when converting float input to PCM16.
- `VoicePipeline.run()` returns its result object.
- `trace_include_sensitive_data` governs transcript and TTS text, while `trace_include_sensitive_audio_data` governs encoded audio payloads.

- `docs/voice/pipeline.md`
- `docs/voice/tracing.md`
- `src/agents/voice/pipeline.py`
- `src/agents/voice/result.py`
- `src/agents/voice/input.py`
- `src/agents/voice/model.py`
- `src/agents/voice/models/openai_stt.py`
- `tests/voice/test_pipeline.py`
- `tests/voice/test_input.py`
- `tests/voice/test_openai_stt.py`
- `tests/voice/test_openai_tts.py`
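
The ordering rules in this reference (audio and lifecycle events emitted in workflow text order rather than completion order, `turn_started` before a turn's audio, `turn_ended` after its last segment, `session_ended` after all turns) can be illustrated with a minimal asyncio sketch. It does not use the SDK and is not the pipeline's implementation: `synthesize`, `run_turn`, `run_session`, and `demo` are hypothetical names, the strings stand in for real audio chunks and lifecycle events, and error handling and cancellation are omitted.

```python
import asyncio


async def synthesize(text: str, delay: float) -> str:
    """Hypothetical stand-in for a TTS call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return f"audio({text})"


async def run_turn(segments: list[str], events: list[str]) -> None:
    # Start every segment immediately so synthesis runs concurrently.
    # Earlier segments get longer delays, so completion order is the
    # reverse of text order in this demo.
    ordered_tasks = [
        asyncio.create_task(synthesize(text, 0.01 * (len(segments) - index)))
        for index, text in enumerate(segments)
    ]
    events.append("turn_started")
    # Await tasks in creation (text) order, not completion order.
    for task in ordered_tasks:
        events.append(await task)
    # Emitted only after the final segment's audio has been dispatched.
    events.append("turn_ended")


async def run_session(turns: list[list[str]]) -> list[str]:
    events: list[str] = []
    for segments in turns:
        await run_turn(segments, events)
    # Follows every turn and every ordered segment.
    events.append("session_ended")
    return events


def demo() -> list[str]:
    return asyncio.run(run_session([["Hello.", "How are you?"], ["Bye."]]))


if __name__ == "__main__":
    for event in demo():
        print(event)
```

Awaiting the task list in creation order keeps synthesis concurrent while fixing the order in which results are observed. A real dispatcher would also need to cancel the remaining tasks and surface the exception when one segment fails.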