packages/docs/guides/streaming-responses.mdx
Users hate waiting. A 3-second response feels like an eternity when you're staring at a blank screen.
Traditional request/response patterns make your agent feel sluggish, even when the LLM is fast. The UI waits for the entire response before showing anything.
<Tip>
  **Streaming changes everything.** Users see tokens appear in real-time, making responses feel instant even when they take seconds to complete.
</Tip>

ElizaOS supports three response modes out of the box:
| Mode | Latency | Use Case |
|---|---|---|
| Sync | Wait for complete response | Simple integrations, batch processing |
| Stream | Tokens appear in real-time | Chat UIs, interactive experiences |
| WebSocket | Bidirectional, persistent | Voice conversations, multi-turn |
Send a message with `stream: true` to receive Server-Sent Events:
```typescript
const response = await fetch(`/api/agents/${agentId}/message`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    entityId: "user-123",
    roomId: "room-456",
    content: { text: "Hello!", source: "api" },
    stream: true, // Enable streaming
  }),
});

// Process the SSE stream, buffering partial lines across chunk boundaries
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? ""; // an incomplete line waits for the next chunk

  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const data = JSON.parse(line.slice(6));
    if (data.type === "chunk") {
      process.stdout.write(data.text); // Display token immediately
    }
  }
}
```
For bidirectional communication and voice conversations:
```typescript
const socket = new WebSocket(`ws://localhost:3000/api/agents/${agentId}/ws`);

socket.onopen = () => {
  socket.send(
    JSON.stringify({
      type: "message",
      entityId: "user-123",
      roomId: "room-456",
      content: { text: "Hello!", source: "websocket" },
    }),
  );
};

socket.onmessage = (event) => {
  const data = JSON.parse(event.data);
  switch (data.type) {
    case "chunk":
      process.stdout.write(data.text);
      break;
    case "complete":
      console.log("\n--- Response complete ---");
      break;
    case "error":
      console.error("Error:", data.message);
      break;
  }
};
```
The streaming API emits these event types:
| Event | Description |
|---|---|
| `chunk` | A token or text fragment to display |
| `complete` | Response finished; includes full text and actions |
| `error` | Something went wrong |
| `control` | Backend control messages (typing indicators, etc.) |
A `chunk` event carries a fragment to append:

```typescript
{
  type: 'chunk',
  text: 'Hello', // Text fragment to append
  timestamp: 1703001234567
}
```
A `complete` event carries the full response and the actions that ran:

```typescript
{
  type: 'complete',
  text: 'Hello! How can I help you today?', // Full response
  actions: ['REPLY'], // Executed actions
  messageId: 'msg-uuid',
  timestamp: 1703001234890
}
```
ElizaOS uses stream extractors to filter LLM output for streaming. The framework provides several built-in extractors:
**`PassthroughExtractor`** streams everything as-is. Use it for plain text responses.
```typescript
import { PassthroughExtractor } from "@elizaos/core";

const extractor = new PassthroughExtractor();
extractor.push("Hello "); // Returns: 'Hello '
extractor.push("world!"); // Returns: 'world!'
```
**`XmlTagExtractor`** extracts the content of a specific XML tag. Use it when the LLM outputs structured XML.
```typescript
import { XmlTagExtractor } from "@elizaos/core";

const extractor = new XmlTagExtractor("text");

// LLM output: <response><text>Hello world!</text></response>
extractor.push("<response><text>Hello "); // Returns: 'Hel' (keeps margin)
extractor.push("world!</text></response>"); // Returns: 'lo world!'
```
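The "margin" exists because a chunk can end in the middle of the closing tag; the extractor holds back enough characters to be sure it never emits part of `</text>`. Here is a self-contained sketch of the idea — not the actual `XmlTagExtractor` implementation, and the hold-back size is an assumption:

```typescript
// Minimal sketch: stream the content of one tag, holding back a margin
// so a partially-received closing tag is never emitted.
class TagStreamSketch {
  private buffer = "";
  private inside = false;
  done = false;

  constructor(private tag: string) {}

  push(chunk: string): string {
    if (this.done) return "";
    this.buffer += chunk;

    if (!this.inside) {
      const open = `<${this.tag}>`;
      const at = this.buffer.indexOf(open);
      if (at === -1) return ""; // opening tag not fully seen yet
      this.buffer = this.buffer.slice(at + open.length);
      this.inside = true;
    }

    const close = `</${this.tag}>`;
    const end = this.buffer.indexOf(close);
    if (end !== -1) {
      const out = this.buffer.slice(0, end);
      this.buffer = "";
      this.done = true;
      return out;
    }

    // Hold back close.length - 1 chars: the longest possible partial close tag.
    const safe = Math.max(0, this.buffer.length - (close.length - 1));
    const out = this.buffer.slice(0, safe);
    this.buffer = this.buffer.slice(safe);
    return out;
  }
}
```

Whatever is held back is emitted on the next push once it is proven not to be a closing tag, so the concatenated output is always exactly the tag's content.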
**`ResponseStreamExtractor`** performs the action-aware extraction used by `DefaultMessageService`. It reads the `<actions>` block to decide what to stream.
```typescript
import { ResponseStreamExtractor } from "@elizaos/core";

const extractor = new ResponseStreamExtractor();

// Only streams <text> when action is REPLY
extractor.push("<actions>REPLY</actions><text>Hello!"); // Returns: 'Hel'
extractor.push("</text>"); // Returns: 'lo!'

// Skips <text> when action is something else (action handler will respond)
extractor.push("<actions>SEARCH</actions><text>Ignored</text>"); // Returns: ''
```
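The gating behaves like a small state machine: buffer until the `<actions>` block is complete, then either stream or swallow the `<text>` body. A rough self-contained sketch — not the real `ResponseStreamExtractor`, which also handles margins for tags split across chunks; for brevity this version assumes tag boundaries never straddle a chunk:

```typescript
// Rough sketch: decide once from <actions>, then stream or skip <text>.
// Assumes a tag is never split across two chunks (the real extractor
// buffers a margin to handle that case).
class ActionGateSketch {
  private buffer = "";
  private decision: "pending" | "stream" | "skip" = "pending";

  push(chunk: string): string {
    this.buffer += chunk;

    if (this.decision === "pending") {
      const m = this.buffer.match(/<actions>(.*?)<\/actions>/s);
      if (!m) return ""; // still waiting for the full <actions> block
      this.decision = m[1].trim() === "REPLY" ? "stream" : "skip";
      this.buffer = this.buffer.slice(m.index! + m[0].length);
    }

    if (this.decision === "skip") {
      this.buffer = ""; // an action handler will produce the reply instead
      return "";
    }

    // Stream mode: emit buffered content with the <text> tags stripped.
    const out = this.buffer.replace(/<\/?text>/g, "");
    this.buffer = "";
    return out;
  }
}
```

The key property is that nothing is emitted until the action is known, so a `SEARCH` response never leaks placeholder text to the UI.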
Implement `IStreamExtractor` for custom filtering logic:
```typescript
import type { IStreamExtractor } from "@elizaos/core";

class JsonValueExtractor implements IStreamExtractor {
  private buffer = "";
  private _done = false;

  get done() {
    return this._done;
  }

  push(chunk: string): string {
    this.buffer += chunk;
    // Try to parse and extract "response" field
    try {
      const json = JSON.parse(this.buffer);
      this._done = true;
      return json.response || "";
    } catch {
      return ""; // Wait for complete JSON
    }
  }
}
```
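Fed partial chunks, this extractor stays silent until the buffered JSON finally parses. A quick standalone check of the same logic (the framework interface is dropped so the snippet runs on its own):

```typescript
// Same logic as JsonValueExtractor above, minus the framework interface,
// so it can run standalone.
class JsonValueExtractorDemo {
  private buffer = "";
  done = false;

  push(chunk: string): string {
    this.buffer += chunk;
    try {
      const json = JSON.parse(this.buffer);
      this.done = true;
      return json.response || "";
    } catch {
      return ""; // incomplete JSON: keep buffering
    }
  }
}

const extractor = new JsonValueExtractorDemo();
extractor.push('{"response": "Hel'); // Returns: '' (JSON still incomplete)
extractor.push('lo!"}');             // Returns: 'Hello!' (parses; done = true)
```

Note the trade-off: unlike the tag-based extractors, nothing streams until the whole JSON document has arrived, so this pattern gives up token-by-token display.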
The streaming system provides typed errors for robust handling:
```typescript
import { StreamError } from "@elizaos/core";

try {
  const result = extractor.push(hugeChunk);
} catch (error) {
  if (StreamError.isStreamError(error)) {
    switch (error.code) {
      case "CHUNK_TOO_LARGE":
        console.error("Chunk exceeded 1MB limit");
        break;
      case "BUFFER_OVERFLOW":
        console.error("Buffer exceeded 100KB");
        break;
      case "PARSE_ERROR":
        console.error("Malformed content");
        break;
      case "TIMEOUT":
        console.error("Stream timed out");
        break;
      case "ABORTED":
        console.error("Stream was cancelled");
        break;
    }
  }
}
```
```mermaid
flowchart TB
    subgraph Client["Client"]
        SSE["SSE/WebSocket Handler"]
        UI["UI Update"]
    end
    subgraph Server["Server"]
        Extractor["Stream Extractor"]
        LLM["LLM Provider"]
    end
    LLM -->|"Generates tokens"| Extractor
    Extractor -->|"Filters & buffers"| SSE
    SSE -->|"Receives chunks"| UI
    style Client fill:#e3f2fd
    style Server fill:#e8f5e9
```
Data Flow: