docs/src/app/streaming/page.mdx
Stream the browser viewport over WebSocket for live preview or "pair browsing", where a human watches and interacts alongside an AI agent.
Every session automatically starts a WebSocket stream server on an OS-assigned port. The server streams viewport frames and accepts input events (mouse, keyboard, touch).
To bind to a specific port, set AGENT_BROWSER_STREAM_PORT:
AGENT_BROWSER_STREAM_PORT=9223 agent-browser open example.com
You can also manage streaming at runtime:
agent-browser stream status # Show streaming state and bound port
agent-browser stream enable --port 9223 # Re-enable on a specific port
agent-browser stream disable # Stop streaming for the session
stream status returns the enabled state, active port, browser connection state, and whether screencasting is active. stream disable tears the server down and removes the session's .stream metadata file.
agent-browser stream status --json returns data like:
{
"enabled": true,
"port": 9223,
"connected": true,
"screencasting": true
}
connected reports whether the daemon currently has a browser attached. screencasting reports whether frames are actively being produced for the stream server.
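A script that consumes this output may want to validate it before acting on it. The following is a minimal sketch, not part of agent-browser: parseStreamStatus, canExpectFrames, and the StreamStatus type are hypothetical names, and the field set is taken only from the sample payload above. It assumes the CLI prints exactly that object; if the real output is wrapped in an envelope or uses a null port when streaming is disabled, the checks would need adjusting.

```typescript
// Shape of the sample `stream status --json` payload shown above.
interface StreamStatus {
  enabled: boolean;
  port: number;
  connected: boolean;
  screencasting: boolean;
}

// Hypothetical helper: parse the JSON text and check each field's type.
function parseStreamStatus(text: string): StreamStatus {
  const raw: unknown = JSON.parse(text);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("status payload is not an object");
  }
  const { enabled, port, connected, screencasting } = raw as Record<string, unknown>;
  if (
    typeof enabled !== "boolean" ||
    typeof port !== "number" ||
    typeof connected !== "boolean" ||
    typeof screencasting !== "boolean"
  ) {
    throw new Error("status payload is missing expected fields");
  }
  return { enabled, port, connected, screencasting };
}

// Frames can only be expected when the server is enabled, a browser is
// attached, and a screencast is running.
function canExpectFrames(status: StreamStatus): boolean {
  return status.enabled && status.connected && status.screencasting;
}
```

The text passed to parseStreamStatus would typically be the captured stdout of agent-browser stream status --json.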
stream enable creates the WebSocket server and keeps it available for the session. Live frame delivery then starts automatically when a WebSocket client connects.
The lower-level screencast_start and screencast_stop commands still control explicit CDP screencasts directly. Use them when you want a screencast without the WebSocket runtime server.
Connect to the bound port, for example ws://localhost:9223, to receive frames and send input. If you did not set a port, use the one reported by stream status.
The server sends frame messages with base64-encoded images:
{
"type": "frame",
"data": "<base64-encoded-jpeg>",
"metadata": {
"deviceWidth": 1280,
"deviceHeight": 720,
"pageScaleFactor": 1,
"offsetTop": 0,
"scrollOffsetX": 0,
"scrollOffsetY": 0
}
}
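To make the frame payload concrete, here is a small sketch of decoding the data field into raw bytes. The FrameMessage type mirrors the sample message above; decodeFrame and looksLikeJpeg are hypothetical helper names, and the JPEG check relies only on the standard FF D8 FF file signature.

```typescript
// Mirrors the sample frame message shown above.
interface FrameMessage {
  type: "frame";
  data: string; // base64-encoded image
  metadata: {
    deviceWidth: number;
    deviceHeight: number;
    pageScaleFactor: number;
    offsetTop: number;
    scrollOffsetX: number;
    scrollOffsetY: number;
  };
}

// Hypothetical helper: decode the base64 payload into bytes.
function decodeFrame(msg: FrameMessage): Uint8Array {
  const binary = atob(msg.data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// JPEG data starts with the bytes FF D8 FF.
function looksLikeJpeg(bytes: Uint8Array): boolean {
  return bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff;
}
```

In a browser viewer the bytes are often not needed at all: assigning "data:image/jpeg;base64," plus the data field to an image element's src displays the frame directly.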
The server also sends status messages that report connection and screencast state:
{
"type": "status",
"connected": true,
"screencasting": true,
"viewportWidth": 1280,
"viewportHeight": 720
}
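Because both message kinds carry a type field, a client can branch on it. The sketch below models the two documented messages as a TypeScript discriminated union; ServerMessage and describeMessage are hypothetical names, and the default branch exists because the server may send message types not covered on this page.

```typescript
// The two server message shapes documented above.
type ServerMessage =
  | { type: "frame"; data: string; metadata: Record<string, number> }
  | {
      type: "status";
      connected: boolean;
      screencasting: boolean;
      viewportWidth: number;
      viewportHeight: number;
    };

// Hypothetical helper: summarize one raw WebSocket text message.
function describeMessage(text: string): string {
  const msg = JSON.parse(text) as ServerMessage;
  switch (msg.type) {
    case "frame":
      // msg is narrowed to the frame shape here.
      return `frame ${msg.metadata.deviceWidth}x${msg.metadata.deviceHeight}`;
    case "status":
      // msg is narrowed to the status shape here.
      return `status connected=${msg.connected} screencasting=${msg.screencasting} viewport=${msg.viewportWidth}x${msg.viewportHeight}`;
    default:
      // Anything not documented on this page.
      return "unknown";
  }
}
```

A viewer would call a function like this from its WebSocket message handler, replacing the string summaries with rendering and UI updates.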
Send input events to control the browser remotely.
// Press (follow with a release to complete a click)
{
"type": "input_mouse",
"eventType": "mousePressed",
"x": 100,
"y": 200,
"button": "left",
"clickCount": 1
}
// Release
{
"type": "input_mouse",
"eventType": "mouseReleased",
"x": 100,
"y": 200,
"button": "left"
}
// Move
{
"type": "input_mouse",
"eventType": "mouseMoved",
"x": 150,
"y": 250
}
// Scroll
{
"type": "input_mouse",
"eventType": "mouseWheel",
"x": 100,
"y": 200,
"deltaX": 0,
"deltaY": 100
}
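A viewer usually shows the frame at a different size than the real viewport, so pointer positions need scaling before they are sent. The sketch below shows one way to do that and to build the press and release pair for a click. toViewport and clickAt are hypothetical helpers; the message fields copy the examples above, and only the "left" button is modeled because it is the only one shown on this page.

```typescript
// Mouse input message, limited to the fields shown in the examples above.
interface MouseInput {
  type: "input_mouse";
  eventType: "mousePressed" | "mouseReleased" | "mouseMoved" | "mouseWheel";
  x: number;
  y: number;
  button?: "left";
  clickCount?: number;
  deltaX?: number;
  deltaY?: number;
}

// Map a point on a scaled preview (e.g. a canvas) to viewport coordinates.
function toViewport(
  px: number,
  py: number,
  previewWidth: number,
  previewHeight: number,
  viewportWidth: number,
  viewportHeight: number
): { x: number; y: number } {
  return {
    x: Math.round((px * viewportWidth) / previewWidth),
    y: Math.round((py * viewportHeight) / previewHeight),
  };
}

// A click is a press followed by a release at the same position.
function clickAt(x: number, y: number): MouseInput[] {
  return [
    { type: "input_mouse", eventType: "mousePressed", x, y, button: "left", clickCount: 1 },
    { type: "input_mouse", eventType: "mouseReleased", x, y, button: "left" },
  ];
}
```

Each returned message would be serialized with JSON.stringify and sent over the socket in order. The viewport size for the scaling can be taken from the viewportWidth and viewportHeight fields of the status message.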
// Key down
{
"type": "input_keyboard",
"eventType": "keyDown",
"key": "Enter",
"code": "Enter"
}
// Key up
{
"type": "input_keyboard",
"eventType": "keyUp",
"key": "Enter",
"code": "Enter"
}
// Type character
{
"type": "input_keyboard",
"eventType": "char",
"text": "a"
}
// With modifiers (bit flags, add values to combine: 1=Alt, 2=Ctrl, 4=Meta, 8=Shift)
{
"type": "input_keyboard",
"eventType": "keyDown",
"key": "c",
"code": "KeyC",
"modifiers": 2
}
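The modifier values are powers of two, which suggests they combine as bit flags in the same way as the modifiers field of CDP key events. Under that assumption, the sketch below builds the modifiers number from named flags. Modifier, modifierMask, and keyDown are hypothetical names, not part of agent-browser.

```typescript
// Modifier values as listed in the example above.
const Modifier = { Alt: 1, Ctrl: 2, Meta: 4, Shift: 8 } as const;

interface KeyboardInput {
  type: "input_keyboard";
  eventType: "keyDown" | "keyUp" | "char";
  key?: string;
  code?: string;
  text?: string;
  modifiers?: number;
}

// Combine held modifier keys into a single bitmask (assumed to be OR-able).
function modifierMask(held: { alt?: boolean; ctrl?: boolean; meta?: boolean; shift?: boolean }): number {
  return (
    (held.alt ? Modifier.Alt : 0) |
    (held.ctrl ? Modifier.Ctrl : 0) |
    (held.meta ? Modifier.Meta : 0) |
    (held.shift ? Modifier.Shift : 0)
  );
}

// Build a keyDown message, omitting modifiers when none are held.
function keyDown(key: string, code: string, modifiers = 0): KeyboardInput {
  const event: KeyboardInput = { type: "input_keyboard", eventType: "keyDown", key, code };
  if (modifiers !== 0) {
    event.modifiers = modifiers;
  }
  return event;
}
```

For example, keyDown("c", "KeyC", modifierMask({ ctrl: true })) produces the same message as the Ctrl+C example above, and holding Ctrl and Shift together yields a modifiers value of 10.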
// Touch start
{
"type": "input_touch",
"eventType": "touchStart",
"touchPoints": [{ "x": 100, "y": 200 }]
}
// Touch move
{
"type": "input_touch",
"eventType": "touchMove",
"touchPoints": [{ "x": 150, "y": 250 }]
}
// Touch end
{
"type": "input_touch",
"eventType": "touchEnd",
"touchPoints": []
}
// Multi-touch (pinch zoom)
{
"type": "input_touch",
"eventType": "touchStart",
"touchPoints": [
{ "x": 100, "y": 200, "id": 0 },
{ "x": 200, "y": 200, "id": 1 }
]
}
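The multi-touch example only shows the starting contact points. A full pinch is a touchStart, a series of touchMove events that change the distance between the two points, and a touchEnd with no points. The sketch below generates such a sequence; pinch is a hypothetical helper, the fingers move horizontally for simplicity, and whether the server needs pacing between events is not covered here.

```typescript
interface TouchPoint {
  x: number;
  y: number;
  id?: number;
}

interface TouchInput {
  type: "input_touch";
  eventType: "touchStart" | "touchMove" | "touchEnd";
  touchPoints: TouchPoint[];
}

// Build a two-finger pinch centered at (cx, cy). The gap between the fingers
// goes from startGap to endGap over `steps` touchMove events.
function pinch(cx: number, cy: number, startGap: number, endGap: number, steps: number): TouchInput[] {
  const moves = Math.max(1, Math.floor(steps));
  const pointsFor = (gap: number): TouchPoint[] => [
    { x: cx - gap / 2, y: cy, id: 0 },
    { x: cx + gap / 2, y: cy, id: 1 },
  ];

  const events: TouchInput[] = [
    { type: "input_touch", eventType: "touchStart", touchPoints: pointsFor(startGap) },
  ];
  for (let i = 1; i <= moves; i++) {
    const gap = startGap + ((endGap - startGap) * i) / moves;
    events.push({ type: "input_touch", eventType: "touchMove", touchPoints: pointsFor(gap) });
  }
  // As in the touch end example above, the final event carries no points.
  events.push({ type: "input_touch", eventType: "touchEnd", touchPoints: [] });
  return events;
}
```

Calling pinch(150, 200, 100, 200, 2) starts from the same two points as the multi-touch example and spreads them apart, which corresponds to zooming in.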
For advanced use, control streaming directly via the TypeScript API:
import { BrowserManager } from 'agent-browser';
const browser = new BrowserManager();
await browser.launch({ headless: true });
await browser.navigate('https://example.com');
// Start screencast with callback
await browser.startScreencast((frame) => {
console.log('Frame:', frame.metadata.deviceWidth, 'x', frame.metadata.deviceHeight);
// frame.data is a base64-encoded image
}, {
format: 'jpeg', // or 'png'
quality: 80, // 0-100, jpeg only
maxWidth: 1280,
maxHeight: 720,
everyNthFrame: 1
});
// Inject mouse event
await browser.injectMouseEvent({
type: 'mousePressed',
x: 100,
y: 200,
button: 'left',
clickCount: 1
});
// Inject keyboard event
await browser.injectKeyboardEvent({
type: 'keyDown',
key: 'Enter',
code: 'Enter'
});
// Inject touch event
await browser.injectTouchEvent({
type: 'touchStart',
touchPoints: [{ x: 100, y: 200 }]
});
// Check if screencasting
console.log('Active:', browser.isScreencasting());
// Stop screencast
await browser.stopScreencast();