Context Engineering is Tarko's core capability for building long-running agents: it manages the context window and optimizes memory usage through intelligent message history management. Traditional agents struggle with long-running tasks because the model's context window eventually fills up; Tarko bounds context growth by pruning multimodal content while preserving conversational continuity.
Configure how the agent manages context and multimodal content:

```typescript
import { Agent } from '@tarko/agent';

const agent = new Agent({
  context: {
    maxImagesCount: 5, // Limit images in context (default: 5)
  },
});
```
The `MessageHistory` class automatically converts event streams to message history:

```typescript
// From multimodal/tarko/agent/src/agent/message-history.ts
const messageHistory = new MessageHistory(
  eventStream,
  5, // maxImagesCount - limits images to prevent context overflow
);

const messages = messageHistory.toMessageHistory(
  toolCallEngine,
  systemPrompt,
  tools,
);
```
Control how images are handled in long conversations:

```typescript
const agent = new Agent({
  context: {
    maxImagesCount: 10, // Allow up to 10 images in context
  },
});
```
How it works: once the number of images in the conversation exceeds `maxImagesCount`, the oldest images are replaced with text placeholders, keeping the most recent visual context while freeing tokens.
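The replacement strategy can be sketched roughly as follows. This is an illustration only — Tarko's real pruning lives inside `MessageHistory`, and the `Message`/`ContentPart` types and `pruneOldImages` helper here are hypothetical stand-ins:

```typescript
// Hypothetical types mirroring multimodal message content parts.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

interface Message {
  role: 'user' | 'assistant';
  content: ContentPart[];
}

// Replace all but the newest `maxImagesCount` images with text placeholders.
function pruneOldImages(messages: Message[], maxImagesCount: number): Message[] {
  const totalImages = messages
    .flatMap((m) => m.content)
    .filter((p) => p.type === 'image_url').length;
  let toReplace = Math.max(0, totalImages - maxImagesCount);

  // Walk oldest-first, swapping images for placeholders until within budget.
  return messages.map((m) => ({
    ...m,
    content: m.content.map((p): ContentPart => {
      if (p.type === 'image_url' && toReplace > 0) {
        toReplace--;
        return { type: 'text', text: '[image omitted to fit context window]' };
      }
      return p;
    }),
  }));
}
```

Because the walk is oldest-first, the images most likely to matter for the current turn survive the prune.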
This behavior is defined by the actual `AgentContextAwarenessOptions` interface:

```typescript
interface AgentContextAwarenessOptions {
  /**
   * Maximum number of images to include in context
   * When exceeded, oldest images are replaced with text placeholders
   * @default 5
   */
  maxImagesCount?: number;
}
```

```typescript
const agent = new Agent({
  context: {
    maxImagesCount: 10, // Limit images in context
  },
  // Other agent options...
});
```
For text-heavy conversations:

```typescript
const agent = new Agent({
  context: {
    maxImagesCount: 3, // Keep fewer images for text focus
  },
});
```
For visual analysis tasks:

```typescript
const agent = new Agent({
  context: {
    maxImagesCount: 15, // Allow more images for visual context
  },
});
```
Use the event stream to track context changes:

```typescript
const response = await agent.run({
  input: 'Analyze these images',
  stream: true,
});

for await (const event of response) {
  if (event.type === 'user_message' || event.type === 'environment_input') {
    console.log('Context updated with:', event.content);
  }
}
```
```typescript
// Environment input with images
const response = await agent.run({
  input: 'What do you see?',
  environmentInput: {
    content: [
      { type: 'text', text: 'Current screen:' },
      { type: 'image_url', image_url: { url: 'data:image/png;base64,...' } },
    ],
    description: 'Screen capture',
  },
});
```
Extend the `MessageHistory` class for custom context management:

```typescript
import { MessageHistory } from '@tarko/agent';

class CustomMessageHistory extends MessageHistory {
  constructor(eventStream, maxImagesCount = 5) {
    super(eventStream, maxImagesCount);
  }

  // Override to add a custom system prompt with the current time
  getSystemPromptWithTime(instructions: string): string {
    const customTime = new Date().toLocaleString('en-US', {
      timeZone: 'America/New_York',
    });
    return `${instructions}\n\nCurrent time (EST): ${customTime}`;
  }
}
```
Access and manipulate the event stream for custom context logic:

```typescript
const agent = new Agent({ /* options */ });

// Get the event stream
const eventStream = agent.getEventStream();

// Access events
const events = eventStream.getEvents();
console.log(`Total events: ${events.length}`);

// Filter specific event types
const userMessages = events.filter(e => e.type === 'user_message');
const toolCalls = events.filter(e => e.type === 'tool_call');
```
Use Agent Hooks to customize context behavior:

```typescript
const agent = new Agent({
  hooks: {
    onBeforeToolCall: async (context) => {
      // Log context before tool execution
      console.log('Context before tool call:', context.messages.length);
    },
    onAfterToolCall: async (context) => {
      // Monitor context growth after tool execution
      console.log('Context after tool call:', context.messages.length);
    },
    onRetrieveTools: async (tools) => {
      // Filter tools based on context size
      const eventStream = agent.getEventStream();
      const events = eventStream.getEvents();
      if (events.length > 50) {
        // Reduce tools for large contexts
        return tools.slice(0, 3);
      }
      return tools;
    },
  },
});
```
Tune `maxImagesCount` based on available memory and your model's context window. For debugging, enable detailed logging:

```typescript
import { Agent, LogLevel } from '@tarko/agent';

const agent = new Agent({
  logLevel: LogLevel.DEBUG, // Enable detailed logging
});
```
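One rough way to pick a starting `maxImagesCount` is to work backward from a token budget. This is a heuristic sketch, not a Tarko API — the per-image token cost and reserved text budget below are assumptions you should measure against your own model and image sizes:

```typescript
// Assumed average token cost per image (varies by resolution and provider).
const ASSUMED_TOKENS_PER_IMAGE = 1500;
// Tokens reserved for the system prompt and text history.
const RESERVED_TEXT_TOKENS = 8000;

// Suggest an image budget that fits the remaining context window.
function suggestMaxImagesCount(contextWindowTokens: number): number {
  const imageBudget = contextWindowTokens - RESERVED_TEXT_TOKENS;
  return Math.max(1, Math.floor(imageBudget / ASSUMED_TOKENS_PER_IMAGE));
}
```

For a 16k-token context window this suggests keeping about 5 images; for very small windows it floors at 1 so the most recent screenshot always survives.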
```typescript
import { writeFileSync } from 'fs';

// Get event stream for analysis
const eventStream = agent.getEventStream();
const events = eventStream.getEvents();

console.log('Total events:', events.length);
console.log('Event types:', [...new Set(events.map(e => e.type))]);

// Count images in context
const imageCount = events.reduce((count, event) => {
  if (event.type === 'user_message' && Array.isArray(event.content)) {
    return count + event.content.filter(part =>
      typeof part === 'object' && part.type === 'image_url'
    ).length;
  }
  return count;
}, 0);
console.log('Images in context:', imageCount);

// Export events for analysis
writeFileSync('events-dump.json', JSON.stringify(events, null, 2));
```
Example configurations for different workloads:

```typescript
// Visual analysis: allow many images
const visualAgent = new Agent({
  context: {
    maxImagesCount: 20, // Allow many images for visual tasks
  },
  instructions: 'You are a visual analysis expert. Analyze images and provide detailed insights.',
});

// Text-focused work: keep images minimal
const textAssistant = new Agent({
  context: {
    maxImagesCount: 2, // Minimal images for text focus
  },
  instructions: 'You are a writing assistant focused on text analysis and generation.',
});

// Extended conversations: a balanced setting
const conversationAgent = new Agent({
  context: {
    maxImagesCount: 8, // Balanced for mixed content
  },
  instructions: 'You are a helpful assistant for extended conversations.',
});

// Monitor context growth
setInterval(() => {
  const events = conversationAgent.getEventStream().getEvents();
  console.log(`Context events: ${events.length}`);
}, 60000); // Check every minute
```