<b>Agent TARS</b> is a general multimodal AI agent stack that brings the power of GUI agents and vision into your terminal, computer, browser, and product.
It ships primarily with a <a href="https://agent-tars.com/guide/basic/cli.html" target="_blank">CLI</a> and a <a href="https://agent-tars.com/guide/basic/web-ui.html" target="_blank">Web UI</a>. It aims to provide a workflow closer to human-like task completion by combining cutting-edge multimodal LLMs with seamless integration of various real-world <a href="https://agent-tars.com/guide/basic/mcp.html" target="_blank">MCP</a> tools.
📣 Just released: Agent TARS Beta - check out our announcement blog post!
https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8
<table> <thead> <tr> <th width="50%" align="center">Booking Hotel</th> <th width="50%" align="center">Generate Chart with extra MCP Servers</th> </tr> </thead> <tbody> <tr> <td align="center"> <video src="https://github.com/user-attachments/assets/c9489936-afdc-4d12-adda-d4b90d2a869d" width="50%"></video> </td> <td align="center"> <video src="https://github.com/user-attachments/assets/a9fd72d0-01bb-4233-aa27-ca95194bbce9" width="50%"></video> </td> </tr> <tr> <td align="left"> <b>Instruction:</b> <i>I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me</i> </td> <td align="left"> <b>Instruction:</b> <i>Draw me a chart of Hangzhou's weather for one month</i> </td> </tr> </tbody> </table>

For more use cases, please check out #842.
@agent-tars/core is the core implementation of Agent TARS, built on top of the Tarko Agent framework. It provides a comprehensive multimodal AI agent with advanced browser automation, filesystem operations, and intelligent search capabilities.
```bash
# Launch with `npx`.
npx @agent-tars/cli@latest

# Install globally (requires Node.js >= 22)
npm install @agent-tars/cli@latest -g

# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
```
Visit the comprehensive Quick Start guide for detailed setup instructions.
```bash
npm install @agent-tars/core
```
Agent TARS can be started in multiple ways:
```bash
# Install globally
npm install -g @agent-tars/cli

# Run Agent TARS
agent-tars

# Or use directly via npx
npx @agent-tars/cli
```

```bash
# Install globally
npm install -g @tarko/agent-cli

# Run Agent TARS through tarko CLI
tarko run agent-tars

# Or use directly via npx
npx @tarko/agent-cli run agent-tars
```
```typescript
import { AgentTARS } from '@agent-tars/core';

// Create an agent instance
const agent = new AgentTARS({
  model: {
    provider: 'openai',
    model: 'gpt-4',
    apiKey: process.env.OPENAI_API_KEY,
  },
  workspace: './workspace',
  browser: {
    headless: false,
    control: 'hybrid',
  },
});

// Initialize and run
await agent.initialize();
const result = await agent.run('Search for the latest AI research papers');
console.log(result);
```
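The example above hard-codes the OpenAI provider. If you choose the provider through environment variables instead, a small helper can fail fast before the agent is constructed. This is a minimal sketch under stated assumptions: `resolveModelConfig` is a hypothetical helper (not part of `@agent-tars/core`), the `ANTHROPIC_API_KEY` variable name is an assumption, and only the provider and model names come from this README.

```typescript
// Shape of the `model` option, mirroring the AgentTARSOptions interface in this README.
type ModelConfig = {
  provider: 'openai' | 'anthropic' | 'doubao';
  model: string;
  apiKey: string;
};

// Hypothetical helper: pick a provider based on which API key is present.
// OpenAI wins if both keys are set. Env variable names are assumptions.
function resolveModelConfig(env: Record<string, string | undefined>): ModelConfig {
  const openaiKey = env.OPENAI_API_KEY;
  if (openaiKey) {
    return { provider: 'openai', model: 'gpt-4', apiKey: openaiKey };
  }
  const anthropicKey = env.ANTHROPIC_API_KEY;
  if (anthropicKey) {
    return { provider: 'anthropic', model: 'claude-3-7-sonnet-latest', apiKey: anthropicKey };
  }
  // Fail fast here rather than on the first model call at run time.
  throw new Error('No API key found: set OPENAI_API_KEY or ANTHROPIC_API_KEY');
}
```

In Node.js you would pass `process.env` and hand the result to the constructor, e.g. `new AgentTARS({ model: resolveModelConfig(process.env), workspace: './workspace' })`.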
```typescript
interface AgentTARSOptions {
  // Model configuration
  model?: {
    provider: 'openai' | 'anthropic' | 'doubao';
    model: string;
    apiKey: string;
  };

  // Browser settings
  browser?: {
    headless?: boolean;
    control?: 'dom' | 'visual-grounding' | 'hybrid';
    cdpEndpoint?: string;
  };

  // Search configuration
  search?: {
    provider: 'browser_search' | 'tavily';
    count?: number;
    apiKey?: string;
  };

  // Workspace settings
  workspace?: string;

  // MCP implementation
  mcpImpl?: 'in-memory' | 'stdio';
}
```
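Because most fields of `AgentTARSOptions` are optional, some configuration mistakes only surface at run time. The sketch below checks a few of them up front. `checkOptions` is hypothetical, the type is a local copy of the `model` and `search` fields from the interface above, and the rules (a non-blank model key, a positive integer `count`, and an API key when using the hosted Tavily search service) are assumptions, not validation that `@agent-tars/core` is documented to perform.

```typescript
// Local copy of the relevant AgentTARSOptions fields shown above.
interface AgentTARSOptionsSubset {
  model?: { provider: 'openai' | 'anthropic' | 'doubao'; model: string; apiKey: string };
  search?: { provider: 'browser_search' | 'tavily'; count?: number; apiKey?: string };
}

// Hypothetical pre-flight check: returns a list of problems (empty if none found).
function checkOptions(options: AgentTARSOptionsSubset): string[] {
  const problems: string[] = [];

  if (options.model && options.model.apiKey.trim() === '') {
    problems.push('model.apiKey is empty');
  }

  const search = options.search;
  const count = search?.count;
  if (count !== undefined && (!Number.isInteger(count) || count <= 0)) {
    problems.push('search.count must be a positive integer');
  }

  // Assumption: the hosted Tavily API needs a key; browser_search does not.
  if (search && search.provider === 'tavily' && !search.apiKey) {
    problems.push('search.apiKey is required for the tavily provider');
  }

  return problems;
}
```

Returning a list rather than throwing on the first problem lets you report every misconfiguration in one pass.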
Browser control modes (`browser.control`):

- `dom`: Direct DOM manipulation (fastest, most reliable)
- `visual-grounding`: Vision-based interaction (most flexible)
- `hybrid`: Combines both approaches (recommended)

Custom system instructions can be passed through the `instructions` option:

```typescript
const agent = new AgentTARS({
  instructions: `
    You are a specialized research assistant.
    Focus on academic papers and technical documentation.
    Always provide citations and sources.
  `,
  // ... other options
});
```
```typescript
// Get browser control information
const browserInfo = agent.getBrowserControlInfo();
console.log(`Mode: ${browserInfo.mode}`);
console.log(`Tools: ${browserInfo.tools.join(', ')}`);

// Access browser manager
const browserManager = agent.getBrowserManager();
if (browserManager) {
  const isAlive = await browserManager.isBrowserAlive();
  console.log(`Browser status: ${isAlive ? 'alive' : 'dead'}`);
}
```

```typescript
// Get current workspace
const workspace = agent.getWorkingDirectory();
console.log(`Working in: ${workspace}`);

// All file operations are automatically scoped to the workspace
const result = await agent.run("Create a summary.md file with today's findings");
```
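The workspace comment says file operations are scoped to the workspace, without showing how. One common way to enforce such scoping is to resolve every requested path against the workspace root and reject anything that lands outside it. This is an illustrative sketch using Node's `path` module, not the actual `@agent-tars/core` implementation; `resolveInWorkspace` is a hypothetical name.

```typescript
import * as path from 'node:path';

// Hypothetical helper (not the @agent-tars/core implementation): resolve a
// requested path against the workspace root and reject anything outside it.
function resolveInWorkspace(workspace: string, requested: string): string {
  const root = path.resolve(workspace);
  // Relative inputs are joined to root; absolute inputs replace it entirely.
  const target = path.resolve(root, requested);
  const rel = path.relative(root, target);
  // Outside the root, `rel` is '..', starts with '../', or (on Windows,
  // across drives) is itself absolute. A name like '..foo' is allowed.
  const escapes = rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel);
  if (escapes) {
    throw new Error(`Path escapes workspace: ${requested}`);
  }
  return target;
}
```

This check is purely lexical: a symlink inside the workspace could still point elsewhere, so a stricter version would resolve real paths (for example with `fs.realpath`) before comparing.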
```typescript
try {
  await agent.initialize();
  const result = await agent.run('Your task here');
} catch (error) {
  console.error('Agent error:', error);
} finally {
  // Always clean up, even if initialization or the task failed
  await agent.cleanup();
}
```
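The try/finally pattern can be wrapped in a reusable helper so that `cleanup()` is never forgotten. The sketch types the agent structurally, using only the three lifecycle methods documented in this README (`initialize`, `run`, `cleanup`), so it should accept an `AgentTARS` instance. `withAgent` and the `AgentLike` type are hypothetical, and the `run(message: string)` signature is an assumption based on the method list in this README.

```typescript
// Minimal structural type covering the lifecycle methods listed in this README.
// `unknown` return types keep it compatible with whatever the real methods resolve to.
interface AgentLike<R> {
  initialize(): Promise<unknown>;
  run(message: string): Promise<R>;
  cleanup(): Promise<unknown>;
}

// Hypothetical helper: initialize, run one task, and always clean up.
async function withAgent<R>(agent: AgentLike<R>, task: string): Promise<R> {
  try {
    // initialize() sits inside the try so a partial start-up is still cleaned up.
    await agent.initialize();
    return await agent.run(task);
  } finally {
    // Runs on success and on error; the original error still propagates.
    await agent.cleanup();
  }
}
```

Unlike the example above, the helper lets errors propagate to the caller instead of logging them, so callers decide how to handle failures.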
- `initialize()`: Initialize the agent and all components
- `run(message)`: Execute a task with the given message
- `cleanup()`: Clean up all resources
- `getWorkingDirectory()`: Get the current workspace path
- `getBrowserControlInfo()`: Get browser control status
- `getBrowserManager()`: Access the browser manager instance

The agent emits events through its event stream:
```typescript
agent.eventStream.subscribe((event) => {
  if (event.type === 'tool_result') {
    console.log(`Tool ${event.name} completed`);
  }
});
```
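Subscribers can do more than log. The sketch below keeps a per-tool tally of completed calls. It assumes only what the example above uses: each event has a `type`, and `tool_result` events carry a `name`. `AgentEventLike` and `createToolCounter` are hypothetical; the real event types in the Tarko framework are richer, but as long as their `name` fields are strings they should be accepted by this handler.

```typescript
// Assumed minimal event shape: a `type`, plus a `name` on tool_result events.
interface AgentEventLike {
  type: string;
  name?: string;
}

// Hypothetical handler factory: counts completed tool calls per tool name.
function createToolCounter() {
  const counts = new Map<string, number>();
  const handler = (event: AgentEventLike): void => {
    if (event.type === 'tool_result' && event.name) {
      counts.set(event.name, (counts.get(event.name) ?? 0) + 1);
    }
  };
  return { handler, counts };
}
```

Usage would follow the subscription pattern above: create `const counter = createToolCounter()`, pass `counter.handler` to `agent.eventStream.subscribe`, and read `counter.counts` after `agent.run(...)` resolves.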
See the full CHANGELOG.
See CONTRIBUTING.md for development guidelines.
Apache-2.0 - See LICENSE for details.