
@tarko/agent

Introduction

@tarko/agent is an event-stream driven meta agent framework designed for building efficient multimodal AI Agents. It provides complete Agent lifecycle management, tool integration, and multi-model support.

When to use?

This Agent SDK provides a low-level programmatic API, suitable for building AI agents from scratch:

  • MCP Agent: Connect to MCP servers and implement the standardized Model Context Protocol for tools
  • GUI Agent: Build graphical interface agents that handle user interactions
  • Custom Agents: Build specialized agents for specific domains like code generation, data analysis, etc.

Unlike high-level frameworks, @tarko/agent gives you complete control to customize Agent behavior.

Architecture Overview

mermaid
flowchart TD
    A[User Input] --> B[Agent.run]
    B --> C[Message History]
    C --> D[LLM Request]
    D --> E{Tool Calls?}
    E -->|Yes| F[Tool Execution]
    E -->|No| G[Generate Response]
    F --> H[Tool Results]
    H --> I{Max Iterations?}
    I -->|No| D
    I -->|Yes| G
    G --> J[Event Stream]
    
    J --> K[UI Updates]
    J --> L[Logging]
    J --> M[Monitoring]
    
    style A fill:#e1f5fe
    style G fill:#c8e6c9
    style F fill:#fff3e0
    style J fill:#f3e5f5
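The loop in the diagram above can be sketched in plain TypeScript. This is an illustrative model of the control flow only, not the framework's actual implementation: the agent alternates LLM requests and tool execution until the model stops requesting tools or `maxIterations` is reached (the `llm` and `tools` shapes here are simplified stand-ins).

```typescript
type ToolCall = { name: string; arguments: Record<string, unknown> };
type LLMReply = { content: string; toolCalls?: ToolCall[] };

// Simplified agent loop: ask the LLM, run any requested tools,
// feed results back into the history, repeat until a final answer.
async function runLoop(
  llm: (history: string[]) => Promise<LLMReply>,
  tools: Record<string, (args: Record<string, unknown>) => Promise<unknown>>,
  input: string,
  maxIterations = 10,
): Promise<string> {
  const history: string[] = [`user: ${input}`];
  for (let i = 0; i < maxIterations; i++) {
    const reply = await llm(history);
    // No tool calls requested: this is the final response.
    if (!reply.toolCalls?.length) return reply.content;
    for (const call of reply.toolCalls) {
      const result = await tools[call.name](call.arguments);
      history.push(`tool ${call.name}: ${JSON.stringify(result)}`);
    }
  }
  return '(max iterations reached)';
}
```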

Install

bash
npm install @tarko/agent

Core Features

  1. Tool Integration - Effortlessly create and call tools within agent responses, supporting complex multi-step workflows.
  2. Event-Stream Driven - Based on standard event stream protocols, real-time tracking of Agent state for efficient context and UI building.
  3. Native Streaming - Native streaming transmission lets you understand the Agent's thinking process and output results in real time.
  4. Multimodal Analysis - Automatically analyze multimodal tool results (images, text, files, etc.), letting you focus on business logic.
  5. Strong Extension Capabilities - Rich lifecycle hook design allows you to implement more advanced Agent behaviors.
  6. Multiple Model Providers - Supports OpenAI, Claude, Doubao and other models, with advanced configuration and runtime switching.
  7. Multiple Tool Call Engines - native (OpenAI-compatible function calling), prompt_engineering (prompt-based tool calling), and structured_outputs (JSON Schema structured outputs).

Quick Start

Create an index.ts file:

ts
import { Agent, Tool, z, LogLevel } from '@tarko/agent';

const locationTool = new Tool({
  id: 'getCurrentLocation',
  description: "Get user's current location",
  parameters: z.object({}),
  function: async () => {
    return { location: 'Boston' };
  },
});

const weatherTool = new Tool({
  id: 'getWeather',
  description: 'Get weather information for a specified location',
  parameters: z.object({
    location: z.string().describe('Location name, such as city name'),
  }),
  function: async (input) => {
    const { location } = input;
    return {
      location,
      temperature: '70°F (21°C)',
      condition: 'Sunny',
      precipitation: '10%',
      humidity: '45%',
      wind: '5 mph',
    };
  },
});

const agent = new Agent({
  model: {
    provider: 'openai',
    id: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY!, // From environment variable
  },
  tools: [locationTool, weatherTool],
  instructions: 'You are a professional weather assistant capable of getting accurate location and weather information.',
  temperature: 0.7,
  maxIterations: 50,
});

async function main() {
  const response = await agent.run({
    input: "How's the weather today?",
  });
  console.log(response);
}

main();

Execute it:

bash
npx tsx index.ts

Output:

json
{
  "id": "5c38c0a1-ccbe-48f0-8b97-ae78a4d9407e",
  "type": "assistant_message",
  "timestamp": 1750188571248,
  "content": "The weather in Boston today is sunny with a temperature of 70°F (21°C). There's a 10% chance of precipitation, humidity is at 45%, and the wind is blowing at 5 mph.",
  "finishReason": "stop",
  "messageId": "msg_1750188570877_ics24k3x"
}

API

Agent

Define an Agent instance:

ts
const agent = new Agent({
  /* AgentOptions */
});

Agent Options

All options on the AgentOptions interface are optional:

Basic Configuration
  • id: Unique identifier for the agent instance (default: "@tarko/agent")
  • name: Agent name for tracking and logging (default: "Anonymous")
  • instructions: Agent system prompt, completely replaces default prompt (default: built-in intelligent assistant prompt)
Model Configuration
  • model: Model configuration object containing provider, id, apiKey, etc.
  • temperature: LLM temperature controlling output randomness (default: 0.7)
  • top_p: Nucleus sampling parameter controlling vocabulary selection diversity (default: model default)
  • maxTokens: Token limit per request (default: 1000)
  • thinking: Reasoning content control options
Tool Configuration
  • tools: Array of tools available to the agent
  • tool: Tool filtering options supporting include/exclude patterns
  • toolCallEngine: Tool call engine type (default: 'native')
    • 'native': OpenAI-compatible native function calling
    • 'prompt_engineering': Prompt-based tool calling
    • 'structured_outputs': JSON Schema structured outputs
Execution Control
  • maxIterations: Maximum number of iterations (default: 1000)
  • context: Context awareness options like maxImagesCount
Debug and Monitoring
  • logLevel: Log level (LogLevel.DEBUG, LogLevel.INFO, etc.)
  • metric: Performance metrics collection configuration
  • enableStreamingToolCallEvents: Enable streaming tool call events (default: false)
Advanced Options
  • workspace: Working directory for filesystem operations
  • sandboxUrl: Sandbox environment URL
  • eventStreamOptions: Event stream processor configuration
  • initialEvents: Array of events to restore during initialization
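Putting several of these together, here is a configuration sketch built from the options listed above. Field shapes follow the descriptions in this document; where the exact shape is unspecified (e.g. `context`), the values are illustrative.

```typescript
import { Agent, LogLevel } from '@tarko/agent';

const agent = new Agent({
  id: 'weather-agent',
  name: 'WeatherAgent',
  model: {
    provider: 'openai',
    id: 'gpt-4o',
    apiKey: process.env.OPENAI_API_KEY!,
  },
  temperature: 0.3,           // lower randomness for predictable output
  maxTokens: 2000,
  toolCallEngine: 'native',   // OpenAI-compatible function calling
  maxIterations: 20,          // bound the agent loop
  context: { maxImagesCount: 3 },
  logLevel: LogLevel.INFO,
  enableStreamingToolCallEvents: true,
});
```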

Tool

Define a Tool instance:

ts
import { Tool, z } from '@tarko/agent';

const locationTool = new Tool({
  id: 'getCurrentLocation',
  description: "Get user's current location",
  parameters: z.object({}),
  function: async () => {
    return { location: 'Boston' };
  },
});

Tool Options

  • id: Unique identifier for the tool
  • description: Description of what the tool does
  • parameters: Zod schema for tool parameters
  • function: Async function that implements the tool logic

Guide

Streaming Mode

Streaming mode allows you to monitor the Agent's execution process in real time, including thinking, tool calls, and response generation:

ts
async function main() {
  const stream = await agent.run({
    input: "How's the weather today?",
    stream: true,
  });

  for await (const event of stream) {
    switch (event.type) {
      case 'assistant_streaming_message':
        process.stdout.write(event.content); // Real-time output
        break;
      case 'tool_call':
        console.log(`Calling tool: ${event.name}`);
        break;
      case 'tool_result':
        console.log(`Tool result: ${event.elapsedMs}ms`);
        break;
      case 'assistant_message':
        console.log(`\nComplete response: ${event.content}`);
        break;
    }
  }
}

Main Event Types

  • user_message: User input message
  • agent_run_start: Agent starts execution
  • assistant_streaming_message: Real-time streaming message chunks
  • tool_call: Tool call starts
  • tool_result: Tool execution results
  • assistant_message: Complete assistant message
  • agent_run_end: Agent execution ends

Streaming mode is particularly suitable for building real-time UI interfaces, letting users see the Agent's "thinking process".

Event Types

AssistantMessage

ts
interface AssistantMessage {
  id: string;
  type: 'assistant_message';
  timestamp: number;
  content: string;
  toolCalls?: ChatCompletionMessageToolCall[];
  finishReason: 'stop' | 'tool_calls' | 'length';
  messageId: string;
}

ToolCall

ts
interface ToolCallEvent {
  id: string;
  type: 'tool_call';
  timestamp: number;
  toolCallId: string;
  name: string;
  arguments: Record<string, any>;
  startTime: number;
  tool: {
    name: string;
    description: string;
    schema: any;
  };
}

ToolResult

ts
interface ToolResult {
  id: string;
  type: 'tool_result';
  timestamp: number;
  toolCallId: string;
  name: string;
  content: any;
  elapsedMs: number;
}

StreamingMessage

ts
interface StreamingMessage {
  id: string;
  type: 'assistant_streaming_message';
  timestamp: number;
  content: string;
  isComplete: boolean;
  messageId: string;
}
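Streaming chunks share a `messageId`, so a consumer can assemble them into complete messages by keying on that field. A minimal sketch using the interface above (the accumulator function is an illustration, not a framework API):

```typescript
interface StreamingMessage {
  id: string;
  type: 'assistant_streaming_message';
  timestamp: number;
  content: string;
  isComplete: boolean;
  messageId: string;
}

// Concatenate chunk content per messageId; return only finished messages.
function assembleMessages(events: StreamingMessage[]): Map<string, string> {
  const buffers = new Map<string, string>();
  const complete = new Map<string, string>();
  for (const e of events) {
    const text = (buffers.get(e.messageId) ?? '') + e.content;
    buffers.set(e.messageId, text);
    if (e.isComplete) complete.set(e.messageId, text);
  }
  return complete;
}
```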

Utility Methods

Direct LLM Calls

Besides the complete Agent workflow, you can also directly call the currently configured LLM:

ts
// Non-streaming call
const response = await agent.callLLM({
  messages: [
    { role: 'user', content: 'Hello' }
  ],
  temperature: 0.5,
});

// Streaming call
const stream = await agent.callLLM({
  messages: [
    { role: 'user', content: 'Write a poem' }
  ],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Get Available Tools

ts
// Get all registered tools
const allTools = agent.getTools();

// Get tools processed through hooks
const availableTools = await agent.getAvailableTools();

console.log(`${availableTools.length} tools available`);

Generate Conversation Summary

ts
const summary = await agent.generateSummary({
  messages: [
    { role: 'user', content: "How's the weather today?" },
    { role: 'assistant', content: 'Today is sunny with 22°C temperature.' },
  ],
});

console.log(summary.summary); // "Weather Query"

Execution Control

ts
// Check Agent status
console.log(agent.status()); // 'idle' | 'running' | 'error'

// Get current iteration count
console.log(agent.getCurrentLoopIteration());

// Abort execution
if (agent.status() === 'running') {
  agent.abort();
}

// Resource cleanup
await agent.dispose();

Lifecycle Hooks

@tarko/agent provides rich hooks for customizing Agent behavior:

ts
class CustomAgent extends Agent {
  async onBeforeToolCall(sessionId, toolCall, args) {
    console.log(`Preparing to call tool: ${toolCall.name}`);
    // Can modify arguments
    return { ...args, timestamp: Date.now() };
  }
  
  async onAfterToolCall(sessionId, toolCall, result) {
    console.log(`Tool call completed: ${toolCall.name}`);
    // Can modify result
    return result;
  }
  
  async onLLMRequest(sessionId, payload) {
    console.log(`Sending LLM request: ${payload.messages.length} messages`);
  }
}

Best Practices

Choose the Right Tool Call Engine

ts
// For models supporting function calling (recommended)
const nativeAgent = new Agent({
  toolCallEngine: 'native', // OpenAI, Claude, etc.
});

// For models not supporting function calling
const promptAgent = new Agent({
  toolCallEngine: 'prompt_engineering', // Open source models
});

// For scenarios requiring strict structured output
const structuredAgent = new Agent({
  toolCallEngine: 'structured_outputs',
});

Tool Design Principles

ts
// ✅ Good tool design
const goodTool = new Tool({
  id: 'searchWeb',
  description: 'Search for information on the web and return relevant results',
  parameters: z.object({
    query: z.string().describe('Search keywords'),
    limit: z.number().default(5).describe('Number of results to return'),
  }),
  function: async ({ query, limit }) => {
    // Implement search logic
    return { results: [], total: 0 };
  },
});

// ❌ Avoid this tool design
const badTool = new Tool({
  id: 'doEverything', // Too broad functionality
  description: 'Do anything', // Unclear description
  parameters: z.object({
    input: z.any(), // Unclear parameter type
  }),
  function: async (input) => {
    // Logic too complex
  },
});

Performance Optimization

ts
const optimizedAgent = new Agent({
  // Limit context size
  context: {
    maxImagesCount: 3, // Avoid oversized context
  },
  
  // Reasonable iteration count
  maxIterations: 20, // Avoid infinite loops
  
  // Enable performance monitoring
  metric: {
    enable: true,
  },
  
  // Appropriate temperature setting
  temperature: 0.3, // Lower temperature recommended for production
});

Security Considerations

ts
// Tool filtering
const safeAgent = new Agent({
  tools: allTools, // pass the tool array directly, not wrapped in another array
  tool: {
    exclude: ['fileDelete', 'systemCommand'], // Exclude dangerous tools
  },
});

// Input validation
class SecureAgent extends Agent {
  async onBeforeToolCall(sessionId, toolCall, args) {
    // Validate tool parameters
    if (toolCall.name === 'fileRead' && args.path.includes('..')) {
      throw new Error('Path traversal attack detected');
    }
    return args;
  }
}

For more advanced usage patterns, see the Agent Hooks documentation.