examples/gui-agent-2.0/quick-start-for-agent.md
- `package.json` does NOT contain `"type": "module"`
- `"module": "CommonJS"` in `tsconfig.json`
- No `.js` extensions needed in TypeScript imports
- No need to import the `ModelProviderName` or `AgentModel` types
- `as const` assertion: for string literals that need specific types
- `provider: 'volcengine' as const` (NOT `provider: 'volcengine'`)
- Environment file: `.env.local` (copy from `.env.local.example`)
- Load it via `dotenv` with `path.join(__dirname, '..', '.env.local')`

```bash
# Model Service Configuration
ARK_BASE_URL=https://your-model-service-url # Model Service API endpoint
ARK_API_KEY=your-actual-model-service-api-key # Your Model Service API key

# Doubao Models Configuration
DOUBAO_1_5_VP=your-model-key-abcdef # Doubao 1.5 VP model endpoint ID
DOUBAO_SEED_1_6=your-model-key-fedcba # Doubao Seed 1.6 model endpoint ID

# AIO Sandbox Configuration
SANDBOX_URL=http://your-sandbox-url:port # AIO operator sandbox URL
```

- Model endpoint IDs use the format `ep-{timestamp}-{hash}`
- Copy `.env.local.example` to `.env.local`
- Replace `your-*` placeholders with actual values

```
gui-agent-standalone/
├── src/
│   ├── index.ts          # Main entry point
│   └── constants.ts      # System prompt definition
├── dist/                 # Compiled JavaScript output
├── package.json          # Dependencies and scripts
├── tsconfig.json         # TypeScript configuration
├── .env.local.example    # Environment template
└── .env.local            # Actual environment (create manually)
```

- `dotenv`: Environment variable loading
- `@gui-agent/agent-sdk`: Core GUI agent functionality
- `@gui-agent/operator-aio`: AIO hybrid operator
- `@gui-agent/action-parser`: Action parsing utilities
- `typescript`: TypeScript compiler
- `tsx`: TypeScript execution for development
- `@types/node`: Node.js type definitions

```bash
npm run build # Compiles TypeScript to dist/
```

This produces `dist/index.js` and `dist/constants.js`.

```bash
npm start # Runs compiled JavaScript
# OR for development
npm run dev # Direct TypeScript execution
```
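
Because every setting comes from `.env.local`, a missing variable or an unreplaced `your-*` placeholder tends to show up only later as a confusing runtime error. The following is a minimal sketch of a startup check, assuming all five variables from the template are required (trim the list if your entry point reads fewer). The helper `findEnvProblems` is hypothetical and is not part of `dotenv` or any `@gui-agent` package.

```typescript
// Hypothetical helper: not part of dotenv or the @gui-agent packages.
// Variable names are taken from the .env.local template.
const REQUIRED_VARS = [
  'ARK_BASE_URL',
  'ARK_API_KEY',
  'DOUBAO_1_5_VP',
  'DOUBAO_SEED_1_6',
  'SANDBOX_URL',
] as const;

type Env = Record<string, string | undefined>;

// Returns one message per variable that is missing, empty,
// or still contains a "your-" template placeholder.
function findEnvProblems(env: Env): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    const value = env[name]?.trim();
    if (!value) {
      problems.push(`${name} is not set`);
    } else if (value.includes('your-')) {
      problems.push(`${name} still contains a your-* placeholder`);
    }
  }
  return problems;
}
```

In `src/index.ts` this could run right after `dotenv.config(...)`, for example by passing `process.env` and exiting early when the returned list is not empty.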
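```typescript
// Import paths are assumed from the dependency list; adjust them if the
// packages export these names from a different entry point.
import { GUIAgent } from '@gui-agent/agent-sdk';
import { AIOHybridOperator } from '@gui-agent/operator-aio';
import { SYSTEM_PROMPT } from './constants';

const doubao = {
  id: process.env.DOUBAO_SEED_1_6!,
  provider: 'volcengine' as const, // CRITICAL: as const assertion
  baseURL: process.env.ARK_BASE_URL!,
  apiKey: process.env.ARK_API_KEY!,
};

const operator = new AIOHybridOperator({
  baseURL: process.env.SANDBOX_URL!,
  timeout: 10000,
});

const guiAgent = new GUIAgent({
  operator,
  model: doubao, // No type assertion needed with as const
  systemPrompt: SYSTEM_PROMPT,
});

async function main() {
  const response = await guiAgent.run({
    input: [{ type: 'text', text: 'your-task-here' }],
  });
  console.log(response.content);
}

// Run the task and surface any failure instead of leaving a silent rejection.
main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```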
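
The `as const` requirement comes from TypeScript's literal widening: a string property in a plain object literal is inferred as `string`, which no longer fits a parameter typed as a union of string literals. The sketch below reproduces this with stand-in types. `ProviderName`, `ModelConfig`, and `describeModel` are hypothetical, and the SDK's real `ModelProviderName` union may contain different members.

```typescript
// Hypothetical stand-ins for the SDK types; the real union may differ.
type ProviderName = 'volcengine' | 'other-provider';

interface ModelConfig {
  id: string;
  provider: ProviderName;
}

function describeModel(model: ModelConfig): string {
  return `${model.provider}:${model.id}`;
}

// Without `as const`, `provider` is inferred as `string`.
const widened = { id: 'ep-example', provider: 'volcengine' };

// With `as const`, `provider` keeps the literal type 'volcengine'.
const narrowed = { id: 'ep-example', provider: 'volcengine' as const };

const label = describeModel(narrowed); // compiles

// @ts-expect-error: 'string' is not assignable to 'ProviderName'
describeModel(widened);
```

Annotating the object with the target type (`const model: ModelConfig = { ... }`) would also avoid the widening, but it requires importing the type. The `as const` form avoids that import.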
- Run `npm run build` first and ensure the CommonJS config
- Use the `as const` assertion on the `provider` field: write `provider: 'volcengine' as const`, not `provider: 'volcengine'`
- Remove `"type": "module"` from `package.json`

`tsconfig.json`:

```jsonc
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "CommonJS", // CRITICAL: Must be CommonJS
    "moduleResolution": "node", // CRITICAL: Must be node
    "outDir": "./dist",
    "rootDir": "./src",
    "esModuleInterop": true,
    "strict": true
  }
}
```

`package.json` (there must be NO `"type": "module"` field):

```json
{
  "main": "dist/index.js",
  "scripts": {
    "build": "tsc",
    "start": "node dist/index.js",
    "dev": "tsx src/index.ts",
    "clean": "rm -rf dist"
  }
}
```
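
The module-system rules above are easy to break by accident, for example when a scaffolding tool adds `"type": "module"`. A minimal sketch of a sanity check follows, assuming both files have already been parsed into plain objects (`tsconfig.json` may contain comments, so `JSON.parse` alone is not enough for it). The function `checkModuleConfig` and both interfaces are hypothetical names.

```typescript
// Hypothetical shapes covering only the fields this check reads.
interface PackageJsonLike {
  type?: string;
}

interface TsconfigLike {
  compilerOptions?: { module?: string; moduleResolution?: string };
}

// Returns a list of human-readable problems; an empty list means the
// CommonJS setup described in this guide is intact.
function checkModuleConfig(pkg: PackageJsonLike, tsconfig: TsconfigLike): string[] {
  const problems: string[] = [];
  const options = tsconfig.compilerOptions ?? {};

  if (pkg.type === 'module') {
    problems.push('package.json must not contain "type": "module"');
  }
  // tsconfig option values are case-insensitive, so compare in lowercase.
  if ((options.module ?? '').toLowerCase() !== 'commonjs') {
    problems.push('tsconfig.json "module" must be "CommonJS"');
  }
  if ((options.moduleResolution ?? '').toLowerCase() !== 'node') {
    problems.push('tsconfig.json "moduleResolution" must be "node"');
  }
  return problems;
}
```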
`src/constants.ts` exports the `SYSTEM_PROMPT` constant:

```typescript
export const SYSTEM_PROMPT = `
You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task.
## Output Format
\`\`\`
Thought: ...
Action: ...
\`\`\`
## Action Space
navigate(url='xxx') # The url to navigate to
navigate_back() # Navigate back to the previous page.
click(point='<point>x1 y1</point>')
left_double(point='<point>x1 y1</point>')
right_single(point='<point>x1 y1</point>')
drag(start_point='<point>x1 y1</point>', end_point='<point>x2 y2</point>')
hotkey(key='ctrl c') # Split keys with a space and use lowercase. Also, do not use more than 3 keys in one hotkey action.
type(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format. If you want to submit your input, use \\n at the end of content.
scroll(point='<point>x1 y1</point>', direction='down or up or right or left') # Show more information on the \`direction\` side.
wait() #Sleep for 5s and take a screenshot to check for any changes.
finished(content='xxx') # Use escape characters \\', \\", and \\n in content part to ensure we can parse the content in normal python string format.
## Note
- Use Chinese in \`Thought\` part.
- Write a small plan and finally summarize your next action (with its target element) in one sentence in \`Thought\` part.
## User Instruction
`;
```
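
To make the `<point>x y</point>` convention in the action space concrete, here is a small sketch that extracts the coordinates from an action string. It is only an illustration: the real parsing is done by `@gui-agent/action-parser`, whose behavior may differ, and `parsePoint` is a hypothetical helper.

```typescript
interface Point {
  x: number;
  y: number;
}

// Extracts the first <point>x y</point> pair from an action string.
// Returns null when the expected format is not present.
function parsePoint(action: string): Point | null {
  const pattern = /<point>\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*<\/point>/;
  const match = pattern.exec(action);
  if (!match) {
    return null;
  }
  return { x: Number(match[1]), y: Number(match[2]) };
}

const parsed = parsePoint("click(point='<point>120 340</point>')");
const malformed = parsePoint("click(point='120,340')");
```

Here `parsed` holds the coordinates 120 and 340, while `malformed` is `null` because the comma-separated form does not match the required format.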
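- Use `\'`, `\"`, `\n` for special characters
- Use `\n` for new lines
- End the content with `\n` to submit input

1. Copy `.env.local.example` to `.env.local`
2. `npm install`
3. `npm run build` (MANDATORY)
4. `npm start`

- Development: `npm run dev` (skips build)
- Cleanup: `npm run clean` (removes `dist/`)

- Environment variables are loaded from `.env.local` using `dotenv`
- The model is configured with a literal provider type (`as const`)

```typescript
import path from 'path';
import dotenv from 'dotenv';
// Import paths below are assumed from the dependency list.
import { GUIAgent } from '@gui-agent/agent-sdk';
import { AIOHybridOperator } from '@gui-agent/operator-aio';
import { SYSTEM_PROMPT } from './constants';

// 1. Environment Loading
dotenv.config({ path: path.join(__dirname, '..', '.env.local') });

// 2. Model Configuration with Type Safety
const doubao = {
  id: process.env.DOUBAO_SEED_1_6!,
  provider: 'volcengine' as const, // Critical: literal type
  baseURL: process.env.ARK_BASE_URL!,
  apiKey: process.env.ARK_API_KEY!,
};

// 3. Operator Setup
const operator = new AIOHybridOperator({
  baseURL: process.env.SANDBOX_URL!,
  timeout: 10000, // 10 second timeout
});

// 4. Agent Initialization
const guiAgent = new GUIAgent({
  operator,
  model: doubao,
  systemPrompt: SYSTEM_PROMPT,
});

// CommonJS output has no top-level await, so steps 5 and 6 run inside an async function.
async function main() {
  // 5. Task Execution
  const response = await guiAgent.run({
    input: [{ type: 'text', text: 'your task description' }],
  });

  // 6. Response Processing
  console.log('Agent Response:', response.content);
}

main().catch(console.error);
```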
- Use type `'text'` for text inputs
- Example input: `[{ type: 'text', text: '打开百度搜索页面并搜索TypeScript教程' }]` ("Open the Baidu search page and search for TypeScript tutorials")
- `npm run build` is mandatory
- `as const` for string literals requiring specific types: prevents type widening
- Do not add `"type": "module"` to `package.json`
- `<point>x y</point>` format is strictly required
- Escape with `\'`, `\"`, `\n` for proper parsing

Common errors:

- `Error: Cannot find module './constants'`
  - Fix: run `npm run build` and verify `tsconfig.json` settings
- `Error: Cannot assign string to ModelProviderName`
  - Cause: missing `as const` assertion on the `provider` field
  - Fix: add `as const` to `provider`: `provider: 'volcengine' as const`
- `Error: ARK_API_KEY is not defined`
- `Error: Connection refused to SANDBOX_URL`
- `Error: Model endpoint not found`
- `Error: Invalid action format`
  - Check the `<point>x y</point>` format and action space definitions
- `Error: Timeout waiting for response`

Tips:

- Use `npm run dev` for development (skips build)
- Use `npm run clean` to remove `dist/`
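
When hand-writing example actions (for instance in tests or few-shot prompts), the `\'`, `\"`, `\n` escaping rule can be applied mechanically. The sketch below is one way to do it. `escapeActionContent` is a hypothetical helper, and escaping the backslash itself first is an assumption based on the prompt's "normal python string format" wording rather than something this guide states.

```typescript
// Hypothetical helper: escapes text for use inside type(content='...')
// or finished(content='...').
function escapeActionContent(text: string): string {
  return text
    .replace(/\\/g, '\\\\') // assumption: escape backslashes first
    .replace(/'/g, "\\'")
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

// A trailing newline in the input becomes a literal \n, which submits the input.
const action = `type(content='${escapeActionContent("it's done\n")}')`;
```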