CLI for GUI Agent - A powerful automation tool for desktop, web, and mobile applications.
npm install -g @gui-agent/cli
npx @gui-agent/cli run [options]
npm install @gui-agent/cli
gui-agent run
This will start an interactive prompt where you can:
gui-agent run - Run GUI Agent automation with optional parameters.
gui-agent reset - Reset stored configuration (API keys, model settings, etc.).
gui-agent reset # Reset default configuration file
gui-agent reset -c custom.json # Reset specific configuration file
gui-agent run [options]
-p, --presets <url> - Load model configuration from a remote YAML preset file
-t, --target <target> - Specify the target operator:
  computer - Desktop automation (default)
  browser - Web browser automation
  android - Android mobile automation
-q, --query <query> - Provide the automation instruction directly via command line
-c, --config <path> - Path to a custom configuration file (default: ~/.gui-agent-cli.json)
gui-agent run -t computer -q "Open Chrome browser and navigate to github.com"
Make sure your Android device is connected and USB debugging is enabled:
gui-agent run -t android -q "Open WhatsApp and send a message to John"
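Before running the Android example, it can help to confirm that the device is actually visible to adb. The sketch below parses the text printed by `adb devices` and keeps only entries in the `device` (ready) state. The helper name `connected_devices` is hypothetical and not part of the CLI; the sample text only imitates the usual `adb devices` layout.

```python
def connected_devices(adb_output: str) -> list[str]:
    """Return serials that adb reports in the 'device' (ready) state."""
    serials = []
    for line in adb_output.splitlines():
        line = line.strip()
        # Skip blank lines, the header line, and adb daemon status lines.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        # Entries such as 'unauthorized' or 'offline' are not usable yet.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


# Illustrative sample, not captured from a real device.
sample = "List of devices attached\nemulator-5554\tdevice\n0a1b2c3d\tunauthorized\n"
print(connected_devices(sample))
```

To check a real device, the output could be obtained with `subprocess.run(["adb", "devices"], capture_output=True, text=True).stdout` and passed to the same function.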
gui-agent run -t browser -q "Search for 'GUI Agent automation' on Google"
gui-agent run -p "https://example.com/config.yaml" -q "Automate the login process"
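When the CLI is driven from a script, the documented flags can be assembled into an argument list instead of a hand-written string. This is a minimal sketch that uses only the options listed above (`-t`, `-q`, `-p`, `-c`); `build_run_command` is a hypothetical helper, and the target check is an assumption based on the three targets this README names.

```python
import shlex

# Targets named in this README; the CLI itself may accept others.
VALID_TARGETS = ("computer", "browser", "android")


def build_run_command(query, target="computer", presets=None, config=None):
    """Build the argv list for a `gui-agent run` invocation."""
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    cmd = ["gui-agent", "run", "-t", target, "-q", query]
    if presets:
        cmd += ["-p", presets]  # remote YAML preset URL
    if config:
        cmd += ["-c", config]   # custom configuration file path
    return cmd


cmd = build_run_command("Search for 'GUI Agent automation' on Google", target="browser")
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list directly to `subprocess.run(cmd)` avoids shell quoting problems with queries that contain quotes or spaces.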
The CLI requires VLM (Vision Language Model) configuration. You can provide this via:
Interactive setup - When you first run the CLI, it will prompt for:
Configuration file - Settings are saved to ~/.gui-agent-cli.json:
{
"provider": "openai",
"baseURL": "https://api.openai.com/v1",
"apiKey": "your-api-key",
"model": "gpt-4-vision-preview",
"useResponsesApi": false
}
Remote presets - Load configuration from a YAML file:
vlmBaseUrl: "https://api.openai.com/v1"
vlmApiKey: "your-api-key"
vlmModelName: "gpt-4-vision-preview"
useResponsesApi: false
gui-agent run -p "https://example.com/config.yaml" -q "Automate the login process"
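The preset file and the local configuration file use different key names. The sketch below shows one way the two could correspond, assuming `vlmBaseUrl` maps to `baseURL`, `vlmApiKey` to `apiKey`, and `vlmModelName` to `model`. This mapping is inferred from the two examples in this README and is not confirmed CLI behavior. The small parser handles only flat `key: value` lines like the example; real presets should be read with a proper YAML library.

```python
# Assumed correspondence between preset keys and config-file keys.
PRESET_TO_CONFIG = {
    "vlmBaseUrl": "baseURL",
    "vlmApiKey": "apiKey",
    "vlmModelName": "model",
    "useResponsesApi": "useResponsesApi",
}


def parse_flat_preset(text):
    """Parse flat `key: value` lines; not a general YAML parser."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, _, raw = line.partition(":")
        raw = raw.strip()
        if raw in ("true", "false"):
            value = raw == "true"
        else:
            value = raw.strip('"').strip("'")
        values[key.strip()] = value
    return values


def preset_to_config(text):
    """Rename known preset keys to their assumed config-file names."""
    preset = parse_flat_preset(text)
    return {PRESET_TO_CONFIG[k]: v for k, v in preset.items() if k in PRESET_TO_CONFIG}


example = 'vlmBaseUrl: "https://api.openai.com/v1"\nvlmModelName: "gpt-4-vision-preview"\nuseResponsesApi: false\n'
print(preset_to_config(example))
```

Note that the example configuration file also contains a `provider` field with no counterpart in the example preset, so the result here is not a complete configuration.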
To clear all stored configuration and start fresh:
gui-agent reset
This removes the configuration file (~/.gui-agent-cli.json); the CLI will prompt you for settings again on the next run.
You can specify a custom configuration file location:
gui-agent run -c /path/to/custom-config.json
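Before passing a custom file with `-c`, it can be useful to check that it is valid JSON and contains the fields shown in the example configuration. The sketch below assumes that `provider`, `baseURL`, `apiKey`, and `model` are all required and that `useResponsesApi` is optional; this README does not state which fields the CLI actually enforces. `missing_config_keys` is a hypothetical helper, not part of the CLI.

```python
import json

# Assumed required fields, taken from the example configuration file.
REQUIRED_KEYS = ("provider", "baseURL", "apiKey", "model")


def missing_config_keys(config_text: str) -> list[str]:
    """Return assumed-required keys that are absent or empty in a JSON config."""
    config = json.loads(config_text)  # raises json.JSONDecodeError on invalid JSON
    return [key for key in REQUIRED_KEYS if not config.get(key)]


example = '{"provider": "openai", "baseURL": "https://api.openai.com/v1", "model": "gpt-4-vision-preview"}'
print(missing_config_keys(example))
```

For a file on disk, the text could be read with `pathlib.Path("/path/to/custom-config.json").read_text()` and passed to the same function.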
To reset a specific configuration file:
gui-agent reset -c /path/to/custom-config.json
npm run build
npm run dev
npm test
Apache-2.0
Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository.