NutJS Operator is a computer operator for GUI Agent, built on NutJS. It provides a set of APIs for interacting with the desktop environment, including taking screenshots, mouse operations, keyboard operations, and more.
Install with npm:

    npm install @gui-agent/operator-nutjs

Or with yarn:

    yarn add @gui-agent/operator-nutjs

Or with pnpm:

    pnpm add @gui-agent/operator-nutjs
Usage:

    import { NutJSOperator } from '@gui-agent/operator-nutjs';
    import { ConsoleLogger, LogLevel } from '@agent-infra/logger';

    // Create a logger
    const logger = new ConsoleLogger(undefined, LogLevel.DEBUG);

    // Create an operator instance
    const operator = new NutJSOperator(logger);

    // Take a screenshot
    const screenshot = await operator.screenshot();
    console.log('Screenshot taken:', screenshot.status);

    // Execute actions
    const result = await operator.execute({
      actions: [
        {
          type: 'click',
          inputs: {
            point: {
              normalized: { x: 0.5, y: 0.5 } // Click at the center of the screen
            }
          }
        },
        {
          type: 'type',
          inputs: {
            content: 'Hello, World!'
          }
        }
      ]
    });
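The click in the example above is addressed in normalized coordinates. When a target is known in pixels instead (for instance, a position measured on a screenshot of known size), it has to be converted first. The following is a minimal sketch of that conversion, assuming normalized values are fractions of the screen width and height in the range 0 to 1, as the "center of the screen" comment suggests. `toNormalized` and `clickAtPixel` are hypothetical helpers, not part of `@gui-agent/operator-nutjs`, and the local interfaces only mirror the action shape shown in the example; the package may export its own types.

```typescript
// Shapes inferred from the usage example; these are local assumptions,
// not types imported from @gui-agent/operator-nutjs.
interface NormalizedPoint {
  x: number;
  y: number;
}

interface ClickAction {
  type: 'click';
  inputs: { point: { normalized: NormalizedPoint } };
}

// Hypothetical helper: convert a pixel position on a width x height
// screen into a normalized point, clamped to the range [0, 1].
function toNormalized(
  px: number,
  py: number,
  width: number,
  height: number,
): NormalizedPoint {
  if (width <= 0 || height <= 0) {
    throw new RangeError('width and height must be positive');
  }
  const clamp = (v: number): number => Math.min(1, Math.max(0, v));
  return { x: clamp(px / width), y: clamp(py / height) };
}

// Hypothetical helper: build a click action in the same shape as the
// usage example, starting from pixel coordinates.
function clickAtPixel(
  px: number,
  py: number,
  width: number,
  height: number,
): ClickAction {
  return {
    type: 'click',
    inputs: { point: { normalized: toNormalized(px, py, width, height) } },
  };
}

// The center of a 1920x1080 screen maps to (0.5, 0.5).
const centerClick = clickAtPixel(960, 540, 1920, 1080);
console.log(JSON.stringify(centerClick));
// prints {"type":"click","inputs":{"point":{"normalized":{"x":0.5,"y":0.5}}}}
```

The resulting object can be placed in the `actions` array passed to `operator.execute`. Clamping is a design choice of this sketch: it keeps out-of-range pixel values on the screen edge rather than passing them through.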
API reference:

`NutJSOperator`

The main class that provides methods to interact with the desktop environment.

`constructor(logger: ConsoleLogger = defaultLogger)`

- `logger`: A ConsoleLogger instance for logging. Default is a ConsoleLogger with LogLevel.DEBUG.

`screenshot(): Promise<ScreenshotOutput>`

Takes a screenshot of the screen.

Returns a ScreenshotOutput object containing:

- `base64`: The base64-encoded image data
- `contentType`: The content type of the image (e.g., 'image/jpeg')
- `status`: The status of the operation ('success' or 'error')

`execute(params: ExecuteParams): Promise<ExecuteOutput>`

Executes a list of actions.

`params`: An object containing:

- `actions`: An array of action objects

Returns an ExecuteOutput object containing:

- `status`: The status of the operation ('success' or 'error')

Supported actions:

- `move`, `move_to`, `mouse_move`, `hover`: Move the mouse to a specified position
- `click`, `left_click`, `left_single`: Perform a left mouse click
- `left_double`, `double_click`: Perform a double left mouse click
- `right_click`, `right_single`: Perform a right mouse click
- `middle_click`: Perform a middle mouse click
- `left_click_drag`, `drag`, `select`: Drag the mouse from one position to another
- `type`: Type text
- `hotkey`: Press a hotkey combination
- `press`: Press a key
- `release`: Release a key
- `scroll`: Scroll up or down
- `wait`: Wait for a specified time
- `finished`: Do nothing (used to indicate the end of actions)

License: Apache-2.0