apps/docs/src/content/docs/en/java-sdk/computer-use.mdx
Desktop automation operations for a Sandbox.
Provides a Java facade for computer-use features including desktop session management, screenshots, mouse and keyboard automation, display/window inspection, and screen recording.
public ComputerUseStartResponse start()
Starts the computer-use desktop stack (VNC/noVNC and related processes).
Returns:
ComputerUseStartResponse - start response containing process status details

public ComputerUseStopResponse stop()
Stops all computer-use desktop processes.
Returns:
ComputerUseStopResponse - stop response containing process status details

public ComputerUseStatusResponse getStatus()
Returns current computer-use status.
Returns:
ComputerUseStatusResponse - overall computer-use status

public ScreenshotResponse takeScreenshot()
Captures a full-screen screenshot without cursor.
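Because the screenshot payload carries the image as base64, a caller will usually want to decode it before saving or inspecting it. The sketch below shows only the decoding step with the standard `java.util.Base64` decoder. It assumes the payload is plain base64 with no `data:` prefix or line breaks, and it takes the string directly because the accessor name on `ScreenshotResponse` is not documented here. `ScreenshotDecodeSketch`, `decodeImage`, and `saveImage` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class ScreenshotDecodeSketch {
    // Decodes a plain base64 string (assumed: no data-URI prefix) into raw bytes.
    static byte[] decodeImage(String base64Image) {
        return Base64.getDecoder().decode(base64Image);
    }

    // Decodes the payload, writes it to the target path, and returns the byte count.
    static long saveImage(String base64Image, Path target) throws IOException {
        byte[] bytes = decodeImage(base64Image);
        Files.write(target, bytes);
        return bytes.length;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in payload; a real one would come from the screenshot response.
        String payload = Base64.getEncoder()
                .encodeToString("not-a-real-image".getBytes(StandardCharsets.UTF_8));
        Path out = Files.createTempFile("screenshot", ".png");
        System.out.println(saveImage(payload, out) + " bytes written to " + out);
    }
}
```

If the payload turns out to include line breaks, `Base64.getMimeDecoder()` is the more tolerant choice.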
Returns:
ScreenshotResponse - screenshot payload (base64 image and metadata)

public ScreenshotResponse takeScreenshot(boolean showCursor)
Captures a full-screen screenshot.
Parameters:
showCursor boolean - whether to render cursor in the screenshot
Returns:
ScreenshotResponse - screenshot payload (base64 image and metadata)

public ScreenshotResponse takeRegionScreenshot(int x, int y, int width, int height)
Captures a screenshot of a rectangular region without cursor.
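The region is given as a top-left corner plus a width and height. This page does not say how out-of-bounds regions are handled, so a caller may prefer to clip the rectangle to the display first. The following is a minimal sketch of that clipping. `RegionSketch` and `clampRegion` are hypothetical helpers, not part of the SDK, and the display size is assumed to be known (for example from `getDisplayInfo()`).

```java
class RegionSketch {
    // Hypothetical helper: clips a region to the display bounds and
    // returns {x, y, width, height} of the visible part.
    static int[] clampRegion(int x, int y, int width, int height,
                             int displayWidth, int displayHeight) {
        int left = Math.max(0, Math.min(x, displayWidth));
        int top = Math.max(0, Math.min(y, displayHeight));
        int right = Math.max(left, Math.min(x + width, displayWidth));
        int bottom = Math.max(top, Math.min(y + height, displayHeight));
        return new int[] {left, top, right - left, bottom - top};
    }

    public static void main(String[] args) {
        // A 400x300 region hanging off the bottom-right corner of a 1920x1080 display.
        int[] r = clampRegion(1800, 1000, 400, 300, 1920, 1080);
        System.out.println(r[0] + "," + r[1] + " " + r[2] + "x" + r[3]);
    }
}
```

The clipped values can then be passed as the `x`, `y`, `width`, and `height` arguments.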
Parameters:
x int - region top-left X coordinate
y int - region top-left Y coordinate
width int - region width in pixels
height int - region height in pixels
Returns:
ScreenshotResponse - region screenshot payload

public ScreenshotResponse takeCompressedScreenshot(String format, int quality, double scale)
Captures a compressed full-screen screenshot.
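To make the effect of the `scale` argument concrete, the sketch below computes the output dimensions a given scale factor would produce. It assumes the scale applies uniformly to width and height and that fractional results are rounded to the nearest pixel. Neither detail is stated on this page, and `ScaleSketch` is a hypothetical helper.

```java
class ScaleSketch {
    // Assumed rounding: nearest pixel. The service may round differently.
    static int scaled(int pixels, double scale) {
        return (int) Math.round(pixels * scale);
    }

    public static void main(String[] args) {
        // A 1920x1080 display captured at scale 0.5.
        System.out.println(scaled(1920, 0.5) + "x" + scaled(1080, 0.5)); // prints 960x540
    }
}
```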
Parameters:
format String - output image format (for example: png, jpeg, webp)
quality int - compression quality (typically 1-100, format dependent)
scale double - screenshot scale factor (for example: 0.5 for 50%)
Returns:
ScreenshotResponse - compressed screenshot payload

public MouseClickResponse click(int x, int y)
Performs a left mouse click at the given coordinates.
Parameters:
x int - target X coordinate
y int - target Y coordinate
Returns:
MouseClickResponse - click response with resulting cursor position

public MouseClickResponse click(int x, int y, String button)
Performs a mouse click at the given coordinates with a specific button.
Parameters:
x int - target X coordinate
y int - target Y coordinate
button String - button type (left, right, middle)
Returns:
MouseClickResponse - click response with resulting cursor position

public MouseClickResponse doubleClick(int x, int y)
Performs a double left-click at the given coordinates.
Parameters:
x int - target X coordinate
y int - target Y coordinate
Returns:
MouseClickResponse - click response with resulting cursor position

public MousePositionResponse moveMouse(int x, int y)
Moves the mouse cursor to the given coordinates.
Parameters:
x int - target X coordinate
y int - target Y coordinate
Returns:
MousePositionResponse - new mouse position

public MousePositionResponse getMousePosition()
Returns current mouse position.
Returns:
MousePositionResponse - current mouse cursor coordinates

public MouseDragResponse drag(int startX, int startY, int endX, int endY)
Drags the mouse from one point to another using the left button.
Parameters:
startX int - drag start X coordinate
startY int - drag start Y coordinate
endX int - drag end X coordinate
endY int - drag end Y coordinate
Returns:
MouseDragResponse - drag response with resulting cursor position

public ScrollResponse scroll(int x, int y, int deltaX, int deltaY)
Scrolls at the given coordinates.
The current toolbox API supports directional scrolling (up/down) with an
amount. This method maps deltaY to vertical scroll direction and magnitude.
If deltaY is 0, deltaX is used as a fallback.
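The delta-to-direction mapping described above can be sketched as follows. This is an illustration, not the SDK's implementation. It assumes positive deltas scroll down and negative deltas scroll up, which this page does not specify, and `ScrollMappingSketch` and `DirectionalScroll` are hypothetical names.

```java
class ScrollMappingSketch {
    // Hypothetical value type: a direction ("up" or "down") plus a magnitude.
    static final class DirectionalScroll {
        public final String direction;
        public final int amount;

        DirectionalScroll(String direction, int amount) {
            this.direction = direction;
            this.amount = amount;
        }
    }

    static DirectionalScroll toDirectional(int deltaX, int deltaY) {
        // deltaY wins; deltaX is consulted only when deltaY is 0.
        int delta = deltaY != 0 ? deltaY : deltaX;
        // Assumption: negative means up, zero or positive means down.
        String direction = delta < 0 ? "up" : "down";
        return new DirectionalScroll(direction, Math.abs(delta));
    }

    public static void main(String[] args) {
        DirectionalScroll s = toDirectional(0, -3);
        System.out.println(s.direction + " " + s.amount); // prints "up 3"
    }
}
```

Under this mapping a non-zero `deltaX` is ignored whenever `deltaY` is also non-zero, so a diagonal request becomes a purely vertical scroll.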
Parameters:
x int - anchor X coordinate
y int - anchor Y coordinate
deltaX int - horizontal delta (used only when deltaY == 0)
deltaY int - vertical delta
Returns:
ScrollResponse - scroll response indicating operation success

public void typeText(String text)
Types text using keyboard automation.
Parameters:
text String - text to type

public void pressKey(String key)
Presses a single key.
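The `key` parameter accepts case-insensitive named keys and common aliases. The sketch below illustrates what such normalization might look like. It is not the SDK's or the service's actual logic. The alias table is an assumption (only that `Return` and `Escape` are accepted is stated on this page), and single-character keys are passed through unchanged as a conservative choice.

```java
import java.util.Locale;
import java.util.Map;

class KeyNameSketch {
    // Illustrative alias table; mapping "return" to "enter" is an assumption.
    private static final Map<String, String> ALIASES = Map.of("return", "enter");

    static String normalize(String key) {
        // Leave letters, digits, and punctuation untouched.
        if (key.length() == 1) {
            return key;
        }
        // Named keys are treated as case-insensitive.
        String lower = key.toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(lower, lower);
    }

    public static void main(String[] args) {
        System.out.println(normalize("Return"));   // prints "enter"
        System.out.println(normalize("Escape"));   // prints "escape"
        System.out.println(normalize("NUM_PLUS")); // prints "num_plus"
    }
}
```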
Parameters:
key String - key to press. Canonical names include enter, escape, tab, letters, digits, unshifted punctuation, function keys, and grammar-safe numpad names such as num_plus. Named keys are case-insensitive, and common aliases such as Return and Escape are normalized.

public void pressHotkey(String... keys)
Presses a key combination as a hotkey sequence.
Keys are joined with + before being sent (for example,
pressHotkey("ctrl", "shift", "t") -> "ctrl+shift+t"). The resulting
value is a single atomic chord and uses the same normalized key contract as
pressKey(String).
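The joining step can be reproduced with `String.join` from the standard library. The sketch below shows only how the chord string is formed. `HotkeyJoinSketch` and `joinHotkey` are hypothetical names, and the SDK's own internals may differ.

```java
class HotkeyJoinSketch {
    // Joins the hotkey parts with "+" to form a single chord string.
    static String joinHotkey(String... keys) {
        return String.join("+", keys);
    }

    public static void main(String[] args) {
        System.out.println(joinHotkey("ctrl", "shift", "t")); // prints "ctrl+shift+t"
    }
}
```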
Parameters:
keys String... - hotkey parts to combine

public DisplayInfoResponse getDisplayInfo()
Returns display configuration information.
Returns:
DisplayInfoResponse - display information including available displays and their geometry

public WindowsResponse getWindows()
Returns currently open windows.
Returns:
WindowsResponse - window list and metadata

public Recording startRecording()
Starts a recording with default options.
Returns:
Recording - newly started recording metadata

public Recording startRecording(String label)
Starts a recording with an optional label.
Parameters:
label String - optional recording label
Returns:
Recording - newly started recording metadata

public Recording stopRecording(String id)
Stops an active recording.
Parameters:
id String - recording identifier
Returns:
Recording - finalized recording metadata

public ListRecordingsResponse listRecordings()
Lists all recordings for the current sandbox session.
Returns:
ListRecordingsResponse - recordings list response

public Recording getRecording(String id)
Returns metadata for a specific recording.
Parameters:
id String - recording identifier
Returns:
Recording - recording details

public File downloadRecording(String id)
Downloads a recording file.
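Since the returned `File` may point at a temporary location, a caller who needs the recording to persist can copy it somewhere durable. The following is a minimal sketch using `java.nio.file.Files.copy`. `RecordingFileSketch`, `keepCopy`, and `standInDownload` are hypothetical helpers, and the stand-in file only simulates what `downloadRecording` would return.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class RecordingFileSketch {
    // Copies the downloaded file into targetDir, keeping its file name.
    static Path keepCopy(File downloaded, Path targetDir) {
        try {
            Files.createDirectories(targetDir);
            Path target = targetDir.resolve(downloaded.getName());
            return Files.copy(downloaded.toPath(), target,
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for the File that downloadRecording(id) would return.
    static File standInDownload() {
        try {
            return File.createTempFile("recording", ".tmp");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File downloaded = standInDownload();
        Path dir = downloaded.toPath().resolveSibling("kept-recordings");
        System.out.println("Copied to " + keepCopy(downloaded, dir));
    }
}
```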
Parameters:
id String - recording identifier
Returns:
File - downloaded temporary/local file handle returned by the API client

public void deleteRecording(String id)
Deletes a recording.
Parameters:
id String - recording identifier