
Overview

The computer tool provides full control over the desktop: taking screenshots, clicking, typing, scrolling, and pressing keyboard shortcuts. It’s the primary tool available on every cloud desktop.

Actions

screenshot

Capture the current screen state.
{"name": "computer", "arguments": {"action": "screenshot"}}
Returns a base64-encoded PNG in the base64_image field.
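
As a minimal sketch of consuming this result, the snippet below decodes the base64_image field into raw PNG bytes and checks the PNG signature. The helper name decode_screenshot is hypothetical and not part of the tool. Only the base64_image field name comes from the text above.

```python
import base64

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response: dict) -> bytes:
    """Return the PNG bytes stored in response["base64_image"].

    Hypothetical helper: `response` is assumed to be the already-parsed
    JSON result of a screenshot action.
    """
    png_bytes = base64.b64decode(response["base64_image"])
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("base64_image did not decode to PNG data")
    return png_bytes
```

The returned bytes can be written to disk with pathlib.Path("screen.png").write_bytes(...) or passed to an image library.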

left_click

Click at specific screen coordinates.
{"name": "computer", "arguments": {"action": "left_click", "coordinate": [500, 300]}}

right_click

Right-click at specific screen coordinates.
{"name": "computer", "arguments": {"action": "right_click", "coordinate": [500, 300]}}

double_click

Double-click at specific screen coordinates.
{"name": "computer", "arguments": {"action": "double_click", "coordinate": [500, 300]}}

middle_click

Middle-click (mouse-wheel-button click) at specific screen coordinates.
{"name": "computer", "arguments": {"action": "middle_click", "coordinate": [500, 300]}}

triple_click

Triple-click at specific screen coordinates.
{"name": "computer", "arguments": {"action": "triple_click", "coordinate": [500, 300]}}

left_click_drag

Press the left mouse button at start_coordinate, move to coordinate, and release. Used for drag-and-drop and selection.
{"name": "computer", "arguments": {"action": "left_click_drag", "start_coordinate": [400, 300], "coordinate": [600, 500]}}
| Parameter | Type | Description |
| --- | --- | --- |
| start_coordinate | [x, y] | Where to press the mouse button |
| coordinate | [x, y] | Where to release the mouse button |

left_mouse_down

Press and hold the left mouse button at the given coordinates without releasing. Pair with left_mouse_up to perform low-level drag sequences.
{"name": "computer", "arguments": {"action": "left_mouse_down", "coordinate": [500, 300]}}

left_mouse_up

Release the left mouse button at the given coordinates. Pairs with left_mouse_down.
{"name": "computer", "arguments": {"action": "left_mouse_up", "coordinate": [500, 300]}}
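
The pairing of left_mouse_down and left_mouse_up can be sketched as a helper that builds the full list of tool calls for a manual drag. This is an illustration only: the function drag_sequence is hypothetical, and it assumes that mouse_move calls issued while the button is held are treated as drag motion, which the text above implies but does not state.

```python
def drag_sequence(start, end, steps=4):
    """Build tool calls: press at start, move in `steps` increments, release at end.

    Hypothetical helper; assumes mouse_move between down and up drags the pointer.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")

    def call(action, point):
        return {
            "name": "computer",
            "arguments": {"action": action, "coordinate": list(point)},
        }

    calls = [call("left_mouse_down", start)]
    for i in range(1, steps + 1):
        # Linear interpolation between start and end, rounded to whole pixels.
        x = round(start[0] + (end[0] - start[0]) * i / steps)
        y = round(start[1] + (end[1] - start[1]) * i / steps)
        calls.append(call("mouse_move", (x, y)))
    calls.append(call("left_mouse_up", end))
    return calls
```

For a simple drag between two points, left_click_drag does the same in a single call. The low-level form is mainly useful when intermediate pointer positions matter.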

mouse_move

Move the cursor without clicking.
{"name": "computer", "arguments": {"action": "mouse_move", "coordinate": [500, 300]}}

type

Type a string of text at the current cursor position.
{"name": "computer", "arguments": {"action": "type", "text": "Hello, world!"}}

key

Press a key or key combination.
{"name": "computer", "arguments": {"action": "key", "text": "Return"}}
Common key names: Return, Tab, Escape, Backspace, Delete, space.
Key combinations use +:
{"name": "computer", "arguments": {"action": "key", "text": "ctrl+a"}}
{"name": "computer", "arguments": {"action": "key", "text": "ctrl+c"}}
{"name": "computer", "arguments": {"action": "key", "text": "alt+F4"}}
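
A small helper can assemble the + syntax shown above from individual key names. The function key_call is a hypothetical convenience, not part of the tool. It joins the names with + and does not check them against any list of supported keys.

```python
def key_call(*keys: str) -> dict:
    """Build a `key` action from one or more key names joined with '+'.

    Hypothetical helper; key names are passed through unvalidated.
    """
    if not keys or any(not k for k in keys):
        raise ValueError("at least one non-empty key name is required")
    return {
        "name": "computer",
        "arguments": {"action": "key", "text": "+".join(keys)},
    }
```

For example, key_call("ctrl", "a") produces the same payload as the ctrl+a example above, and key_call("Return") produces a single-key press.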

scroll

Scroll in a given direction at the specified coordinates.
{"name": "computer", "arguments": {"action": "scroll", "coordinate": [500, 300], "scroll_direction": "down", "scroll_amount": 3}}
| Parameter | Type | Description |
| --- | --- | --- |
| coordinate | [x, y] | Where to scroll |
| scroll_direction | string | up, down, left, right |
| scroll_amount | integer | Number of scroll increments |
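
The parameters above can be checked on the client before a call is sent. The sketch below is one way to do that. The function scroll_call is hypothetical, and the requirement that scroll_amount be at least 1 is an assumption, since the table only says it is an integer.

```python
VALID_DIRECTIONS = ("up", "down", "left", "right")


def scroll_call(x: int, y: int, direction: str, amount: int = 3) -> dict:
    """Build a `scroll` action after validating direction and amount.

    Hypothetical helper; the amount >= 1 check is an assumption.
    """
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"scroll_direction must be one of {VALID_DIRECTIONS}")
    if amount < 1:
        raise ValueError("scroll_amount must be a positive integer")
    return {
        "name": "computer",
        "arguments": {
            "action": "scroll",
            "coordinate": [x, y],
            "scroll_direction": direction,
            "scroll_amount": amount,
        },
    }
```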

wait

Pause execution. Useful for waiting for loading screens or animations to complete.
{"name": "computer", "arguments": {"action": "wait"}}
Pass an optional duration (seconds) to control how long to wait:
{"name": "computer", "arguments": {"action": "wait", "duration": 2.5}}

hold_key

Hold a key down for a specified duration. Useful for key-hold interactions.
{"name": "computer", "arguments": {"action": "hold_key", "text": "shift", "duration": 1.0}}
| Parameter | Type | Description |
| --- | --- | --- |
| text | string | Key to hold (e.g. "shift", "ctrl") |
| duration | float | How long to hold the key, in seconds |

cursor_position

Get the current cursor position.
{"name": "computer", "arguments": {"action": "cursor_position"}}

Response Format

Every action returns:
{
  "status": "ok",
  "output": "clicked at (500, 300)",
  "base64_image": "iVBORw0KGgo..."
}
| Field | Type | Description |
| --- | --- | --- |
| status | string | Always "ok" on success. |
| output | string | Optional. Short text description of what happened. Only present when the action produced text output. |
| base64_image | string | Optional. Base64-encoded PNG of the screen after the action. |
| coordinate | [int, int] | Optional. Final cursor position. Only present for cursor_position. |
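
Because three of the four fields are optional, client code should not assume they are present. The following sketch parses a response into a typed record with None for missing fields. The names ActionResult and parse_result are hypothetical. The shape of a failed response is not described here, so treating any status other than "ok" as an error is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ActionResult:
    status: str
    output: Optional[str] = None
    base64_image: Optional[str] = None
    coordinate: Optional[Tuple[int, int]] = None


def parse_result(raw: str) -> ActionResult:
    """Parse a JSON response string into an ActionResult.

    Hypothetical helper; raising on a non-"ok" status is an assumption.
    """
    data = json.loads(raw)
    if data.get("status") != "ok":
        raise RuntimeError(f"action did not succeed: {data!r}")
    coord = data.get("coordinate")
    return ActionResult(
        status=data["status"],
        output=data.get("output"),
        base64_image=data.get("base64_image"),
        # Convert the JSON list to a tuple only when the field is present.
        coordinate=(coord[0], coord[1]) if coord is not None else None,
    )
```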

Next Steps

Quickstart

See these tools in action in a complete agent loop

Examples

Full examples for each LLM provider