Overview

The computer tool provides full control over the desktop: taking screenshots, clicking, typing, scrolling, and pressing keyboard shortcuts. It is the primary tool available on every cloud desktop.

Actions

screenshot

Capture the current screen state.
{"name": "computer", "arguments": {"action": "screenshot"}}
Returns a base64-encoded PNG in the screenshot field.
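
As a minimal sketch of how a client might consume this field, the Python below decodes the screenshot string and writes it to disk. The helper names decode_screenshot and save_screenshot are illustrative assumptions, not part of the tool; only the screenshot field and its base64 PNG encoding come from the description above.

```python
import base64
from pathlib import Path

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(response: dict) -> bytes:
    """Return the raw PNG bytes held in a response's "screenshot" field."""
    data = base64.b64decode(response["screenshot"], validate=True)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot field does not contain PNG data")
    return data


def save_screenshot(response: dict, path: str) -> int:
    """Write the decoded PNG to `path` and return the number of bytes written."""
    return Path(path).write_bytes(decode_screenshot(response))
```

Checking the PNG signature before writing is a cheap way to catch a truncated or mis-encoded payload early.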

left_click

Click at specific screen coordinates.
{"name": "computer", "arguments": {"action": "left_click", "coordinate": [500, 300]}}

right_click

Right-click at specific screen coordinates.
{"name": "computer", "arguments": {"action": "right_click", "coordinate": [500, 300]}}

double_click

Double-click at specific screen coordinates.
{"name": "computer", "arguments": {"action": "double_click", "coordinate": [500, 300]}}

mouse_move

Move the cursor without clicking.
{"name": "computer", "arguments": {"action": "mouse_move", "coordinate": [500, 300]}}

type

Type a string of text at the current cursor position.
{"name": "computer", "arguments": {"action": "type", "text": "Hello, world!"}}

key

Press a key or key combination.
{"name": "computer", "arguments": {"action": "key", "text": "Return"}}
Common key names: Return, Tab, Escape, Backspace, Delete, space

Key combinations use +:
{"name": "computer", "arguments": {"action": "key", "text": "ctrl+a"}}
{"name": "computer", "arguments": {"action": "key", "text": "ctrl+c"}}
{"name": "computer", "arguments": {"action": "key", "text": "alt+F4"}}

scroll

Scroll in a direction.
{"name": "computer", "arguments": {"action": "scroll", "coordinate": [500, 300], "scroll_direction": "down", "scroll_amount": 3}}
| Parameter | Type | Description |
| --- | --- | --- |
| coordinate | [x, y] | Where to scroll |
| scroll_direction | string | up, down, left, right |
| scroll_amount | integer | Number of scroll increments |
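
The scroll parameters can be validated on the client before the call is sent. The sketch below assumes a hypothetical scroll_call helper and treats scroll_amount as a positive integer, which is an assumption; the four direction values come from the parameter list above.

```python
SCROLL_DIRECTIONS = ("up", "down", "left", "right")


def scroll_call(x: int, y: int, direction: str, amount: int) -> dict:
    """Build a scroll action after checking the direction and amount."""
    if direction not in SCROLL_DIRECTIONS:
        raise ValueError(f"scroll_direction must be one of {SCROLL_DIRECTIONS}")
    # Assumption: increments are whole numbers of at least 1 (bool is rejected too).
    if isinstance(amount, bool) or not isinstance(amount, int) or amount < 1:
        raise ValueError("scroll_amount must be a positive integer")
    return {
        "name": "computer",
        "arguments": {
            "action": "scroll",
            "coordinate": [x, y],
            "scroll_direction": direction,
            "scroll_amount": amount,
        },
    }


call = scroll_call(500, 300, "down", 3)
```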

wait

Wait for a specified duration (useful for loading screens).
{"name": "computer", "arguments": {"action": "wait"}}

cursor_position

Get the current cursor position.
{"name": "computer", "arguments": {"action": "cursor_position"}}

Response Format

Every action returns:
{
  "success": true,
  "output": "clicked at (500, 300)",
  "screenshot": "iVBORw0KGgo..."
}
| Field | Type | Description |
| --- | --- | --- |
| success | boolean | Whether the action succeeded |
| output | string | Human-readable description of what happened |
| screenshot | string | Base64-encoded PNG of the screen after the action |
| error | string | Error message (only present on failure) |
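
A client will usually want to turn this response into a typed value and fail loudly when success is false. The following is a sketch under assumptions: ActionResult, ComputerActionError, and parse_result are hypothetical names, and missing fields are given defaults because the exact shape of a failure response is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


class ComputerActionError(RuntimeError):
    """Raised when a computer tool action reports success == false."""


@dataclass
class ActionResult:
    success: bool
    output: str
    screenshot: str
    error: Optional[str] = None


def parse_result(raw: dict) -> ActionResult:
    """Convert a raw response dict into an ActionResult, raising on failure."""
    result = ActionResult(
        success=bool(raw.get("success", False)),
        output=raw.get("output", ""),
        screenshot=raw.get("screenshot", ""),
        error=raw.get("error"),  # only present on failure
    )
    if not result.success:
        raise ComputerActionError(result.error or "action failed without an error message")
    return result


ok = parse_result({"success": True, "output": "clicked at (500, 300)", "screenshot": "iVBORw0KGgo..."})
```

Raising an exception on failure keeps an agent loop from acting on a stale screenshot; returning the result and checking success at each call site would work equally well.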

Next Steps

Quickstart: See these tools in action in a complete agent loop.

Examples: Full examples for each LLM provider.