Overview
Thecomputer tool provides full control over the desktop — taking screenshots, clicking, typing, scrolling, and keyboard shortcuts. It’s the primary tool available on every cloud desktop.
Actions
screenshot
Capture the current screen state.base64_image field.
left_click
Click at specific screen coordinates.right_click
Right-click at specific screen coordinates.double_click
Double-click at specific screen coordinates.middle_click
Middle-click (mouse-wheel-button click) at specific screen coordinates.triple_click
Triple-click at specific screen coordinates.left_click_drag
Press the left mouse button atstart_coordinate, move to coordinate, and release. Used for drag-and-drop and selection.
| Parameter | Type | Description |
|---|---|---|
start_coordinate | [x, y] | Where to press the mouse button |
coordinate | [x, y] | Where to release the mouse button |
left_mouse_down
Press and hold the left mouse button at the given coordinates without releasing. Pair withleft_mouse_up to perform low-level drag sequences.
left_mouse_up
Release the left mouse button at the given coordinates. Pairs withleft_mouse_down.
mouse_move
Move the cursor without clicking.type
Type a string of text at the current cursor position.key
Press a key or key combination.Return, Tab, Escape, Backspace, Delete, space
Key combinations use +:
scroll
Scroll in a direction.| Parameter | Type | Description |
|---|---|---|
coordinate | [x, y] | Where to scroll |
scroll_direction | string | up, down, left, right |
scroll_amount | integer | Number of scroll increments |
wait
Pause execution. Useful for waiting for loading screens or animations to complete.duration (seconds) to control how long to wait:
hold_key
Hold a key down for a specified duration. Useful for key-hold interactions.| Parameter | Type | Description |
|---|---|---|
text | string | Key to hold (e.g. "shift", "ctrl") |
duration | float | How long to hold the key, in seconds |
cursor_position
Get the current cursor position.Response Format
Every action returns:| Field | Type | Description |
|---|---|---|
status | string | Always "ok" on success. |
output | string | Optional. Short text description of what happened. Only present when the action produced text output. |
base64_image | string | Optional. Base64-encoded PNG of the screen after the action. |
coordinate | [int, int] | Optional. Final cursor position. Only present for cursor_position. |
Next Steps
Quickstart
See these tools in action in a complete agent loop
Examples
Full examples for each LLM provider