Overview
The computer tool provides full control over the desktop: taking screenshots, clicking, typing, scrolling, and pressing keyboard shortcuts. It is the primary tool available on every cloud desktop.
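The sketch below models a single request to the tool as a plain Python dict whose action name comes from the list on this page. Only the scroll parameters (coordinate, scroll_direction, scroll_amount) are documented here. The coordinate, text, and duration parameter names for the other actions are assumptions, and make_action is a hypothetical helper, not part of the tool's API.

```python
from typing import Any

# Action names as listed on this page. Required parameters are ASSUMPTIONS,
# except for scroll, whose parameters are documented in the scroll table.
REQUIRED_PARAMS: dict[str, tuple[str, ...]] = {
    "screenshot": (),
    "left_click": ("coordinate",),    # assumed parameter name
    "right_click": ("coordinate",),   # assumed parameter name
    "double_click": ("coordinate",),  # assumed parameter name
    "mouse_move": ("coordinate",),    # assumed parameter name
    "type": ("text",),                # assumed parameter name
    "key": ("text",),                 # assumed parameter name
    "scroll": ("coordinate", "scroll_direction", "scroll_amount"),
    "wait": ("duration",),            # assumed parameter name
    "cursor_position": (),
}


def make_action(action: str, **params: Any) -> dict[str, Any]:
    """Build a request dict for one computer-tool action (hypothetical shape)."""
    if action not in REQUIRED_PARAMS:
        raise ValueError(f"unknown action: {action!r}")
    missing = [p for p in REQUIRED_PARAMS[action] if p not in params]
    if missing:
        raise ValueError(f"{action} is missing parameters: {missing}")
    return {"action": action, **params}


print(make_action("left_click", coordinate=[640, 360]))
# {'action': 'left_click', 'coordinate': [640, 360]}
```

Keeping the action table in one dict makes it easy to reject unknown actions before anything is sent to the desktop.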
Actions
screenshot
Capture the current screen state. The image is returned in the screenshot field of the response.
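Because the screenshot field holds a base64-encoded PNG (see Response Format), a client has to decode it before saving or displaying it. A minimal sketch, assuming the response has already been parsed into a Python dict; screenshot_bytes is a hypothetical helper, and the fake response exists only to make the example runnable.

```python
import base64

# Every PNG file starts with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def screenshot_bytes(response: dict) -> bytes:
    """Decode the base64 screenshot field of an action response into PNG bytes."""
    data = base64.b64decode(response["screenshot"])
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot field did not decode to a PNG image")
    return data


# Stand-in response for illustration; a real one comes back from the tool.
fake = {
    "success": True,
    "output": "stand-in output",
    "screenshot": base64.b64encode(PNG_SIGNATURE + b"...").decode("ascii"),
}
png = screenshot_bytes(fake)
```

To keep the image, write the bytes out, for example with pathlib.Path("screen.png").write_bytes(png).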
left_click
Click at specific screen coordinates.
right_click
Right-click at specific screen coordinates.
double_click
Double-click at specific screen coordinates.
mouse_move
Move the cursor without clicking.
type
Type a string of text at the current cursor position.
key
Press a key or key combination. Common key names include Return, Tab, Escape, Backspace, Delete, and space.
Key combinations join individual key names with +.
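A minimal sketch of building a key request from individual key names joined with +. The text field name and the ctrl modifier name are assumptions, since this page does not document them, and key_action is a hypothetical helper.

```python
def key_action(*keys: str) -> dict[str, str]:
    """Build a key request by joining key names with '+'.

    The "text" field name is an ASSUMPTION, not a documented parameter.
    """
    if not keys:
        raise ValueError("at least one key name is required")
    for k in keys:
        # A key name containing '+' would be ambiguous once joined.
        if not k or "+" in k:
            raise ValueError(f"invalid key name: {k!r}")
    return {"action": "key", "text": "+".join(keys)}


print(key_action("Return"))     # {'action': 'key', 'text': 'Return'}
print(key_action("ctrl", "c"))  # {'action': 'key', 'text': 'ctrl+c'}
```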
scroll
Scroll in a direction.

| Parameter | Type | Description |
|---|---|---|
| coordinate | [x, y] | Where to scroll |
| scroll_direction | string | up, down, left, right |
| scroll_amount | integer | Number of scroll increments |
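The scroll parameters map directly onto a request payload. This sketch checks scroll_direction against the four documented values before building the request. scroll_action is a hypothetical helper, and requiring scroll_amount to be a positive integer is an assumption, since the page only says it counts scroll increments.

```python
from typing import Any

# The four documented values for scroll_direction.
SCROLL_DIRECTIONS = {"up", "down", "left", "right"}


def scroll_action(x: int, y: int, direction: str, amount: int) -> dict[str, Any]:
    """Build a scroll request from the documented scroll parameters."""
    if direction not in SCROLL_DIRECTIONS:
        raise ValueError(f"scroll_direction must be one of {sorted(SCROLL_DIRECTIONS)}")
    # ASSUMPTION: the tool expects at least one increment; bools are rejected
    # explicitly because bool is a subclass of int in Python.
    if isinstance(amount, bool) or not isinstance(amount, int) or amount < 1:
        raise ValueError("scroll_amount must be a positive integer")
    return {
        "action": "scroll",
        "coordinate": [x, y],
        "scroll_direction": direction,
        "scroll_amount": amount,
    }


request = scroll_action(500, 400, "down", 3)
```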
wait
Wait for a specified duration (useful for loading screens).
cursor_position
Get the current cursor position.
Response Format
Every action returns:

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the action succeeded |
| output | string | Human-readable description of what happened |
| screenshot | string | Base64-encoded PNG of the screen after the action |
| error | string | Error message (only present on failure) |
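Since error appears only on failure, a client can branch on success before trusting output or screenshot. A minimal sketch, assuming the response arrives as a parsed dict; check_response and ComputerToolError are hypothetical names, not part of the tool.

```python
from typing import Any


class ComputerToolError(RuntimeError):
    """Hypothetical exception raised when an action response reports failure."""


def check_response(response: dict[str, Any]) -> str:
    """Return the output text of a successful action, or raise with its error."""
    if not response.get("success", False):
        # error is only present on failure; fall back if it is missing anyway.
        raise ComputerToolError(response.get("error", "action failed without an error message"))
    return response.get("output", "")


ok = {"success": True, "output": "stand-in output", "screenshot": ""}
print(check_response(ok))  # stand-in output
```

Raising on failure keeps an agent loop from acting on a stale screenshot after an action that did not take effect.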
Next Steps
Quickstart
See these tools in action in a complete agent loop
Examples
Full examples for each LLM provider