agent.execute(instruction: str, max_iterations: int | None = None, model: str | None = None) -> dict
The agent takes a screenshot, interprets the instruction, and performs the required interactions (clicks, typing, scrolling, etc.).
Parameters
| Parameter | Type | Default | Description |
|---|
instruction | str | — | Natural language description of what to do |
max_iterations | int | None | None | Maximum number of screenshot → think → act loops. Defaults to 10 on the server when not set |
model | str | None | None | Override the default model for this call |
Returns
dict — Execution result metadata.
Raises
WorkflowError
Examples
def run(params: Params) -> Result:
agent = Agent()
agent.execute("Open the browser to https://example.com")
def run(params: Params) -> Result:
# Simple click — cap at 5 to fail fast if the button isn't visible
agent = Agent()
agent.execute("Click the Submit button", max_iterations=5)
def run(params: Params) -> Result:
# Multi-step form fill — needs more iterations to complete all fields
agent = Agent()
agent.execute(
"Fill in the registration form with name 'Jane Doe', email 'jane@example.com', and select 'Premium' plan",
max_iterations=20,
)
def run(params: Params) -> Result:
# Use a faster model for simple actions
agent = Agent()
agent.execute("Quit the calendar app", model="claude-haiku-4-5-20251001")
Understanding max_iterations
Each execute() call runs in a screenshot → think → act loop. One iteration is one cycle of:
- Take a screenshot of the current screen
- Send it to the VLM with the instruction
- VLM decides and performs an action (click, type, scroll, etc.)
max_iterations caps how many of these loops can run before the agent stops.
| Scenario | Recommended value | Why |
|---|
| Simple one-click action | 3–5 | Fail fast if the element isn’t visible |
| Multi-step form or navigation | 15–20 | Enough room to complete all steps |
| General-purpose (default) | omit (None) | Server uses 10, which works for most tasks |
Write instructions as if you’re telling a human what to do. Be specific about which element to interact with — “Click the blue Submit button in the form” is better than “Click submit”.