Skip to main content
agent.execute(instruction: str, max_iterations: int | None = None, model: str | None = None) -> dict
The agent takes a screenshot, interprets the instruction, and performs the required interactions (clicks, typing, scrolling, etc.).

Parameters

ParameterTypeDefaultDescription
instructionstrNatural language description of what to do
max_iterationsint | NoneNoneMaximum number of screenshot → think → act loops. Defaults to 10 on the server when not set
modelstr | NoneNoneOverride the default model for this call

Returns

dict — Execution result metadata.

Raises

WorkflowError

Examples

def run(params: Params) -> Result:
    agent = Agent()
    agent.execute("Open the browser to https://example.com")
def run(params: Params) -> Result:
    # Simple click — cap at 5 to fail fast if the button isn't visible
    agent = Agent()
    agent.execute("Click the Submit button", max_iterations=5)
def run(params: Params) -> Result:
    # Multi-step form fill — needs more iterations to complete all fields
    agent = Agent()
    agent.execute(
        "Fill in the registration form with name 'Jane Doe', email 'jane@example.com', and select 'Premium' plan",
        max_iterations=20,
    )
def run(params: Params) -> Result:
    # Use a faster model for simple actions
    agent = Agent()
    agent.execute("Quit the calendar app", model="claude-haiku-4-5-20251001")

Understanding max_iterations

Each execute() call runs in a screenshot → think → act loop. One iteration is one cycle of:
  1. Take a screenshot of the current screen
  2. Send it to the VLM with the instruction
  3. VLM decides and performs an action (click, type, scroll, etc.)
max_iterations caps how many of these loops can run before the agent stops.
ScenarioRecommended valueWhy
Simple one-click action35Fail fast if the element isn’t visible
Multi-step form or navigation1520Enough room to complete all steps
General-purpose (default)omit (None)Server uses 10, which works for most tasks
Write instructions as if you’re telling a human what to do. Be specific about which element to interact with — “Click the blue Submit button in the form” is better than “Click submit”.