Agent
VLM-powered controller for desktop automation.
Import
Constructor
Agent(model: str | None = None)
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str \| None | None | Default VLM model for all calls |
Methods
execute()
agent.execute(instruction: str, max_iterations: int | None = None, model: str | None = None) -> dict
Perform an action on the virtual desktop via natural language.
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| instruction | str | — | What to do |
| max_iterations | int \| None | None | Max screenshot → think → act loops. Server defaults to 10 when not set |
| model | str \| None | None | Override default model |
Returns: dict — execution result metadata
Raises: WorkflowError
agent.execute("Click the Submit button", max_iterations=5)
agent.execute("Fill the signup form with all required fields", max_iterations=20)
agent.execute("Navigate to Settings", model="claude-haiku-4-5-20251001")
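Because execute() raises WorkflowError, a caller may want to retry a failed instruction with a larger iteration budget. The following is a minimal sketch of that idea, not part of the documented API. `FakeAgent`, `execute_with_retry`, and the local `WorkflowError` class are hypothetical stand-ins so the example runs offline. The conditions under which the real library raises WorkflowError and the keys of the returned dict are assumptions.

```python
class WorkflowError(Exception):
    """Stand-in for the library's WorkflowError (assumption: import the real one in practice)."""


class FakeAgent:
    """Hypothetical stub that mimics agent.execute() so the sketch runs offline."""

    def __init__(self, fail_times=1):
        self.fail_times = fail_times
        self.calls = []

    def execute(self, instruction, max_iterations=None, model=None):
        self.calls.append((instruction, max_iterations))
        # Simulate a failure for the first `fail_times` calls.
        if len(self.calls) <= self.fail_times:
            raise WorkflowError("step did not complete")
        return {"status": "ok"}  # result keys are an assumption


def execute_with_retry(agent, instruction, budgets=(5, 10, 20)):
    """Run the instruction, retrying with a larger max_iterations after each WorkflowError."""
    if not budgets:
        raise ValueError("budgets must contain at least one value")
    last_error = None
    for budget in budgets:
        try:
            return agent.execute(instruction, max_iterations=budget)
        except WorkflowError as exc:
            last_error = exc  # remember the failure and try the next budget
    raise last_error


agent = FakeAgent(fail_times=1)
result = execute_with_retry(agent, "Click the Submit button")
print(result)
print([budget for _, budget in agent.calls])
```

Starting with a small budget keeps simple actions cheap, while the larger budgets are only spent on instructions that actually need them.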
verify()
agent.verify(condition: str, timeout: int = 10, model: str | None = None) -> bool
Check whether a condition is visible on screen.
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| condition | str | — | Expected screen state |
| timeout | int | 10 | Seconds to wait |
| model | str \| None | None | Override default model |
Returns: bool — True if condition met within timeout
agent.verify("Is the user logged in?")
agent.verify("Has the page loaded?", timeout=30)
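Since verify() returns a plain bool, it can guard an action so that a step only runs when the screen is not already in the desired state. The sketch below illustrates that pattern under stated assumptions: `FakeAgent` and `ensure` are hypothetical names, and the stub simply flips a flag instead of driving a real desktop.

```python
class FakeAgent:
    """Hypothetical stub standing in for Agent so the sketch runs without a desktop."""

    def __init__(self, logged_in=False):
        self.logged_in = logged_in
        self.executed = []

    def execute(self, instruction, max_iterations=None, model=None):
        self.executed.append(instruction)
        self.logged_in = True  # pretend the action succeeded
        return {}

    def verify(self, condition, timeout=10, model=None):
        return self.logged_in


def ensure(agent, condition, instruction, timeout=10):
    """Run `instruction` only if `condition` does not already hold, then re-check it."""
    if agent.verify(condition, timeout=timeout):
        return True  # already in the desired state, nothing to do
    agent.execute(instruction)
    return agent.verify(condition, timeout=timeout)


agent = FakeAgent()
ok = ensure(agent, "Is the user logged in?", "Log in with the saved credentials")
print(ok, agent.executed)
```

With a real agent the first check may wait up to `timeout` seconds before returning False, so a shorter timeout for the pre-check may be preferable.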
extract()
agent.extract(query: str, schema: dict, model: str | None = None) -> dict | list
Extract structured data from the current screen.
Parameters:
| Name | Type | Default | Description |
|---|---|---|---|
| query | str | — | What data to extract |
| schema | dict | — | JSON Schema for output format |
| model | str \| None | None | Override default model |
Returns: dict | list — structured data matching schema
Raises: WorkflowError, ValueError (if schema is empty)
data = agent.extract("Extract the order total", {"type": "object", "properties": {"total": {"type": "number"}}, "required": ["total"]})
Use YourModel.model_json_schema() to generate schemas from Pydantic models.
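extract() can return either a dict or a list, depending on the schema. The sketch below shows one way to request a list with an array schema and to normalize the result so downstream code always iterates over rows. `FakeAgent`, `extract_rows`, `ORDER_SCHEMA`, and the sample data are hypothetical. The stub's empty-schema check mirrors the documented ValueError, but the real implementation may differ.

```python
# JSON Schema describing a list of order lines (illustrative field names).
ORDER_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "item": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["item", "total"],
    },
}


class FakeAgent:
    """Hypothetical stub that mimics agent.extract() with canned data."""

    def extract(self, query, schema, model=None):
        if not schema:
            raise ValueError("schema must not be empty")
        if schema.get("type") == "array":
            return [
                {"item": "Widget", "total": 20.0},
                {"item": "Gadget", "total": 5.0},
            ]
        return {"item": "Widget", "total": 20.0}


def extract_rows(agent, query, schema):
    """Call extract() and always return a list, wrapping a single dict if needed."""
    result = agent.extract(query, schema)
    return result if isinstance(result, list) else [result]


rows = extract_rows(FakeAgent(), "Extract every order line", ORDER_SCHEMA)
grand_total = sum(row["total"] for row in rows)
print(len(rows), grand_total)
```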