Looking for a typed client? The Desktop SDKs wrap this HTTP API for Python, JavaScript, and Go — no manual request wiring needed.
Overview
With BYOA, you own the agent loop. Nen exposes the desktop’s tools as a standard HTTP API that any agent framework can call — OpenAI, Anthropic, LangChain, CrewAI, or plain HTTP.
You get two endpoints on each desktop:
GET /desktops/{desktop_id}/tools — discover available tools as JSON Schema
POST /desktops/{desktop_id}/execute — execute a tool and get a screenshot back
Your agent runs anywhere — your laptop, your cloud, your customer’s infra. Nen is just the execution backend.
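In practice, a full round trip is just two HTTP calls. A minimal sketch with httpx (placeholder desktop ID and key; the Quick Start below walks through the same pattern inside a real agent loop):

import httpx

BASE_URL = "https://desktop.api.getnen.ai/desktops/dsk_abc123def456"  # your desktop ID
headers = {"Authorization": "Bearer your_nen_api_key"}

# Discover the desktop's tools as JSON Schema definitions
tools = httpx.get(f"{BASE_URL}/tools", headers=headers).json()

# Execute one of them (a screenshot) and read the result, including a base64 PNG
result = httpx.post(f"{BASE_URL}/execute", json={
    "action": {"tool": "computer", "action": "screenshot", "params": {}}
}, headers=headers).json()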
BYOA vs Nen Agent
| | BYOA | Nen Agent |
| --- | --- | --- |
| Agent loop | Yours | Nen’s |
| Language | Any (HTTP API) | Python |
| Agent environment | Managed by you | Managed by Nen |
| Best for | You want computer use tightly coupled with your agentic loop | You want to define high-level tasks with built-in observability |
Use BYOA when you want full control over the agent loop and LLM choice. Use Nen Agent when you want Nen to handle orchestration for you.
Authentication
Get your API key from the Nen Dashboard
Pass it as a Bearer token in the Authorization header on every request
The same API key works for both the desktop API and the managed-workflows API — only the header name differs between them (see the API reference).
Quick Start
A complete agent loop using Claude’s tool calling API:
uv init && uv add httpx anthropic
import httpx
import anthropic

DESKTOP_ID = "dsk_abc123def456"
BASE_URL = f"https://desktop.api.getnen.ai/desktops/{DESKTOP_ID}"
NEN_API_KEY = "your_nen_api_key"
ANTHROPIC_API_KEY = "your_anthropic_api_key"

headers = {"Authorization": f"Bearer {NEN_API_KEY}"}

# 1. Discover tools and convert to Anthropic format
tools = httpx.get(f"{BASE_URL}/tools", headers=headers).json()
anthropic_tools = [
    {"name": t["name"], "description": t["description"], "input_schema": t["parameters"]}
    for t in tools
]

# 2. Take initial screenshot
initial = httpx.post(f"{BASE_URL}/execute", json={
    "action": {"tool": "computer", "action": "screenshot", "params": {}}
}, headers=headers).json()

# 3. Run the agent loop
llm = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
messages = [{"role": "user", "content": [
    {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": initial["base64_image"]}},
    {"type": "text", "text": "Open Firefox and navigate to google.com"},
]}]

for step in range(10):
    response = llm.messages.create(
        model="claude-sonnet-4-6", max_tokens=1024,
        tools=anthropic_tools, messages=messages,
    )

    # Check if the model wants to use a tool
    tool_calls = [b for b in response.content if b.type == "tool_use"]
    if not tool_calls:
        print("Agent finished:", [b.text for b in response.content if b.type == "text"])
        break

    # Execute each tool call and feed results back
    messages.append({"role": "assistant", "content": response.content})
    tool_results = []
    for tc in tool_calls:
        # Anthropic gives us {name, input}; the desktop API expects
        # {action: {tool, action, params}}.
        params = {k: v for k, v in tc.input.items() if k != "action"}
        action = {"tool": tc.name, "action": tc.input["action"], "params": params}
        result = httpx.post(f"{BASE_URL}/execute", json={"action": action},
                            headers=headers, timeout=30).json()

        content = []
        if result.get("output"):
            content.append({"type": "text", "text": result["output"]})
        if result.get("base64_image"):
            content.append({"type": "image", "source": {
                "type": "base64", "media_type": "image/png", "data": result["base64_image"]
            }})
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": tc.id,
            "content": content or [{"type": "text", "text": "Done"}],
        })

    messages.append({"role": "user", "content": tool_results})
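In a real agent you will likely want a guard around /execute instead of calling .json() and hoping for the best. A minimal sketch, assuming that any status other than "ok" (or an HTTP error) means the action failed; execute_or_raise is an illustrative helper name, not part of the API:

def execute_or_raise(action: dict, timeout: float = 30.0) -> dict:
    """POST one action to the desktop, raising if it did not succeed."""
    resp = httpx.post(f"{BASE_URL}/execute", json={"action": action},
                      headers=headers, timeout=timeout)
    resp.raise_for_status()  # HTTP-level failures: bad key, unknown desktop, etc.
    result = resp.json()
    if result.get("status") != "ok":  # assumption: non-"ok" statuses indicate failure
        raise RuntimeError(f"{action['action']} failed: {result}")
    return result

The range(10) cap in the loop above is only a safety limit on the number of agent steps; raise it for longer tasks.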
GET /desktops/{desktop_id}/tools returns tool definitions in JSON Schema format, compatible with all major LLM providers:
[
  {
    "name": "computer",
    "description": "Control the computer's mouse, keyboard, and screen.",
    "parameters": {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "enum": ["screenshot", "key", "type", "mouse_move", "left_click", "right_click", "double_click", "scroll", "wait", "hold_key", "cursor_position"]
        },
        "text": {"type": "string", "description": "Text to type or key combo (e.g. 'Return', 'ctrl+a')"},
        "coordinate": {"type": "array", "items": {"type": "integer"}, "description": "[x, y] screen coordinates"},
        "scroll_direction": {"type": "string", "enum": ["up", "down", "left", "right"]},
        "scroll_amount": {"type": "integer"},
        "duration": {"type": "number", "description": "Duration in seconds for wait or hold_key"}
      },
      "required": ["action"]
    }
  }
]
Pass these directly to your LLM’s tool/function calling API.
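The Quick Start maps these into Anthropic's input_schema shape; for OpenAI-style function calling the mapping is equally mechanical. A rough sketch using the Chat Completions tools format (adjust to your SDK version):

openai_tools = [
    {
        "type": "function",
        "function": {
            "name": t["name"],
            "description": t["description"],
            "parameters": t["parameters"],  # already JSON Schema, passed through as-is
        },
    }
    for t in tools  # the list returned by GET /desktops/{desktop_id}/tools
]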
POST /desktops/{desktop_id}/execute runs a tool and returns the result plus a post-action screenshot:
Request:
{
  "action": {
    "tool": "computer",
    "action": "left_click",
    "params": {"coordinate": [500, 300]}
  }
}
Response:
{
  "status": "ok",
  "output": "clicked at (500, 300)",
  "base64_image": "iVBORw0KGgo..."
}
status is always present. output, base64_image, and coordinate are included only when the underlying action produces them (e.g. screenshot always returns base64_image; a plain left_click may return just {"status": "ok"}). See Tools Overview for the full reference.
The screenshot after each action is how your agent observes the desktop. Feed it back to your VLM to decide the next step.
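While iterating on prompts it helps to dump those screenshots to disk so you can replay what the agent saw at each step. A small sketch (the file layout is arbitrary):

import base64
from pathlib import Path

def save_screenshot(result: dict, step: int) -> None:
    """Write the post-action screenshot, if any, to steps/step_<n>.png."""
    if not result.get("base64_image"):
        return
    out = Path("steps") / f"step_{step:03d}.png"
    out.parent.mkdir(exist_ok=True)
    out.write_bytes(base64.b64decode(result["base64_image"]))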
Next Steps
Workflow SDK: Skip the HTTP and use the Python SDK instead
Remote Execution: How SDK execution and debugging works