opensteer
Backed by Y Combinator.

The most comprehensive browser automation framework for AI



Computer Use

Coordinate-based actions and CUA agent mode

Coordinate-based viewport actions for visual automation and CUA (Computer Use Agent) workflows.

Coordinate Actions

await os.computerExecute({ action: { type: "screenshot" } });
await os.computerExecute({ action: { type: "click", x: 100, y: 200 } });
await os.computerExecute({ action: { type: "type", text: "hello" } });
await os.computerExecute({ action: { type: "scroll", x: 400, y: 300, deltaX: 0, deltaY: 300 } });
await os.computerExecute({ action: { type: "key", key: "Enter" } });
await os.computerExecute({ action: { type: "move", x: 500, y: 100 } });
await os.computerExecute({ action: { type: "drag", start: { x: 100, y: 100 }, end: { x: 300, y: 300 } } });
await os.computerExecute({ action: { type: "wait", durationMs: 1000 } });
The same actions are available from the CLI:

npx opensteer computer '{"type":"screenshot"}' --workspace demo
npx opensteer computer '{"type":"click","x":100,"y":200}' --workspace demo

Action Types

Type        Parameters                    Description
screenshot  —                             Capture viewport screenshot
click       x, y                          Click at coordinates
type        text                          Type text at current focus
scroll      x, y, deltaX, deltaY          Scroll at position
key         key                           Press keyboard key
move        x, y                          Move mouse to position
drag        start: {x,y}, end: {x,y}      Drag from start to end
wait        durationMs                    Wait for duration
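The action payloads in the table above can be modeled as a discriminated union on the `type` field. This is an illustrative sketch, not the SDK's exported types — the names `ComputerAction` and `toCliArg` are assumptions made here for clarity:

```typescript
// Hypothetical model of the action payloads from the table above;
// the real SDK may export its own types under different names.
type ComputerAction =
  | { type: "screenshot" }
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string }
  | { type: "scroll"; x: number; y: number; deltaX: number; deltaY: number }
  | { type: "key"; key: string }
  | { type: "move"; x: number; y: number }
  | { type: "drag"; start: { x: number; y: number }; end: { x: number; y: number } }
  | { type: "wait"; durationMs: number };

// Serialize an action to the JSON form shown in the CLI examples.
function toCliArg(action: ComputerAction): string {
  return JSON.stringify(action);
}
```

Modeling actions this way lets the compiler narrow the payload by `type`, so a `click` without coordinates is rejected at compile time.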

AI Agent Integration

Computer Use mode can be used by AI agents through the computerExecute() method, which provides screenshot-based navigation. Agents take screenshots, reason about what they see, and execute coordinate-based actions in a loop:

const screenshot = await os.computerExecute({ action: { type: "screenshot" } });
// Pass screenshot to your AI model for reasoning
// Execute the model's chosen action via computerExecute()
await os.computerExecute({ action: { type: "click", x: 100, y: 200 } });
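The screenshot → reason → act loop described above can be sketched as follows. This is a minimal illustration, not the SDK's API surface: `ComputerClient`, `runAgentLoop`, and `decideNextAction` are hypothetical names, with `client` standing in for the real `os` object and `decideNextAction` standing in for a call to your AI model:

```typescript
// Hypothetical action subset for the sketch; "done" is a sentinel the
// model returns to stop the loop, not a real opensteer action type.
type Action =
  | { type: "screenshot" }
  | { type: "click"; x: number; y: number }
  | { type: "done" };

// Minimal interface standing in for the real client (`os`).
interface ComputerClient {
  computerExecute(req: { action: Action }): Promise<{ screenshot?: string }>;
}

async function runAgentLoop(
  client: ComputerClient,
  decideNextAction: (screenshot: string) => Promise<Action>,
  maxSteps = 10
): Promise<Action[]> {
  const executed: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    // 1. Observe: capture the current viewport.
    const { screenshot } = await client.computerExecute({
      action: { type: "screenshot" },
    });
    // 2. Reason: ask the model what to do next, given the screenshot.
    const action = await decideNextAction(screenshot ?? "");
    if (action.type === "done") break;
    // 3. Act: execute the model's chosen coordinate-based action.
    await client.computerExecute({ action });
    executed.push(action);
  }
  return executed;
}
```

Capping the loop with `maxSteps` is a common safeguard so a confused model cannot click forever.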