# Computer Use

Coordinate-based viewport actions for visual automation and CUA (Computer Use Agent) workflows.
## Coordinate Actions
```ts
await os.computerExecute({ action: { type: "screenshot" } });
await os.computerExecute({ action: { type: "click", x: 100, y: 200 } });
await os.computerExecute({ action: { type: "type", text: "hello" } });
await os.computerExecute({ action: { type: "scroll", x: 400, y: 300, deltaX: 0, deltaY: 300 } });
await os.computerExecute({ action: { type: "key", key: "Enter" } });
await os.computerExecute({ action: { type: "move", x: 500, y: 100 } });
await os.computerExecute({ action: { type: "drag", start: { x: 100, y: 100 }, end: { x: 300, y: 300 } } });
await os.computerExecute({ action: { type: "wait", durationMs: 1000 } });
```
The same actions can be invoked from the CLI:

```bash
npx opensteer computer '{"type":"screenshot"}' --workspace demo
npx opensteer computer '{"type":"click","x":100,"y":200}' --workspace demo
```
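These primitives compose naturally into higher-level steps. As one sketch, a hypothetical helper that fills a text field by clicking it, typing, and pressing Enter might look like this (`execute` stands in for `os.computerExecute`, passed in so the helper stays testable):

```typescript
// A computer-use action, loosely typed for this sketch.
type Action = { type: string; [key: string]: unknown };
type Execute = (req: { action: Action }) => Promise<unknown>;

// Click a field, type into it, and confirm with Enter.
// `execute` is expected to be `os.computerExecute` (or a test stub).
async function fillField(
  execute: Execute,
  x: number,
  y: number,
  text: string,
): Promise<void> {
  await execute({ action: { type: "click", x, y } });
  await execute({ action: { type: "type", text } });
  await execute({ action: { type: "key", key: "Enter" } });
}
```

Passing the executor as a parameter (rather than importing `os` directly) keeps helpers like this easy to unit-test with a recording stub.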
## Action Types
| Type | Parameters | Description |
|---|---|---|
| `screenshot` | — | Capture viewport screenshot |
| `click` | `x`, `y` | Click at coordinates |
| `type` | `text` | Type text at current focus |
| `scroll` | `x`, `y`, `deltaX`, `deltaY` | Scroll at position |
| `key` | `key` | Press keyboard key |
| `move` | `x`, `y` | Move mouse to position |
| `drag` | `start: {x,y}`, `end: {x,y}` | Drag from start to end |
| `wait` | `durationMs` | Wait for duration |
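The table above maps directly onto a TypeScript discriminated union. This is a sketch based on the field names shown on this page, not an official typing shipped by the library:

```typescript
type Point = { x: number; y: number };

// Discriminated union mirroring the action table above.
type ComputerAction =
  | { type: "screenshot" }
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string }
  | { type: "scroll"; x: number; y: number; deltaX: number; deltaY: number }
  | { type: "key"; key: string }
  | { type: "move"; x: number; y: number }
  | { type: "drag"; start: Point; end: Point }
  | { type: "wait"; durationMs: number };

// Exhaustive narrowing: the compiler verifies every variant is handled.
function describe(a: ComputerAction): string {
  switch (a.type) {
    case "screenshot": return "capture viewport";
    case "click": return `click at (${a.x}, ${a.y})`;
    case "type": return `type ${JSON.stringify(a.text)}`;
    case "scroll": return `scroll (${a.deltaX}, ${a.deltaY}) at (${a.x}, ${a.y})`;
    case "key": return `press ${a.key}`;
    case "move": return `move to (${a.x}, ${a.y})`;
    case "drag": return `drag (${a.start.x}, ${a.start.y}) -> (${a.end.x}, ${a.end.y})`;
    case "wait": return `wait ${a.durationMs}ms`;
  }
}
```

Modeling actions as a discriminated union means the compiler flags any `switch` that forgets a variant, which is useful when new action types are added.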
## AI Agent Integration
AI agents can drive Computer Use mode through the `computerExecute()` method, which provides screenshot-based navigation. An agent takes a screenshot, reasons about what it sees, and executes a coordinate-based action, repeating in a loop until the task is done:
```ts
// One iteration of the agent loop:
const screenshot = await os.computerExecute({ action: { type: "screenshot" } });
// 1. Pass the screenshot to your AI model for reasoning
// 2. Execute the model's chosen action via computerExecute()
await os.computerExecute({ action: { type: "click", x: 100, y: 200 } });
```
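The full screenshot → reason → act loop can be sketched as below. `ExecuteFn` stands in for `os.computerExecute` and `DecideFn` wraps your AI model call; both are hypothetical seams introduced for this sketch, and a step budget guards against the agent looping forever:

```typescript
type AgentAction = { type: string; [key: string]: unknown };
type ExecuteFn = (req: { action: AgentAction }) => Promise<unknown>;
// Given the latest screenshot, the model returns the next action,
// or null once it judges the task complete.
type DecideFn = (screenshot: unknown) => Promise<AgentAction | null>;

// Generic CUA loop: screenshot -> reason -> act, bounded by maxSteps.
// Returns the number of actions executed before the model stopped.
async function runAgentLoop(
  execute: ExecuteFn,
  decide: DecideFn,
  maxSteps = 25,
): Promise<number> {
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = await execute({ action: { type: "screenshot" } });
    const next = await decide(screenshot);
    if (next === null) return step; // model signalled completion
    await execute({ action: next });
  }
  return maxSteps; // budget exhausted without completion
}
```

Keeping the model call behind a callback makes the loop independent of any particular AI provider and easy to exercise with stubs.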
