Automation
Computer Use
Coordinate-based browser actions cover screenshots, mouse input, keyboard input, scrolling, dragging, and waits.
Computer Use
Computer use is the coordinate-based surface. Use it when DOM targeting is not enough, when you need a screenshot, or when you want to drive the page the way a visual agent would.
SDK surface
await opensteer.computerExecute({ action: { type: "screenshot" } });
await opensteer.computerExecute({ action: { type: "click", x: 100, y: 200 } });
await opensteer.computerExecute({ action: { type: "type", text: "hello world" } });
await opensteer.computerExecute({
action: { type: "key", key: "Enter", modifiers: ["Shift"] },
});
await opensteer.computerExecute({
action: { type: "scroll", x: 500, y: 400, deltaX: 0, deltaY: 300 },
});
await opensteer.computerExecute({ action: { type: "move", x: 500, y: 100 } });
await opensteer.computerExecute({
action: {
type: "drag",
start: { x: 100, y: 100 },
end: { x: 300, y: 300 },
},
});
await opensteer.computerExecute({ action: { type: "wait", durationMs: 1000 } });
CLI commands
opensteer computer click 100 200 --workspace demo
opensteer computer type "hello world" --workspace demo
opensteer computer key Enter --workspace demo
opensteer computer scroll 500 400 --dx 0 --dy 300 --workspace demo
opensteer computer move 500 100 --workspace demo
opensteer computer drag 100 100 300 300 --workspace demo
opensteer computer screenshot --workspace demo
opensteer computer screenshot --workspace demo --format jpeg
opensteer computer wait 1000 --workspace demo
Action types
| Type | Purpose |
|---|---|
screenshot | Capture the current viewport |
click | Click at x, y |
type | Type text at the current focus |
key | Press a key, optionally with modifiers |
scroll | Scroll from a coordinate with deltaX and deltaY |
move | Move the mouse |
drag | Drag from a start point to an end point |
wait | Sleep for a fixed duration |
Network capture
Most computer actions can also capture network traffic:
opensteer computer click 100 200 --workspace demo --capture-network checkout
Use that when the visual action triggers an API call you want to inspect later.
When to use it
Use computer actions when:
- you need a screenshot for a visual agent loop
- the page is hard to model with DOM targets alone
- you need coordinate-level control
Use the high-level DOM actions first when you can. They are usually easier to read and more stable.
