Automation

Computer Use

Coordinate-based browser actions cover screenshots, mouse input, keyboard input, scrolling, dragging, and waits.

Computer Use

Computer use is the coordinate-based surface. Use it when DOM targeting is not enough, when you need a screenshot, or when you want to drive the page the way a visual agent would.

SDK surface

await opensteer.computerExecute({ action: { type: "screenshot" } });
await opensteer.computerExecute({ action: { type: "click", x: 100, y: 200 } });
await opensteer.computerExecute({ action: { type: "type", text: "hello world" } });
await opensteer.computerExecute({
  action: { type: "key", key: "Enter", modifiers: ["Shift"] },
});
await opensteer.computerExecute({
  action: { type: "scroll", x: 500, y: 400, deltaX: 0, deltaY: 300 },
});
await opensteer.computerExecute({ action: { type: "move", x: 500, y: 100 } });
await opensteer.computerExecute({
  action: {
    type: "drag",
    start: { x: 100, y: 100 },
    end: { x: 300, y: 300 },
  },
});
await opensteer.computerExecute({ action: { type: "wait", durationMs: 1000 } });

CLI commands

opensteer computer click 100 200 --workspace demo
opensteer computer type "hello world" --workspace demo
opensteer computer key Enter --workspace demo
opensteer computer scroll 500 400 --dx 0 --dy 300 --workspace demo
opensteer computer move 500 100 --workspace demo
opensteer computer drag 100 100 300 300 --workspace demo
opensteer computer screenshot --workspace demo
opensteer computer screenshot --workspace demo --format jpeg
opensteer computer wait 1000 --workspace demo

Action types

TypePurpose
screenshotCapture the current viewport
clickClick at x, y
typeType text at the current focus
keyPress a key, optionally with modifiers
scrollScroll from a coordinate with deltaX and deltaY
moveMove the mouse
dragDrag from a start point to an end point
waitSleep for a fixed duration

Network capture

Most computer actions can also capture network traffic:

opensteer computer click 100 200 --workspace demo --capture-network checkout

Use that when the visual action triggers an API call you want to inspect later.

When to use it

Use computer actions when:

  • you need a screenshot for a visual agent loop
  • the page is hard to model with DOM targets alone
  • you need coordinate-level control

Use the high-level DOM actions first when you can. They are usually easier to read and more stable.