opensteer
Backed by Y Combinator.

Live Web Validation Suite

Understand and run the live-web reliability suite used to validate framework behavior.

This suite validates description-driven LLM resolution and extraction against live public websites with diverse structures (navigation-heavy docs, iframes, and shadow DOM).

Why this suite exists

  • Validate behavior on real internet pages, not only on local fixtures.
  • Confirm deterministic pass/fail outcomes for completed tasks.
  • Optionally add a lightweight LLM judge for semantic review.

Run locally

RUN_LIVE_WEB=1 pnpm run test:live-web

You can run the same command with npm run or bun run instead of pnpm run.

Environment variables

  • RUN_LIVE_WEB: Set to 1 to enable this suite.
  • LIVE_WEB_MODEL: Resolver/extractor model. Default: gpt-5.1.
  • LIVE_WEB_SCENARIOS: Optional comma-separated list of scenario IDs to run.
  • LIVE_WEB_JUDGE: 1 (default) enables the judge; 0 disables it.
  • LIVE_WEB_JUDGE_MODE: advisory (default) or strict.
  • LIVE_WEB_JUDGE_MODEL: Optional judge model override. Defaults to the value of LIVE_WEB_MODEL.
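The defaults above can be summarized as a small preflight check. The POSIX shell sketch below is a hypothetical helper, not part of the framework: the function names resolve_live_web_config and print_live_web_config are illustrative, and the script only mirrors the documented defaults, so the suite's own resolution logic may differ in details such as how empty values are treated.

```shell
#!/bin/sh
# Hypothetical preflight helper (not a framework API): resolves the effective
# live-web settings using the defaults documented above. It does not run tests.

resolve_live_web_config() {
  model="${LIVE_WEB_MODEL:-gpt-5.1}"             # resolver/extractor model
  judge="${LIVE_WEB_JUDGE:-1}"                   # 1 = judge on, 0 = judge off
  judge_mode="${LIVE_WEB_JUDGE_MODE:-advisory}"  # advisory or strict
  judge_model="${LIVE_WEB_JUDGE_MODEL:-$model}"  # falls back to the main model
  scenarios="${LIVE_WEB_SCENARIOS:-}"            # empty when no IDs are given

  # Reject values outside the documented sets.
  case "$judge" in
    0|1) ;;
    *) echo "LIVE_WEB_JUDGE must be 0 or 1" >&2; return 1 ;;
  esac
  case "$judge_mode" in
    advisory|strict) ;;
    *) echo "LIVE_WEB_JUDGE_MODE must be advisory or strict" >&2; return 1 ;;
  esac
}

print_live_web_config() {
  resolve_live_web_config || return 1
  echo "model=$model"
  echo "judge=$judge mode=$judge_mode judge_model=$judge_model"
  echo "scenarios=${scenarios:-<not set>}"
}
```

For example, with no other variables set, LIVE_WEB_JUDGE_MODE=strict print_live_web_config should report mode=strict and a judge model that falls back to gpt-5.1.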

API keys

The required key depends on the model prefix:

  • gpt-*, o1-*, o3-*, o4-* -> OPENAI_API_KEY
  • claude-* -> ANTHROPIC_API_KEY
  • gemini-* -> GOOGLE_GENERATIVE_AI_API_KEY
  • grok-* -> XAI_API_KEY
  • groq/* -> GROQ_API_KEY
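The prefix table above amounts to a simple pattern match. As a hypothetical sketch (required_api_key and check_api_key are illustrative names, not framework APIs), a POSIX shell check can confirm the right key is set before a run. Model IDs that match none of the listed prefixes are treated as unknown.

```shell
#!/bin/sh
# Hypothetical helper: maps a model id to the API key variable from the
# prefix table above, then checks that the variable is set and non-empty.

required_api_key() {
  case "$1" in
    gpt-*|o1-*|o3-*|o4-*) echo OPENAI_API_KEY ;;
    claude-*)             echo ANTHROPIC_API_KEY ;;
    gemini-*)             echo GOOGLE_GENERATIVE_AI_API_KEY ;;
    grok-*)               echo XAI_API_KEY ;;
    groq/*)               echo GROQ_API_KEY ;;
    *) return 1 ;;        # prefix not covered by the table
  esac
}

check_api_key() {
  key_var=$(required_api_key "$1") || {
    echo "No known API key mapping for model '$1'" >&2
    return 1
  }
  # Indirect lookup of the variable named in $key_var. This is safe here
  # because key_var can only be one of the fixed names above.
  eval "key_value=\${$key_var:-}"
  if [ -z "$key_value" ]; then
    echo "Model '$1' requires $key_var, which is not set" >&2
    return 1
  fi
  echo "$key_var is set for model '$1'"
}
```

Because the judge model defaults to LIVE_WEB_MODEL but can be overridden, it may need a different key. A preflight could therefore check both, e.g. check_api_key "${LIVE_WEB_MODEL:-gpt-5.1}" and check_api_key "${LIVE_WEB_JUDGE_MODEL:-${LIVE_WEB_MODEL:-gpt-5.1}}".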

Example commands

Run one scenario:

RUN_LIVE_WEB=1 \
LIVE_WEB_SCENARIOS=wikipedia-search \
LIVE_WEB_MODEL=gpt-5.1 \
pnpm run test:live-web

Strict judge mode:

RUN_LIVE_WEB=1 \
LIVE_WEB_JUDGE_MODE=strict \
pnpm run test:live-web

Notes

  • This suite is intentionally opt-in: it runs only when RUN_LIVE_WEB=1 is set and is not part of the default test run.
  • Deterministic checks are the primary pass criteria.
  • In advisory mode, judge failures do not fail the test.
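The notes above describe how deterministic checks and the optional judge combine into a test outcome. The POSIX shell sketch below encodes that logic as a hypothetical function, live_web_outcome, for illustration only. Only the advisory behavior is documented; treating a judge failure as a test failure in strict mode is an assumption based on the mode's name, and it is flagged in the code.

```shell
#!/bin/sh
# Hypothetical sketch of how the pass/fail rules in the notes combine.
# Arguments: deterministic result (pass|fail), judge enabled (1|0),
#            judge mode (advisory|strict), judge verdict (pass|fail).
# ASSUMPTION: in strict mode a judge failure fails the test; the notes
# only state the advisory behavior explicitly.

live_web_outcome() {
  deterministic=$1 judge_enabled=$2 judge_mode=$3 judge_verdict=$4

  # Deterministic checks are the primary pass criteria.
  if [ "$deterministic" != pass ]; then
    echo fail
    return
  fi
  # Judge disabled, or advisory mode: the verdict is informational only.
  if [ "$judge_enabled" != 1 ] || [ "$judge_mode" = advisory ]; then
    echo pass
    return
  fi
  # Strict mode (assumed behavior): the judge verdict can also fail the test.
  if [ "$judge_verdict" = pass ]; then echo pass; else echo fail; fi
}
```

Checking the deterministic result first keeps it authoritative: no judge verdict can turn a failed deterministic check into a pass.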

Covered Features

  • cli:extract
  • sdk:click
  • sdk:snapshot