Testing & Reliability

Live Web Validation Suite

Understand and run the live-web reliability suite used to validate framework behavior.

This suite validates description-driven LLM resolution and extraction against live public websites with diverse structures (navigation-heavy docs, iframes, and shadow DOM).

Why this suite exists

Validate behavior on real internet pages, not only on local fixtures.
Confirm that completed tasks produce deterministic pass/fail outcomes.
Optionally add a lightweight LLM judge for semantic review.

Run locally

RUN_LIVE_WEB=1 pnpm run test:live-web

If you prefer npm or bun, substitute npm run or bun run for pnpm run in the command above.

Environment variables

RUN_LIVE_WEB: Set to 1 to enable this suite.
LIVE_WEB_MODEL: Resolver/extractor model. Default: gpt-5.1.
LIVE_WEB_SCENARIOS: Optional comma-separated scenario ids.
LIVE_WEB_JUDGE: Set to 1 (default) to enable the judge, or 0 to disable it.
LIVE_WEB_JUDGE_MODE: advisory (default) or strict.
LIVE_WEB_JUDGE_MODEL: Optional override for the judge model. Defaults to the value of LIVE_WEB_MODEL.
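
To see how the documented defaults combine, the following is a minimal sketch of a shell helper that prints the effective settings for the current environment. The function name live_web_effective_config is hypothetical and not part of the framework, and treating an unset LIVE_WEB_SCENARIOS as "run every scenario" is an assumption based on the variable being optional.

```shell
# Hypothetical helper (not shipped with the framework): prints the settings
# the suite would see, applying the defaults listed above.
# The function body runs in a subshell so it does not leak variables.
live_web_effective_config() (
  model="${LIVE_WEB_MODEL:-gpt-5.1}"
  judge="${LIVE_WEB_JUDGE:-1}"
  judge_mode="${LIVE_WEB_JUDGE_MODE:-advisory}"
  # The judge model falls back to the resolver/extractor model.
  judge_model="${LIVE_WEB_JUDGE_MODEL:-$model}"
  # Assumption: no scenario list means every scenario runs.
  scenarios="${LIVE_WEB_SCENARIOS:-(all)}"

  if [ "${RUN_LIVE_WEB:-}" = "1" ]; then enabled=yes; else enabled=no; fi

  printf 'enabled=%s\nmodel=%s\nscenarios=%s\njudge=%s\njudge_mode=%s\njudge_model=%s\n' \
    "$enabled" "$model" "$scenarios" "$judge" "$judge_mode" "$judge_model"
)
```

Calling live_web_effective_config before a run makes it easy to spot a variable that was set in the wrong shell or misspelled, since any unset variable shows its default instead.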

API keys

The required key depends on the model prefix:

gpt-*, o1-*, o3-*, o4-* -> OPENAI_API_KEY
claude-* -> ANTHROPIC_API_KEY
gemini-* -> GOOGLE_GENERATIVE_AI_API_KEY
grok-* -> XAI_API_KEY
groq/* -> GROQ_API_KEY
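
The prefix rules above can be sketched as a shell case statement, which is handy as a preflight check before a run. Both function names (required_api_key and check_api_key) are hypothetical helpers written for illustration; how the framework itself resolves keys is not shown in this document.

```shell
# Hypothetical helper: prints the name of the API key variable that a
# given model id needs, following the prefix table above.
required_api_key() {
  case "$1" in
    gpt-*|o1-*|o3-*|o4-*) echo OPENAI_API_KEY ;;
    claude-*)             echo ANTHROPIC_API_KEY ;;
    gemini-*)             echo GOOGLE_GENERATIVE_AI_API_KEY ;;
    grok-*)               echo XAI_API_KEY ;;
    groq/*)               echo GROQ_API_KEY ;;
    *)                    return 1 ;;  # prefix not covered by the table
  esac
}

# Hypothetical preflight check: returns 0 only if the key required by the
# model is exported and non-empty in the current environment.
check_api_key() {
  key_name="$(required_api_key "$1")" || {
    echo "unknown model prefix: $1" >&2
    return 2
  }
  if [ -z "$(printenv "$key_name")" ]; then
    echo "missing $key_name for model $1" >&2
    return 1
  fi
}
```

For example, check_api_key "${LIVE_WEB_MODEL:-gpt-5.1}" reports a missing key before any browser session starts. When LIVE_WEB_JUDGE_MODEL names a model from a different provider, run the check for that model as well.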

Example commands

Run one scenario:

RUN_LIVE_WEB=1 \
LIVE_WEB_SCENARIOS=wikipedia-search \
LIVE_WEB_MODEL=gpt-5.1 \
pnpm run test:live-web

Strict judge mode:

RUN_LIVE_WEB=1 \
LIVE_WEB_JUDGE_MODE=strict \
pnpm run test:live-web

Notes

This suite is intentionally opt-in: it runs only when explicitly requested and is not part of the default test run.
Deterministic checks are the primary pass criteria.
In advisory mode, judge failures do not fail the test.
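
The relationship between the deterministic checks and the judge can be sketched as a small decision function. This is an illustration only, not the suite's implementation: final_verdict is a hypothetical name, and the strict-mode branch (a judge failure fails the test) is an assumption inferred from the contrast with advisory mode.

```shell
# Hypothetical sketch of the pass/fail rule described in the notes above.
# Usage: final_verdict <deterministic: pass|fail> <judge: pass|fail|skipped> [advisory|strict]
final_verdict() (
  deterministic="$1"
  judge="$2"
  mode="${3:-advisory}"   # advisory is the documented default

  if [ "$deterministic" != "pass" ]; then
    # Deterministic checks are the primary criteria: a failure here always fails.
    echo fail
  elif [ "$mode" = "strict" ] && [ "$judge" = "fail" ]; then
    # Assumption: in strict mode a judge failure also fails the test.
    echo fail
  else
    # In advisory mode a judge failure is reported but does not fail the test.
    echo pass
  fi
)
```

Under these assumptions, final_verdict pass fail advisory prints pass, while final_verdict pass fail strict prints fail.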

Covered Features

cli:extract
sdk:click
sdk:snapshot
