Testing & Reliability
Live Web Validation Suite
Understand and run the live-web reliability suite used to validate framework behavior.
This suite validates description-driven LLM resolution and extraction against live public websites with diverse structures (navigation-heavy docs, iframes, and shadow DOM).
Why this suite exists
- Validate real internet page behavior, not only local fixtures.
- Confirm deterministic pass/fail outcomes for completed tasks.
- Optionally add a lightweight LLM judge for semantic review.
Run locally
RUN_LIVE_WEB=1 pnpm run test:live-web
The same script also runs with npm run or bun run in place of pnpm run.
Environment variables
- RUN_LIVE_WEB: Set to 1 to enable this suite.
- LIVE_WEB_MODEL: Resolver/extractor model. Default: gpt-5.1.
- LIVE_WEB_SCENARIOS: Optional comma-separated scenario ids.
- LIVE_WEB_JUDGE: 1 (default) enables the judge, 0 disables it.
- LIVE_WEB_JUDGE_MODE: advisory (default) or strict.
- LIVE_WEB_JUDGE_MODEL: Optional override for the judge model. Defaults to LIVE_WEB_MODEL.
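To see how the documented defaults combine, the following is a minimal sketch of a shell helper that prints the effective settings for the current environment. The function name live_web_effective_config is hypothetical and not part of the suite, and treating an empty variable the same as an unset one is an assumption about how the suite reads its environment.

```shell
# Hypothetical helper (not part of the suite): print the effective settings,
# applying the defaults documented in the list above.
# Defined with a subshell body so its variables do not leak into the caller.
live_web_effective_config() (
  # Assumption: an empty value is treated like an unset one.
  model="${LIVE_WEB_MODEL:-gpt-5.1}"
  printf 'LIVE_WEB_MODEL=%s\n' "$model"
  printf 'LIVE_WEB_JUDGE=%s\n' "${LIVE_WEB_JUDGE:-1}"
  printf 'LIVE_WEB_JUDGE_MODE=%s\n' "${LIVE_WEB_JUDGE_MODE:-advisory}"
  # The judge model falls back to the resolver/extractor model.
  printf 'LIVE_WEB_JUDGE_MODEL=%s\n' "${LIVE_WEB_JUDGE_MODEL:-$model}"
)
```

Calling the helper after exporting only LIVE_WEB_MODEL shows the judge model following it, since no separate override is set.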
API keys
The required key depends on the model prefix:
- gpt-*, o1-*, o3-*, o4-* -> OPENAI_API_KEY
- claude-* -> ANTHROPIC_API_KEY
- gemini-* -> GOOGLE_GENERATIVE_AI_API_KEY
- grok-* -> XAI_API_KEY
- groq/* -> GROQ_API_KEY
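The prefix-to-key mapping can be checked before a run with a small shell function. This is a sketch only: required_api_key_var is a hypothetical name, and returning an error for an unrecognized prefix is an assumption, since the behavior for other model names is not described here.

```shell
# Hypothetical pre-flight helper (not part of the suite): print the name of
# the API key variable that a given model name requires.
required_api_key_var() {
  case "$1" in
    gpt-*|o1-*|o3-*|o4-*) echo OPENAI_API_KEY ;;
    claude-*)             echo ANTHROPIC_API_KEY ;;
    gemini-*)             echo GOOGLE_GENERATIVE_AI_API_KEY ;;
    grok-*)               echo XAI_API_KEY ;;
    groq/*)               echo GROQ_API_KEY ;;
    *)
      # Assumption: anything outside the documented prefixes is an error.
      echo "unknown model prefix: $1" >&2
      return 1
      ;;
  esac
}
```

For example, required_api_key_var "${LIVE_WEB_MODEL:-gpt-5.1}" prints OPENAI_API_KEY when LIVE_WEB_MODEL is unset, which makes it easy to confirm the right key is exported before starting the suite.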
Example commands
Run one scenario:
RUN_LIVE_WEB=1 \
LIVE_WEB_SCENARIOS=wikipedia-search \
LIVE_WEB_MODEL=gpt-5.1 \
pnpm run test:live-web
Strict judge mode:
RUN_LIVE_WEB=1 \
LIVE_WEB_JUDGE_MODE=strict \
pnpm run test:live-web
Notes
- This suite is intentionally manual/explicit and is not part of the default test run.
- Deterministic checks are the primary pass criteria.
- In advisory mode, judge failures do not fail the test.
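The interaction between deterministic checks and the judge can be sketched as a small decision function. This is an illustration rather than the suite's actual logic: live_web_outcome is a hypothetical name, results are encoded as 1 for pass and 0 for fail, and the assumption that strict mode turns a judge failure into a test failure is inferred from the advisory description above.

```shell
# Hypothetical sketch (not the suite's implementation).
# Usage: live_web_outcome DETERMINISTIC_OK JUDGE_OK [MODE]
#   DETERMINISTIC_OK, JUDGE_OK: 1 for pass, 0 for fail
#   MODE: advisory (default) or strict
# Defined with a subshell body so its variables do not leak into the caller.
live_web_outcome() (
  deterministic_ok="$1"
  judge_ok="$2"
  mode="${3:-advisory}"

  # Deterministic checks are the primary pass criteria in every mode.
  if [ "$deterministic_ok" != 1 ]; then
    echo fail
    exit 0
  fi

  # Assumption: only strict mode lets a judge failure fail the test.
  if [ "$mode" = strict ] && [ "$judge_ok" != 1 ]; then
    echo fail
    exit 0
  fi

  echo pass
)
```

Under these assumptions, a run whose deterministic checks pass but whose judge disagrees reports pass in advisory mode and fail in strict mode.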
Covered Features
- cli:extract
- sdk:click
- sdk:snapshot
