qa-agent-testing
QA Agent Testing (Jan 2026)
Design and run reliable evaluation suites for LLM agents/personas, including tool-using and multi-agent systems.
Default QA Workflow
- Define the Persona Under Test (PUT): scope, out-of-scope, and safety boundaries.
- Define 10 representative tasks (Must Ace).
- Define 5 refusal edge cases (Must Decline + redirect).
- Define an output contract (format, tone, structure, citations).
- Run the suite with determinism controls and tool tracing.
- Score with the 6-dimension rubric; track variance across reruns.
- Log baselines and regressions; gate merges/deploys on thresholds.
Use the copy-paste templates in assets/ for day-0 setup.
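The scoring and gating steps of the workflow above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual template: the dimension names, the 0-3 scale per dimension, and the threshold values are all assumptions, since the source only states that the rubric has six dimensions and that variance is tracked across reruns.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Hypothetical dimension names; the source only says there are six.
DIMENSIONS = ("accuracy", "relevance", "structure", "tone", "safety", "efficiency")


@dataclass(frozen=True)
class Case:
    case_id: str
    prompt: str
    kind: str  # "must_ace" or "must_decline"


def total_score(scores: dict[str, int]) -> int:
    """Sum one run's rubric scores, requiring every dimension to be present."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(scores[d] for d in DIMENSIONS)


def summarize(run_totals: list[int]) -> tuple[float, float]:
    """Return (mean, population standard deviation) across reruns of one case."""
    return mean(run_totals), pstdev(run_totals)


def gate(run_totals: list[int], min_mean: float, max_stdev: float) -> bool:
    """Pass only if the average is high enough and reruns agree closely."""
    avg, spread = summarize(run_totals)
    return avg >= min_mean and spread <= max_stdev


# Example: three reruns of one case, scored 0-3 per dimension (assumed scale).
case = Case("ace-01", "Summarize the attached changelog.", "must_ace")
stable_runs = [total_score(dict.fromkeys(DIMENSIONS, 3)) for _ in range(3)]
flaky_runs = [18, 10, 18]
print(case.case_id, gate(stable_runs, min_mean=14, max_stdev=1.0))
print(case.case_id, gate(flaky_runs, min_mean=14, max_stdev=1.0))
```

Gating on spread as well as mean is the point of the sketch: a case whose average clears the bar but whose reruns disagree widely is treated as a flake rather than a pass.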
Determinism and Flake Control
- Control inputs: pin prompts/config, fixtures, stable tool responses, frozen time/timezone where possible.
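One way to picture these input controls is a harness that pins configuration, serves tool responses from fixtures, records a tool trace, and injects a frozen clock. The sketch below is an assumption-laden illustration: `PINNED_CONFIG`, `StubTools`, `fake_agent`, the `get_weather` tool, and the frozen timestamp are all hypothetical names and values, and a real suite would call the actual agent in place of `fake_agent`.

```python
from datetime import datetime, timezone

# Hypothetical pinned settings; real keys depend on the model client in use.
PINNED_CONFIG = {"temperature": 0.0, "seed": 1234, "prompt_version": "v1"}

# Arbitrary frozen timestamp, fixed in UTC so reruns never depend on wall-clock time.
FROZEN_NOW = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)


class StubTools:
    """Serves canned tool responses and records every call for tracing."""

    def __init__(self, fixtures):
        self._fixtures = fixtures
        self.trace = []

    def call(self, name, **args):
        key = (name, tuple(sorted(args.items())))
        self.trace.append(key)
        if key not in self._fixtures:
            # Fail loudly instead of silently reaching a live service.
            raise KeyError(f"no fixture for {name} with {args}")
        return self._fixtures[key]


def fake_agent(prompt, tools, now, config):
    """Stand-in for the persona under test; uses only injected inputs."""
    weather = tools.call("get_weather", city="Oslo")
    return f"[{config['prompt_version']}] {now.date().isoformat()}: {weather}"


def run_once():
    fixtures = {("get_weather", (("city", "Oslo"),)): "sunny"}
    tools = StubTools(fixtures)
    output = fake_agent("What is the weather?", tools, FROZEN_NOW, PINNED_CONFIG)
    return output, tools.trace


first = run_once()
second = run_once()
print(first == second)
```

Because the clock, config, and tool responses are all injected, any difference between two runs can be attributed to the agent itself rather than to its environment.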