qa-agent-testing


QA Agent Testing (Jan 2026)

Design and run reliable evaluation suites for LLM agents/personas, including tool-using and multi-agent systems.

Default QA Workflow

  1. Define the Persona Under Test (PUT): scope, out-of-scope, and safety boundaries.
  2. Define 10 representative tasks (Must Ace).
  3. Define 5 refusal edge cases (Must Decline + redirect).
  4. Define an output contract (format, tone, structure, citations).
  5. Run the suite with determinism controls and tool tracing.
  6. Score with the 6-dimension rubric; track variance across reruns.
  7. Log baselines and regressions; gate merges/deploys on thresholds.
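
Steps 2, 3, 6, and 7 can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own tooling. The `Case` type, the function names, the per-dimension score scale, and every threshold value are assumptions made for the example. The rubric dimension names are placeholders, because this section does not list them.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Placeholder names (assumption): the workflow mentions a 6-dimension rubric
# but does not name the dimensions here.
RUBRIC_DIMENSIONS = ("dim_1", "dim_2", "dim_3", "dim_4", "dim_5", "dim_6")


@dataclass(frozen=True)
class Case:
    case_id: str
    prompt: str
    kind: str  # "must_ace" or "must_decline"


def validate_suite(cases):
    """Check the suite shape from steps 2 and 3: 10 Must Ace, 5 Must Decline."""
    ace = sum(1 for c in cases if c.kind == "must_ace")
    decline = sum(1 for c in cases if c.kind == "must_decline")
    if (ace, decline) != (10, 5):
        raise ValueError(f"expected 10 must_ace and 5 must_decline, got {ace} and {decline}")


def total_score(scores):
    """Sum one run's rubric scores, requiring exactly the six dimensions."""
    if set(scores) != set(RUBRIC_DIMENSIONS):
        raise ValueError("scores must cover exactly the six rubric dimensions")
    return sum(scores.values())


def summarize_reruns(run_totals):
    """Step 6: mean and spread of total scores across reruns of one suite."""
    return {"mean": mean(run_totals), "stdev": pstdev(run_totals)}


def gate(summary, baseline_mean, min_mean, max_stdev, max_drop):
    """Step 7: return (passed, reasons) for a merge/deploy gate."""
    reasons = []
    if summary["mean"] < min_mean:
        reasons.append("mean below threshold")
    if summary["stdev"] > max_stdev:
        reasons.append("variance across reruns too high")
    if baseline_mean - summary["mean"] > max_drop:
        reasons.append("regression against baseline")
    return (not reasons, reasons)
```

A CI job could call `gate(summarize_reruns(totals), ...)` after each suite run and fail the build when `passed` is false. The listed reasons make it clear whether a failure comes from low quality, flakiness, or a regression.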

Use the copy-paste templates in assets/ for day-0 setup.

Determinism and Flake Control

  • Control inputs: pin prompts and config, use fixtures and stable tool responses, and freeze time and timezone where possible.
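
One way to control these inputs is sketched below, under stated assumptions: `FixtureTool`, `fixture_key`, `config_fingerprint`, and `frozen_clock` are hypothetical helpers invented for this example, and the frozen timestamp is arbitrary. Tool calls are replayed from canned fixtures and recorded for tracing, the prompt and config are hashed so a rerun can confirm nothing drifted, and the clock is replaced by a fixed UTC value.

```python
import hashlib
import json
from datetime import datetime, timezone

# Arbitrary fixed instant (assumption) used in place of the real clock.
FROZEN_NOW = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)


def frozen_clock():
    """Return the same timezone-aware timestamp on every call."""
    return FROZEN_NOW


def fixture_key(name, args):
    """Build a stable lookup key from a tool name and its arguments."""
    return f"{name}:{json.dumps(args, sort_keys=True)}"


class FixtureTool:
    """Replay canned tool responses and record every call for tracing."""

    def __init__(self, fixtures):
        self._fixtures = dict(fixtures)
        self.trace = []

    def call(self, name, **args):
        key = fixture_key(name, args)
        self.trace.append(key)
        if key not in self._fixtures:
            # Fail loudly instead of falling through to a live tool.
            raise KeyError(f"no fixture for {key}")
        return self._fixtures[key]


def config_fingerprint(prompt, config):
    """Hash the pinned prompt and config; key order does not affect the hash."""
    payload = json.dumps({"prompt": prompt, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing the fingerprint alongside each baseline makes it easy to tell whether a score change follows a prompt or config change or reflects run-to-run variance.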
First seen: Jan 23, 2026