qa-agent-testing

QA Agent Testing (Jan 2026)

Design and run reliable evaluation suites for LLM agents/personas, including tool-using and multi-agent systems.

Default QA Workflow

1. Define the Persona Under Test (PUT): scope, out-of-scope, and safety boundaries.
2. Define 10 representative tasks (Must Ace).
3. Define 5 refusal edge cases (Must Decline + redirect).
4. Define an output contract (format, tone, structure, citations).
5. Run the suite with determinism controls and tool tracing.
6. Score with the 6-dimension rubric; track variance across reruns.
7. Log baselines and regressions; gate merges/deploys on thresholds.
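
The scoring and gating steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the dimension names (`dim_1` to `dim_6`), the 0-3 scale per dimension, and every threshold value are assumptions, since the workflow only states that the rubric has six dimensions and that merges/deploys are gated on thresholds.

```python
from statistics import mean, pstdev

# Hypothetical placeholders: the source names no dimensions and no scale.
DIMENSIONS = tuple(f"dim_{i}" for i in range(1, 7))
MAX_PER_DIMENSION = 3  # assumed 0-3 scale, so a perfect run totals 18


def run_total(scores: dict[str, int]) -> int:
    """Sum one run's rubric scores, rejecting incomplete or out-of-range input."""
    missing = set(DIMENSIONS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} out of range: {scores[name]}")
    return sum(scores[name] for name in DIMENSIONS)


def summarize(runs: list[dict[str, int]]) -> dict[str, float]:
    """Aggregate totals across reruns of the same suite."""
    totals = [run_total(run) for run in runs]
    return {"mean": mean(totals), "stdev": pstdev(totals), "min": min(totals)}


def gate(
    summary: dict[str, float],
    baseline_mean: float,
    min_mean: float = 15.0,        # assumed pass bar
    max_stdev: float = 1.0,        # assumed flake tolerance
    max_regression: float = 1.0,   # assumed allowed drop vs. baseline
) -> list[str]:
    """Return the list of gate failures; an empty list means the gate passes."""
    failures = []
    if summary["mean"] < min_mean:
        failures.append("mean below threshold")
    if summary["stdev"] > max_stdev:
        failures.append("variance too high")
    if baseline_mean - summary["mean"] > max_regression:
        failures.append("regression vs baseline")
    return failures


if __name__ == "__main__":
    perfect = {name: 3 for name in DIMENSIONS}
    weak = {name: 2 for name in DIMENSIONS}
    report = summarize([perfect, weak, weak])
    print(report, gate(report, baseline_mean=17.0))
```

Returning a list of failure reasons rather than a single boolean keeps the regression log readable: a CI job can fail on any non-empty list and still record which check tripped.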

Use the copy-paste templates in assets/ for day-0 setup.

Determinism and Flake Control

Installs: 143

Repository: vasilyu1983/ai-…s-public

GitHub Stars: 62

First Seen: Jan 23, 2026

Security Audits: Gen Agent Trust Hub (Pass)

qa-agent-testing — vasilyu1983/ai-agents-public