usability-testing

Usability Testing

Coverage

This skill covers usability testing: the evaluative research practice of watching people attempt realistic tasks on a prototype or product, then identifying the obstacles they encounter. The dominant method is the think-aloud protocol (Ericsson & Simon), in which participants narrate their thoughts as they work, surfacing the mental model they are using and the points where it diverges from the design. Sessions are organized around task scenarios, short narratives that frame a goal without prescribing the steps ("you want to find out how much you owe in taxes this quarter"), and are run by a moderator who maintains neutrality, resists answering questions, and prompts only with open-ended interventions like "what are you thinking now?" or "what did you expect to happen?".

The skill covers sample sizing. The widely cited Nielsen/Landauer "5-user rule" estimates that 5 users surface about 85% of the usability problems for a homogeneous user group on a discrete task, with steeply diminishing returns afterward. The rule has important limits: it applies per distinct user segment, per discrete task scope, and only to formative (iterative, diagnostic) testing. It does not apply to summative (benchmark) studies, which require much larger samples for valid statistical comparison. Misapplying the 5-user rule to summative claims is a common error.
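The 85% figure follows from the problem-discovery model usually attributed to Nielsen and Landauer: the expected share of problems seen by n participants is 1 - (1 - L)^n, where L is the chance that a single participant exposes a given problem. The sketch below assumes the commonly quoted average of L = 0.31; that value varies by product and user group, so treat it as an illustrative default rather than a constant. The function names are hypothetical.

```python
import math


def share_of_problems_found(n_users: int, detect_prob: float = 0.31) -> float:
    """Expected share of problems observed at least once across n_users sessions.

    detect_prob is the assumed per-participant chance of exposing a given problem.
    """
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    if not 0.0 < detect_prob <= 1.0:
        raise ValueError("detect_prob must be in (0, 1]")
    return 1.0 - (1.0 - detect_prob) ** n_users


def users_needed(target_share: float, detect_prob: float = 0.31) -> int:
    """Smallest number of participants whose expected coverage reaches target_share."""
    if not 0.0 < target_share < 1.0:
        raise ValueError("target_share must be strictly between 0 and 1")
    if not 0.0 < detect_prob <= 1.0:
        raise ValueError("detect_prob must be in (0, 1]")
    if detect_prob == 1.0:
        return 1  # every participant exposes every problem
    return math.ceil(math.log(1.0 - target_share) / math.log(1.0 - detect_prob))


if __name__ == "__main__":
    for n in (1, 3, 5, 9, 15):
        print(n, round(share_of_problems_found(n), 2))
```

Under this assumption five participants give an expected coverage of about 0.84, and reaching 0.95 takes nine. If the per-participant detection chance drops to 0.10, as it might for a mixed user group or a broad task scope, five participants are expected to surface only about 41% of the problems, which is why the rule is applied per segment and per task.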

Findings are organized by severity rating on Nielsen's 0–4 scale (0 = not a usability problem, 1 = cosmetic, 2 = minor, 3 = major, 4 = catastrophic) so the team can triage. Task success rate, time on task, and standardized instruments like SUS (System Usability Scale, Brooke 1996) provide quantitative complements when needed. The practice distinguishes moderated sessions (richer data, higher cost, scheduling overhead) from unmoderated tools (lower cost, scalable to dozens of sessions, but without a moderator to follow up on surprises).
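The quantitative complements can be computed with very little code. The sketch below scores a single SUS questionnaire using the standard procedure (ten items answered on a 1 to 5 scale; odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5) and computes a simple task success rate. The function names and the input shapes are assumptions made for illustration.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one SUS questionnaire; responses are items 1..10 in order, each 1-5."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1  # odd items are positively worded
        else:
            total += 5 - response  # even items are negatively worded
    return total * 2.5  # rescale the 0-40 sum to 0-100


def task_success_rate(outcomes: Sequence[bool]) -> float:
    """Share of task attempts that ended in success."""
    if not outcomes:
        raise ValueError("at least one task attempt is required")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


if __name__ == "__main__":
    print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))
    print(task_success_rate([True, True, False, True, False]))
```

A SUS score runs from 0 to 100 but is not a percentage, so it is best compared against earlier rounds of the same product or a published benchmark. With the handful of participants typical of formative testing, both numbers are descriptive only.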

The skill also covers what NOT to do in a session: leading prompts, defending the design, explaining how the design "is supposed to work" when the participant gets stuck, and over-fitting interpretations to a single dramatic finding from one participant.

Philosophy

Usability testing is built on a humbling claim: designers and engineers cannot reliably predict where users will struggle. The mental models that make a design feel obvious to its creators are exactly the models a fresh user lacks, and only direct observation closes that gap. The discipline rejects "I think users will understand this" in favor of "we watched users; here is what happened." Any session that confirms the design entirely is mildly suspicious: either the tasks were too easy or the moderator was unintentionally helping.

The practice is opinionated about moderator behavior. The moderator's job is to be uninteresting — to let the silence sit, to let the participant struggle long enough for the obstacle to become visible, to not rescue. This is hard because the social instinct is to help, and the design instinct is to defend. A moderator who explains the design after a participant gets stuck has destroyed the evidence; the obstacle the participant just encountered is the finding, and it cannot be re-observed in that session.

Related skills