# Failure Taxonomy Builder
Transform raw, freeform trace annotations from open coding sessions into a structured taxonomy of binary failure modes, following the grounded theory methodology from the Analyze-Measure-Improve evaluation lifecycle.
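As a rough sketch of the target output, each entry in the taxonomy is a binary failure mode: for any given trace, the failure either occurred or it did not. The field names below are illustrative, not a fixed schema, and the two example modes are hypothetical.

```python
# A minimal sketch of a structured taxonomy of binary failure modes.
# Field names ("name", "definition", "example_trace_ids") are assumptions
# for illustration, not a schema this skill mandates.
taxonomy = [
    {
        "name": "retrieval_wrong_document",
        "definition": "The retriever returned a document that does not "
                      "contain the information needed to answer the query.",
        "example_trace_ids": ["t-001", "t-017"],
    },
    {
        "name": "ignored_query_constraint",
        "definition": "The response omits or violates an explicit "
                      "constraint stated in the user query.",
        "example_trace_ids": ["t-002"],
    },
]

# Every mode carries a name, a crisp yes/no definition, and grounding examples.
for mode in taxonomy:
    assert set(mode) == {"name", "definition", "example_trace_ids"}
```

The key property is that each definition can be answered yes or no for a single trace, which is what makes the taxonomy measurable downstream.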
## When This Skill Applies
The user has already completed open coding — they've read through LLM pipeline traces and written short, freeform notes describing what went wrong (the "point of first failure"). Now they need to move from that chaotic pile of observations into an organized, actionable taxonomy. This is the axial coding step.
Typical inputs look like a JSON array, CSV, or spreadsheet of objects with fields like:

- `trace_id` — identifier for the trace
- `annotation` or `note` — the freeform open-coded observation
- Optionally: `pass_fail`, `trace_summary`, `query`, or the full trace itself
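A concrete input might look like the following. The records and field values are hypothetical, matching the shape described above; the filtering step simply keeps the annotations that will feed axial coding.

```python
import json

# Hypothetical export from an open coding session. The trace IDs and
# annotation text are invented for illustration.
raw = json.loads("""
[
  {"trace_id": "t-001", "annotation": "retrieved wrong document; answer hallucinated", "pass_fail": "fail"},
  {"trace_id": "t-002", "annotation": "ignored the user's date filter", "pass_fail": "fail"},
  {"trace_id": "t-003", "annotation": "", "pass_fail": "pass"}
]
""")

# Keep only failed traces that carry a non-empty note: these are the
# observations the axial coding step will cluster into failure modes.
to_code = [r for r in raw if r.get("pass_fail") == "fail" and r.get("annotation")]
print(len(to_code))  # 2
```

Passing traces and empty annotations drop out here, since only observed failures contribute to the taxonomy.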
## Core Workflow