package-evaluator
Package Evaluator
Packages that do not activate on relevant queries waste the entire investment in writing them. A skill can have deep, well-structured content and still deliver zero value if its frontmatter description lacks the trigger phrases users actually type. An agent without a decision tree produces inconsistent results. A hook without a handler script is inert. Quality evaluation catches trigger gaps, missing sections, structural deficiencies, and shallow content before deployment, turning a package from a static document into a reliable tool.
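A trigger-gap check can be sketched in a few lines. The example below is only an illustration, not the evaluator's actual implementation: it assumes the skill file starts with a `---` delimited frontmatter block containing a single-line `description:` field, and the function names (`extract_description`, `missing_triggers`) and the sample skill text are hypothetical.

```python
import re


def extract_description(skill_md: str) -> str:
    """Return the single-line `description` value from leading frontmatter, or ''.

    Assumption: frontmatter is delimited by `---` lines at the top of the file.
    """
    match = re.match(r"^---\s*\n(.*?)\n---\s*(?:\n|$)", skill_md, re.DOTALL)
    if not match:
        return ""
    for line in match.group(1).splitlines():
        # Split on the first colon only, so colons inside the value survive.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "description":
            return value.strip()
    return ""


def missing_triggers(skill_md: str, expected_phrases: list[str]) -> list[str]:
    """List the expected phrases that never appear in the description."""
    description = extract_description(skill_md).lower()
    return [p for p in expected_phrases if p.lower() not in description]


# Hypothetical skill file used only for illustration.
SAMPLE = """---
name: adr-writer
description: Generates Architecture Decision Records. Triggers on: "write an ADR", "decision record".
---
# ADR Writer
"""

gaps = missing_triggers(SAMPLE, ["write an ADR", "design decision"])
```

Here `gaps` holds the phrases a user might type that the description does not cover. A file with no frontmatter at all reports every expected phrase as missing, which matches the point above: without a description, nothing triggers.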
Reference Files
| File | Contents |
|---|---|
| references/evaluation-rubric.md | Detailed 1-5 scoring criteria per dimension, weight justifications, type-specific criteria, worked examples for calibration |
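As a rough sketch of how per-dimension 1-5 scores and weights could combine into one number, the example below computes a weighted average. The dimension names and weights are placeholders chosen for illustration; the actual dimensions and weight justifications are defined in the rubric file, not here.

```python
def weighted_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Combine per-dimension 1-5 scores into a weighted average on the same scale."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"score for {name!r} must be between 1 and 5")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[d] * weights[d] for d in scores) / total_weight


# Hypothetical dimensions and weights, NOT taken from the rubric.
weights = {"trigger_coverage": 4, "completeness": 2, "structure": 2, "depth": 2}
scores = {"trigger_coverage": 2, "completeness": 4, "structure": 4, "depth": 5}

overall = weighted_score(scores, weights)  # (8 + 8 + 8 + 10) / 10
```

Dividing by the total weight keeps the result on the 1-5 scale regardless of whether the weights are written as fractions or as integers.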
Audit Modes
Two modes, selected by input: