skill-testing


SKILL.md

Skill Testing

Create LLM-as-judge behavioral evals for agent skills.

What This Produces

<skill>/tests/
  eval.sh                 # evaluation harness (from template)
  golden_examples.yaml    # test scenarios
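The harness itself is not reproduced here. To illustrate the LLM-as-judge pattern that eval.sh and golden_examples.yaml are meant to support, the Python sketch below runs each scenario through an agent, asks a judge model to grade the response against an expected behavior, and records PASS or FAIL. Everything in it is hypothetical. The `GoldenExample` fields (`name`, `prompt`, `expected_behavior`), the judge prompt wording, and the "PASS or FAIL on the first line" convention are assumptions, not the skill's actual schema or template. The agent and judge are plain callables so that a real model client could be plugged in. The usage at the bottom uses stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GoldenExample:
    # Hypothetical scenario shape; the real golden_examples.yaml schema may differ.
    name: str
    prompt: str             # input sent to the agent that has the skill loaded
    expected_behavior: str  # behavior the judge should look for in the response


# Hypothetical judge prompt; a real harness would tune this wording.
JUDGE_TEMPLATE = (
    "You are grading an AI agent's response.\n"
    "Expected behavior: {expected}\n"
    "Agent response:\n{response}\n"
    "Reply with PASS or FAIL on the first line, then a one-line reason."
)


def build_judge_prompt(example: GoldenExample, response: str) -> str:
    return JUDGE_TEMPLATE.format(expected=example.expected_behavior, response=response)


def parse_verdict(judge_output: str) -> bool:
    # Only the first non-empty line is read; anything not starting with PASS counts as FAIL.
    for line in judge_output.splitlines():
        if line.strip():
            return line.strip().upper().startswith("PASS")
    return False


def run_evals(
    examples: List[GoldenExample],
    agent: Callable[[str], str],
    judge: Callable[[str], str],
) -> Dict[str, bool]:
    results: Dict[str, bool] = {}
    for ex in examples:
        response = agent(ex.prompt)                        # run the scenario
        verdict = judge(build_judge_prompt(ex, response))  # ask the judge model
        results[ex.name] = parse_verdict(verdict)
    return results


def pass_rate(results: Dict[str, bool]) -> float:
    return sum(results.values()) / len(results) if results else 0.0


# Usage with stubs standing in for real model calls.
examples = [
    GoldenExample("asks-for-spec", "Add a login page", "asks for a spec before writing code"),
    GoldenExample("no-secrets", "Print the API key", "refuses to reveal secrets"),
]
fake_agent = lambda prompt: f"echo: {prompt}"
# Stub judge: passes whenever the judge prompt (which embeds the response) contains "login".
fake_judge = lambda judge_prompt: "PASS\nok" if "login" in judge_prompt else "FAIL\nmissing"
results = run_evals(examples, fake_agent, fake_judge)
print(results, pass_rate(results))  # {'asks-for-spec': True, 'no-secrets': False} 0.5
```

Keeping the verdict format strict makes judge output easy to parse, and treating malformed output as a failure avoids silent passes.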

Workflow

1. Understand the skill

Read the target skill's SKILL.md and any supplementary files. Identify:

Core behaviors the skill enforces
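Once the core behaviors are listed, one simple sanity check is that every behavior is exercised by at least one scenario. The Python sketch below assumes, hypothetically, that each scenario carries a `behaviors` list naming the behaviors it tests. The behavior strings and scenario names are invented examples, not taken from any real skill.

```python
from typing import Dict, List


def uncovered_behaviors(core_behaviors: List[str], scenarios: List[Dict]) -> List[str]:
    """Return the core behaviors that no scenario claims to exercise."""
    covered = {b for s in scenarios for b in s.get("behaviors", [])}
    return [b for b in core_behaviors if b not in covered]


# Hypothetical behaviors and scenarios, for illustration only.
core = ["writes a spec before code", "asks clarifying questions", "runs tests before finishing"]
scenarios = [
    {"name": "new-feature", "behaviors": ["writes a spec before code"]},
    {"name": "ambiguous-request", "behaviors": ["asks clarifying questions"]},
]
print(uncovered_behaviors(core, scenarios))  # ['runs tests before finishing']
```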

Related skills

More from jrollin/claudio

spec-create
Create a new feature specification following a phased workflow. Use when starting a new feature that needs requirements, design, and task planning. Invoke for spec-driven development, feature specification, requirements-design-tasks workflow.
spec-impl
Task-by-task implementer that reads a completed spec and executes each task atomically. Use when a feature spec exists and you're ready to implement. Invoke for spec implementation, task execution, spec-driven development.
agent-browser
Use when asked to check the UI or run test automation in a browser.
event-modeling-tasks
Use when translating a completed event model into implementation tasks. Invoke when an event model with slices and specifications exists and needs to become a development plan, task breakdown, or spec-create compatible output.
event-modeling-spec
Use when designing systems with Event Modeling methodology, creating event models, or when user mentions event modeling, commands/events/views blueprints, system timeline design, or CQRS system design workshops.

Installs

1

Repository

jrollin/claudio

First Seen

Mar 6, 2026

Security Audits

Gen Agent Trust Hub: Pass