Generate Judgements for Skill Evaluation

Analyze a skill's source files and produce fine-grained judge_definitions for the mlflow-skills automated evaluation framework. Each judgement is a yes/no question that an LLM judge answers by reading the execution trace.
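Below is a minimal sketch of what one judgement might look like in code. It assumes a judgement is simply a named yes/no question, and that the judge model's reply begins with "yes" or "no". The `Judgement` class, its field names, `parse_verdict`, and the two example questions are hypothetical illustrations, not the mlflow-skills schema. The real field layout is defined in references/yaml-config-spec.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgement:
    """Hypothetical shape of one judgement: a named yes/no question about a trace."""
    name: str
    question: str


def parse_verdict(reply: str) -> bool:
    """Map a judge model's free-text reply to a boolean verdict.

    Assumes the reply starts with "yes" or "no" (case-insensitive);
    anything else is rejected rather than guessed.
    """
    words = reply.strip().split()
    first = words[0].strip(".,:;!").lower() if words else ""
    if first == "yes":
        return True
    if first == "no":
        return False
    raise ValueError(f"judge reply is not a yes/no answer: {reply!r}")


# Hypothetical fine-grained judgements: each question checks one observable behavior.
judgements = [
    Judgement("reads_skill_md", "Did the agent read SKILL.md before acting?"),
    Judgement("cites_config_spec", "Did the agent consult the YAML config spec?"),
]
```

Keeping each question narrow means a single "no" from the judge points at one specific missed behavior, rather than at a vague overall failure.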

Prerequisites

Access to the target skill directory (must contain SKILL.md)
Familiarity with the mlflow-skills YAML config format (see references/yaml-config-spec.md)
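The first prerequisite can be checked mechanically before any analysis starts. Here is a minimal sketch. `check_skill_dir` is a hypothetical helper, and it only looks for SKILL.md at the top level of the given directory.

```python
from pathlib import Path


def check_skill_dir(skill_dir: str) -> Path:
    """Return the path to SKILL.md, or raise if the prerequisite is not met."""
    root = Path(skill_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")
    skill_md = root / "SKILL.md"
    if not skill_md.is_file():
        # The target skill directory must contain SKILL.md.
        raise FileNotFoundError(f"{root} does not contain SKILL.md")
    return skill_md
```

Failing early with a clear error is cheaper than discovering a missing SKILL.md halfway through generating judgements.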

Workflow

digraph generate_judgements {
  rankdir=TB;
  node [shape=box];

  collect [label="Phase 1\nCollect & Analyze Skill Files"];
  // ...
}
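Phase 1 can be approximated by reading every text-like file in the skill directory, so that later phases have the full source available. Below is a minimal sketch under that assumption. `collect_skill_files` and the `TEXT_SUFFIXES` set are hypothetical choices, not part of the original workflow, and the sketch covers only the collection step, not the analysis.

```python
from pathlib import Path

# Hypothetical choice: the file suffixes treated as skill "source files".
TEXT_SUFFIXES = {".md", ".py", ".sh", ".yaml", ".yml", ".json", ".txt"}


def collect_skill_files(skill_dir: str) -> dict[str, str]:
    """Read every text-like file under skill_dir, keyed by relative POSIX path.

    SKILL.md is placed first because it is the skill's entry point;
    the other files follow in sorted path order.
    """
    root = Path(skill_dir)
    files = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in TEXT_SUFFIXES
    )
    # Stable sort: SKILL.md (key False) moves to the front, others keep their order.
    files.sort(key=lambda p: p.relative_to(root).as_posix() != "SKILL.md")
    return {
        p.relative_to(root).as_posix(): p.read_text(encoding="utf-8", errors="replace")
        for p in files
    }
```

Keying by relative path keeps the collected sources readable when they are later quoted alongside generated judgements.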

More from panlm/skills

aws-fis-experiment-prepare

aws-fis-experiment-execute

aws-best-practice-research

eks-workload-best-practice-assessment

aws-service-chaos-research

eks-app-log-analysis