Eval Loop — Orchestrator
Meta-process skill. Turns a measurable initiative into a loop-centered workspace where strategy artifacts, marketing/content execution artifacts, eval snapshots, result rows, and promoted learnings live together.
Core Question: "Can future agents improve this measurable surface by reading one loop folder instead of reconstructing history from scattered skill outputs?"
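As an illustration of what "one loop folder" could contain, here is a minimal sketch of a workspace layout. Every path below is a hypothetical example; the canonical layout is defined in `../../references/eval-loop-spec.md`, not here.

```python
# Hypothetical layout for one loop-centered workspace.
# All names are illustrative assumptions, not taken from eval-loop-spec.md.
LOOP_WORKSPACE = [
    "loops/pricing-page-conversion/strategy.md",    # strategy artifacts
    "loops/pricing-page-conversion/assets/",        # marketing/content execution artifacts
    "loops/pricing-page-conversion/evals/",         # eval snapshots, one file per run
    "loops/pricing-page-conversion/results.csv",    # result rows
    "loops/pricing-page-conversion/learnings.md",   # promoted learnings
]
```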
Critical Gates
- Measurable surface required. If the user cannot name a page, campaign, post series, ad set, email sequence, outreach motion, or other observable surface, return `NEEDS_CONTEXT` and recommend `discover`.
- Metric path required. The loop must name at least one primary metric and where it will come from, even if the baseline is not known yet. No metric path -> no loop. (Both this gate and the surface gate are sketched in code after this list.)
- No skill-centered folders. Do not create `skills-resources/{skill-name}/...`. Eval loops are organized by measurable initiative.
- Execution boundary. This stack may execute marketing/content assets. It does not deploy code, publish to platforms, build app UI, or mutate external systems.
- No unattended infinite marketing loops. Borrow `autoresearch`'s ledger and keep/discard discipline, not its "run forever" posture. Human approval gates publishing and live-surface changes.
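A minimal sketch of the first two gates, assuming a hypothetical `LoopRequest` shape; the field names and the exact `NEEDS_CONTEXT` message format are illustrative assumptions, and only the keep-or-refuse logic mirrors the gates above.

```python
from dataclasses import dataclass

NEEDS_CONTEXT = "NEEDS_CONTEXT"  # sentinel returned when a gate fails

@dataclass
class LoopRequest:
    """Hypothetical shape of an incoming loop request."""
    surface: str | None = None         # e.g. "pricing page", "Q3 email sequence"
    primary_metric: str | None = None  # e.g. "signup conversion rate"
    metric_source: str | None = None   # e.g. "GA4 weekly export"

def check_gates(req: LoopRequest) -> str:
    # Gate 1: measurable surface required.
    if not req.surface:
        return f"{NEEDS_CONTEXT}: no observable surface named; recommend `discover`."
    # Gate 2: metric path required -- a metric name plus where it will come from,
    # even if the baseline is not known yet.
    if not (req.primary_metric and req.metric_source):
        return f"{NEEDS_CONTEXT}: no metric path; name a primary metric and its source."
    return "OK: gates passed; the loop folder may be created."
```

For example, `check_gates(LoopRequest(surface="pricing page"))` fails the metric-path gate, while a request naming a surface, a primary metric, and its source passes both.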
Reference
Read `../../references/eval-loop-spec.md` before writing or modifying any loop artifact.
Responsibility Split