Prompt Evaluation — Claude Code

This skill is a router and workflow. It teaches how to run a prompt-evaluation loop using Claude Code's own subagent capability — no external API calls, no Python, no promptfoo. Read the references on demand.

The core idea

Claude Code's Agent/Task tool spawns a subagent with a fresh context window. The subagent sees only the prompt you pass it; it inherits nothing from the main conversation. When the subagent finishes, it returns a single message to the caller and its working context is discarded.

For prompt evaluation, this gives you three properties for free that are otherwise hard to engineer:

prompt-evaluation-claude-code

Prompt Evaluation — Claude Code

The core idea