agents-md-evals
AGENTS.md / CLAUDE.md Evaluator
Evaluate whether rules in instruction files actually change model behavior, identify non-discriminating rules (ones the model already follows by default), and optimize the file for maximum impact per token.
Core Concept
Most AGENTS.md/CLAUDE.md files contain rules the model already follows without being told, and those rules waste context tokens in every conversation. This skill runs controlled A/B tests, with the instruction file and without it, to identify which rules earn their place and which can be cut.
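A minimal sketch of the comparison step follows. It assumes each assertion was run for the same number of trials in both conditions and scored pass/fail. The names `AssertionResult`, `discrimination`, and `classify`, and the 0.2 lift threshold, are hypothetical and not part of the skill. The sample assertions and counts are illustrative only. An assertion that passes equally often with and without the file has zero lift, so the rule behind it is a candidate to cut.

```python
from dataclasses import dataclass


@dataclass
class AssertionResult:
    name: str
    trials: int          # runs per condition (hypothetical: same count for both)
    passes_with: int     # passing runs with the instruction file loaded
    passes_without: int  # passing runs with the instruction file removed


def discrimination(r: AssertionResult) -> float:
    """Pass-rate lift attributable to the instruction file (can be negative)."""
    return (r.passes_with - r.passes_without) / r.trials


def classify(results: list[AssertionResult], threshold: float = 0.2):
    """Split assertions into ones the file changes (keep) and ones it does not (cut)."""
    keep, cut = [], []
    for r in results:
        # Zero or negative lift: the model does as well or better without the rule.
        (keep if discrimination(r) >= threshold else cut).append(r.name)
    return keep, cut


# Illustrative data only: 5 trials per condition.
results = [
    AssertionResult("uses shadcn/ui components", 5, 5, 5),
    AssertionResult("edits both parallel component directories", 5, 5, 1),
]
keep, cut = classify(results)
print("keep:", keep)  # keep: ['edits both parallel component directories']
print("cut:", cut)    # cut: ['uses shadcn/ui components']
```

With only five trials, a single flaky run moves the lift by 0.2, so a real evaluation would likely want more trials per condition before cutting a rule.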
The Codebase-Teaches-Patterns Effect
Well-structured codebases make most instruction rules redundant. The model explores existing code — reads package.json, scans existing components, follows import patterns — and matches conventions automatically. In empirical testing on a 755-line CLAUDE.md across a monorepo with 3 frontend stacks, 25 of 26 assertions passed identically with or without the instruction file. The model discovered React patterns, shadcn/ui conventions, Recharts usage, Hebrew labels, Firestore helpers, Zod validation, and serverTimestamp — all from existing code.
The only assertion that discriminated was pure domain knowledge the codebase couldn't teach: parallel components in separate directories that must always be changed together.
This means your CLAUDE.md is probably 80-95% redundant. The eval process will reveal exactly which rules survive.
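To make "which rules survive" concrete, here is a hedged sketch of the bookkeeping. It assumes you already have a per-rule lift (pass rate with the file minus pass rate without it) and a rough token count for each rule's text. `Rule`, `audit`, and `min_lift` are hypothetical names, and the rules and numbers below are invented for illustration. The sketch reports the share of instruction tokens that buy no measurable lift and ranks the surviving rules by lift per token, in line with the goal of maximum impact per token.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    tokens: int   # approximate context cost of the rule's text
    lift: float   # pass rate with the file minus pass rate without it


def audit(rules: list[Rule], min_lift: float = 0.2) -> tuple[float, list[str]]:
    """Return (share of tokens spent on non-discriminating rules,
    surviving rule names ranked by lift per token)."""
    total = sum(r.tokens for r in rules)
    survivors = [r for r in rules if r.lift >= min_lift]
    cut_tokens = total - sum(r.tokens for r in survivors)
    # Highest impact per token first.
    ranked = sorted(survivors, key=lambda r: r.lift / r.tokens, reverse=True)
    return cut_tokens / total, [r.name for r in ranked]


# Invented numbers for illustration.
rules = [
    Rule("Use React function components", 40, 0.0),
    Rule("Validate forms with Zod", 30, 0.0),
    Rule("Use serverTimestamp for Firestore writes", 30, 0.0),
    Rule("Run DB migrations through the wrapper script", 20, 0.4),
    Rule("Change the parallel components together", 50, 0.8),
]
redundant_share, ranked = audit(rules)
print(f"{redundant_share:.0%} of rule tokens are redundant")  # 59% of rule tokens are redundant
print(ranked)
```

The token counts are placeholders; any tokenizer, or even a word count, can stand in, since both the share and the ranking depend only on relative sizes.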
The Evaluation Loop
- Read the instruction file and categorize each rule
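One possible way to start is to split a markdown instruction file into candidate rules. The sketch below assumes every bullet or numbered list item is a rule and uses the nearest heading above it as a rough category. This is an assumption for illustration: the skill's actual categorization scheme is not described here, and `extract_rules` is a hypothetical helper. List items inside fenced code blocks are skipped.

```python
import re

HEADING = re.compile(r"^\s*#{1,6}\s+(.*\S)")
ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+(.*\S)")
FENCE = "`" * 3  # three backticks mark a fenced code block


def extract_rules(text: str) -> list[tuple[str, str]]:
    """Return (section, rule) pairs: each list item tagged with the nearest heading above it."""
    section = "(top level)"
    rules = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # entering or leaving a fenced code block
            continue
        if in_fence:
            continue
        if m := HEADING.match(line):
            section = m.group(1)
        elif m := ITEM.match(line):
            rules.append((section, m.group(1)))
    return rules


sample = """# Style
- Use function components
- Prefer named exports

## Domain
1. Change the parallel components together
"""
for section, rule in extract_rules(sample):
    print(f"[{section}] {rule}")
```

Rules written as prose paragraphs rather than list items would be missed by this heuristic and need manual review.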