autoresearch
Autoresearch for Skills
Most skills work about 70% of the time. The other 30% of the time, you get garbage. The fix isn't to rewrite the skill from scratch. It's to let an agent run it dozens of times, score every output, and tighten the prompt until that 30% disappears.
This skill adapts Andrej Karpathy's autoresearch methodology (autonomous experimentation loops) to Claude Code skills. Instead of optimizing ML training code, we optimize skill prompts.
the core job
Take any existing skill, define what "good output" looks like as binary yes/no checks, then run an autonomous loop that:
- Generates outputs from the skill using test inputs
- Scores every output against the eval criteria
- Mutates the skill prompt to fix failures
- Keeps mutations that improve the score, discards the rest
- Repeats until the score ceiling is hit or the user stops it
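The loop above can be sketched as a simple keep-if-better search. This is a minimal illustration, not the skill's actual implementation: `score`, `optimize`, `run_skill`, `mutate`, and the demo checks are hypothetical names, and in practice an agent would generate outputs and propose mutations rather than the stub functions used here.

```python
from typing import Callable, List, Tuple

# All names below are hypothetical, chosen for illustration only.
Check = Callable[[str], bool]  # one binary yes/no eval on a single output


def score(outputs: List[str], checks: List[Check]) -> float:
    """Fraction of (output, check) pairs that pass."""
    total = len(outputs) * len(checks)
    passed = sum(1 for out in outputs for check in checks if check(out))
    return passed / total if total else 0.0


def optimize(
    prompt: str,
    run_skill: Callable[[str, str], str],  # (prompt, test_input) -> output
    mutate: Callable[[str], str],          # prompt -> candidate prompt
    test_inputs: List[str],
    checks: List[Check],
    max_rounds: int = 20,
) -> Tuple[str, float, List[Tuple[int, float, str]]]:
    """Keep a mutation only if it strictly improves the score."""
    best_prompt = prompt
    best_score = score([run_skill(best_prompt, x) for x in test_inputs], checks)
    history = [(0, best_score, "baseline")]
    for round_no in range(1, max_rounds + 1):
        if best_score >= 1.0:  # score ceiling reached
            break
        candidate = mutate(best_prompt)
        cand_score = score([run_skill(candidate, x) for x in test_inputs], checks)
        if cand_score > best_score:
            best_prompt, best_score = candidate, cand_score
            history.append((round_no, cand_score, "kept"))
        else:
            history.append((round_no, cand_score, "discarded"))
    return best_prompt, best_score, history


# Deterministic stand-ins so the sketch runs without a model.
def demo_run_skill(prompt: str, test_input: str) -> str:
    return f"{test_input}: {prompt}"


def demo_mutate(prompt: str) -> str:
    return prompt + " Cite sources."


demo_checks: List[Check] = [
    lambda out: "sources" in out,  # does the output mention sources?
    lambda out: len(out) < 80,     # is the output short enough?
]

final_prompt, final_score, history = optimize(
    "Summarize.", demo_run_skill, demo_mutate, ["a", "b"], demo_checks
)
```

Scoring is a plain pass rate here because the checks are binary. A stricter variant might require a mutation to win across several repeated runs before keeping it, since real skill outputs are not deterministic.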
Output: an improved SKILL.md, a results.tsv log, a changelog.md recording every mutation attempted, and a live HTML dashboard you can watch in your browser.
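As a rough idea of what one row per experiment in results.tsv could look like, here is a small sketch. The column names (`experiment`, `score`, `status`, `description`) and the `log_result` helper are assumptions made for illustration; the source does not specify the file's actual layout.

```python
import csv
import io
from typing import TextIO

# Hypothetical column layout; the real results.tsv may differ.
FIELDS = ["experiment", "score", "status", "description"]


def log_result(
    stream: TextIO,
    experiment: int,
    score: float,
    status: str,
    description: str,
    write_header: bool = False,
) -> None:
    """Append one tab-separated row describing a single experiment."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    if write_header:
        writer.writerow(FIELDS)
    writer.writerow([experiment, f"{score:.2f}", status, description])


# An in-memory buffer stands in for an open results.tsv file.
buffer = io.StringIO()
log_result(buffer, 0, 0.5, "baseline", "original SKILL.md", write_header=True)
log_result(buffer, 1, 1.0, "kept", "added explicit citation rule")
```

Writing to any text stream keeps the helper easy to test; passing a file opened in append mode would give the on-disk log.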