llm-usage-researcher
LLM Usage Researcher
Most teams throw expensive models at every problem. But not every task needs Claude Opus or GPT-4 Turbo. A faster, cheaper model might work just as well. The problem: how do you know without testing?
This skill adapts the autoresearch methodology to LLM strategy. Instead of optimizing a skill's prompt, you optimize your LLM usage pattern — which model to use, what temperature setting, what prompt structure, what provider, what context length.
the core job
Take a task, define what "good output" looks like as binary yes/no checks, then run an autonomous loop that:
- Runs the task with different LLM approaches (Kimi + Fireworks, Claude + direct API, GPT-4, local Ollama, etc.)
- Tracks quality, cost, and speed for every run
- Compares against a baseline (usually the most expensive approach)
- Analyzes trade-offs (is a 20% cost saving worth a 5% drop in quality?)
- Recommends the optimal approach for this specific task
Output: a comparison matrix, a dashboard, and detailed recommendations explaining which model to use, when, and why.
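One way to picture the comparison matrix and the recommendation step is the sketch below. It assumes a simple rule (pick the cheapest approach whose quality stays within a tolerated loss of the baseline); the skill itself may weigh trade-offs differently. The function names, the `max_quality_loss` parameter, and every number in the usage example are hypothetical placeholders, not measured results.

```python
def format_matrix(runs: dict[str, dict[str, float]]) -> str:
    """Render one row per approach with quality, cost, and latency columns."""
    lines = [f"{'approach':<12}{'quality':>9}{'cost_usd':>10}{'latency_s':>11}"]
    for name, r in runs.items():
        lines.append(
            f"{name:<12}{r['quality']:>9.2f}{r['cost_usd']:>10.4f}{r['latency_s']:>11.2f}"
        )
    return "\n".join(lines)


def recommend(
    runs: dict[str, dict[str, float]],
    baseline: str,
    max_quality_loss: float = 0.05,
) -> str:
    """Cheapest approach whose quality is within the tolerated loss of baseline."""
    floor = runs[baseline]["quality"] * (1 - max_quality_loss)
    # The baseline always meets its own floor, so `eligible` is never empty.
    eligible = {n: r for n, r in runs.items() if r["quality"] >= floor}
    return min(eligible, key=lambda n: eligible[n]["cost_usd"])


# Placeholder numbers for illustration only; real values come from actual runs.
example_runs = {
    "approach_a": {"quality": 1.00, "cost_usd": 0.1000, "latency_s": 4.0},
    "approach_b": {"quality": 0.96, "cost_usd": 0.0300, "latency_s": 2.5},
    "approach_c": {"quality": 0.80, "cost_usd": 0.0100, "latency_s": 1.0},
}

print(format_matrix(example_runs))
print("recommended:", recommend(example_runs, baseline="approach_a"))
```

With these placeholder values, the cheapest approach is rejected because its quality falls below the floor, so the rule picks the middle option.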