LLM Usage Researcher

Most teams throw expensive models at every problem. But not every task needs Claude Opus or GPT-4 Turbo. A faster, cheaper model might work just as well. The problem: how do you know without testing?

This skill adapts the autoresearch methodology to LLM strategy. Instead of optimizing a skill's prompt, you optimize your LLM usage pattern: which model to use, what temperature setting, what prompt structure, what provider, and what context length.
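To make that search space concrete, here is a minimal sketch that enumerates candidate usage patterns as a grid over the five knobs named above. The `Approach` class, the model and provider labels, and the specific values are hypothetical placeholders, not identifiers from the skill or from any provider's API.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical description of one "LLM usage pattern" to evaluate.
@dataclass(frozen=True)
class Approach:
    model: str          # placeholder model label, not a real model ID
    provider: str       # where the model is served from (placeholder)
    temperature: float
    prompt_style: str   # e.g. "terse" vs "few-shot"; illustrative only
    max_context: int    # context budget in tokens

# Model and provider are paired rather than crossed freely,
# since not every model is offered by every provider.
MODEL_PROVIDERS = [
    ("kimi", "fireworks"),
    ("claude", "anthropic-direct"),
    ("gpt-4", "openai"),
    ("llama", "ollama-local"),
]
TEMPERATURES = [0.0, 0.7]
PROMPT_STYLES = ["terse", "few-shot"]
CONTEXT_BUDGETS = [8_000, 32_000]

def candidate_approaches() -> list[Approach]:
    """Cross every model/provider pair with every value of the other knobs."""
    return [
        Approach(model, provider, temp, style, ctx)
        for (model, provider), temp, style, ctx in product(
            MODEL_PROVIDERS, TEMPERATURES, PROMPT_STYLES, CONTEXT_BUDGETS
        )
    ]

if __name__ == "__main__":
    grid = candidate_approaches()
    print(len(grid))  # 4 * 2 * 2 * 2 = 32
```

The grid grows multiplicatively, and every configuration has to run against every test input, so it can pay to vary one or two knobs at a time rather than sweeping everything at once.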

the core job

Take a task, define what "good output" looks like as binary yes/no checks, then run an autonomous loop that:

Runs the task with different LLM approaches (Kimi + Fireworks, Claude + direct API, GPT-4, local Ollama, etc.)
Tracks quality, cost, and speed for every run
Compares against a baseline (usually the most expensive approach)
Analyzes trade-offs (is 20% cheaper worth 5% quality loss?)
Recommends the optimal approach for this specific task
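The loop above can be sketched as follows, assuming the outputs for each approach have already been collected (the actual API calls are out of scope here). Quality is taken to be the pass rate across all binary checks, and the recommendation rule ("cheapest approach within 5 points of the baseline's pass rate") is an assumed policy standing in for the trade-off analysis. `Run`, `quality`, `compare_to_baseline`, and `recommend` are hypothetical names, not the skill's actual code.

```python
from dataclasses import dataclass
from typing import Callable

# A binary eval: inspects one model output and answers yes (True) or no (False).
Check = Callable[[str], bool]

@dataclass
class Run:
    approach: str       # label for one LLM approach (hypothetical)
    outputs: list[str]  # one model output per test input, collected beforehand
    cost_usd: float     # total spend for producing these outputs
    latency_s: float    # mean seconds per call

def quality(run: Run, checks: list[Check]) -> float:
    """Pass rate over every (output, check) pair, in [0, 1]."""
    total = len(run.outputs) * len(checks)
    passed = sum(check(out) for out in run.outputs for check in checks)
    return passed / total if total else 0.0

def compare_to_baseline(run: Run, base: Run, checks: list[Check]) -> dict:
    """Absolute quality delta, relative cost savings, and speedup vs. the baseline."""
    return {
        "approach": run.approach,
        "quality_delta": quality(run, checks) - quality(base, checks),
        "cost_savings": 1 - run.cost_usd / base.cost_usd,
        "speedup": base.latency_s / run.latency_s,
    }

def recommend(runs: list[Run], baseline: str, checks: list[Check],
              max_quality_drop: float = 0.05) -> str:
    """Cheapest approach (ties broken by latency) within max_quality_drop of the baseline."""
    base = next(r for r in runs if r.approach == baseline)
    acceptable = [
        r for r in runs
        if compare_to_baseline(r, base, checks)["quality_delta"] >= -max_quality_drop
    ]
    # The baseline's own delta is 0, so `acceptable` is never empty.
    return min(acceptable, key=lambda r: (r.cost_usd, r.latency_s)).approach

if __name__ == "__main__":
    checks = [
        lambda out: len(out.split()) <= 20,  # "Is it 20 words or fewer?"
        lambda out: "TODO" not in out,       # "Is it free of placeholders?"
    ]
    # Illustrative runs, not measured results.
    runs = [
        Run("expensive", ["Short answer.", "Another short answer."], 1.00, 4.0),
        Run("cheap", ["Short answer.", "TODO fill in"], 0.10, 1.0),
        Run("mid", ["Fine.", "Also fine."], 0.40, 2.0),
    ]
    for r in runs:
        print(compare_to_baseline(r, runs[0], checks))
    print(recommend(runs, baseline="expensive", checks=checks))  # prints "mid"
```

Raising `max_quality_drop` encodes a greater willingness to trade quality for cost. Note that `quality_delta` here is an absolute difference in pass rate; measuring the loss relative to the baseline is an equally defensible choice.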

Output: A comparison matrix + dashboard + detailed recommendations explaining which model to use, when, and why.
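A comparison matrix of this kind can be rendered from per-approach metrics. The sketch below assumes each approach has already been reduced to a pass rate, a total cost in USD, and a mean latency in seconds, and prints them cheapest first with cost relative to the baseline. The row layout and `comparison_matrix` are hypothetical, not the skill's actual output format.

```python
# Hypothetical per-approach summary: (approach, pass_rate, total_cost_usd, mean_latency_s)
Row = tuple[str, float, float, float]

def comparison_matrix(rows: list[Row], baseline: str) -> str:
    """Render a plain-text matrix, cheapest approach first, with cost relative to the baseline."""
    base_cost = next(cost for name, _, cost, _ in rows if name == baseline)
    header = (f"{'approach':<12}{'quality':>9}{'cost $':>9}"
              f"{'latency s':>11}{'vs baseline':>13}")
    lines = [header, "-" * len(header)]
    for name, pass_rate, cost, latency in sorted(rows, key=lambda r: r[2]):
        lines.append(f"{name:<12}{pass_rate:>9.0%}{cost:>9.2f}"
                     f"{latency:>11.1f}{cost / base_cost:>13.0%}")
    return "\n".join(lines)

if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    print(comparison_matrix(
        [("baseline", 1.00, 2.40, 6.1),
         ("candidate-a", 0.95, 0.30, 1.8),
         ("candidate-b", 0.80, 0.05, 0.9)],
        baseline="baseline",
    ))
```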
