model-comparator
Model Comparator
Overview
This skill helps engineering and product teams make informed, structured decisions about which AI model, including LLMs, to use for a given task. It compares models across multiple dimensions: benchmark performance, real-world task capability, inference cost per token, latency (time-to-first-token and throughput), context window size, multimodal capabilities, fine-tuning availability, licensing, and data privacy. It provides frameworks for structured comparison, cost modeling at scale, and task-specific head-to-head evaluation, so that decisions rest on production-relevant evidence rather than marketing benchmarks.
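Cost at scale and latency reduce to simple arithmetic once you have per-token prices and measured latency figures. The sketch below is a minimal illustration under stated assumptions: the provider bills input and output tokens separately, priced per million tokens, and end-to-end latency is approximated as time-to-first-token plus output tokens divided by decode throughput (a first-order model that ignores network overhead and variance). `ModelProfile`, `monthly_cost`, and `est_latency_s` are hypothetical names, and every number is a placeholder, not a real price or benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model figures; replace with current provider pricing and your own measurements."""
    name: str
    input_price_per_mtok: float   # USD per 1M input tokens (assumed billing unit)
    output_price_per_mtok: float  # USD per 1M output tokens (assumed billing unit)
    ttft_s: float                 # measured median time-to-first-token, seconds
    output_tok_per_s: float       # measured decode throughput, output tokens per second


def monthly_cost(m: ModelProfile, requests_per_day: int,
                 in_tokens: int, out_tokens: int, days: int = 30) -> float:
    """Projected spend: per-request token cost times request volume."""
    per_request = (in_tokens * m.input_price_per_mtok
                   + out_tokens * m.output_price_per_mtok) / 1_000_000
    return per_request * requests_per_day * days


def est_latency_s(m: ModelProfile, out_tokens: int) -> float:
    """First-order latency estimate: TTFT plus time to decode the output tokens."""
    return m.ttft_s + out_tokens / m.output_tok_per_s


# Placeholder figures for illustration only.
model_a = ModelProfile("model-a", input_price_per_mtok=2.50, output_price_per_mtok=10.00,
                       ttft_s=0.4, output_tok_per_s=80.0)
print(f"monthly cost: ${monthly_cost(model_a, 100_000, 1_500, 300):,.2f}")  # → monthly cost: $20,250.00
print(f"est. latency: {est_latency_s(model_a, 300):.2f} s")                 # → est. latency: 4.15 s
```

Running the same inputs through each candidate's profile makes the cost gap between models explicit at your actual request volume and prompt/response sizes, which per-token list prices alone do not show.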
When to Use
- Choosing between frontier models (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, etc.) for a new product feature
- Deciding whether to use a proprietary API or a self-hosted open-source model
- Selecting an embedding model for a RAG (retrieval-augmented generation) pipeline
- Evaluating cost-quality tradeoffs for a high-volume production use case
- Using data to justify a model switch to stakeholders
- Comparing models for latency-sensitive applications (real-time chat, autocomplete)
- Assessing model capabilities for a specialized domain (medical, legal, code, multilingual)
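Several of these scenarios (cost-quality tradeoffs, latency-sensitive applications, self-hosting requirements) share one pattern: apply hard constraints first, then rank the remaining candidates. The sketch below uses a common weighted decision matrix (simple additive weighting with min-max normalization); it is one possible approach, not necessarily the method this skill prescribes. `Candidate`, `rank`, the weight keys, and all figures are hypothetical placeholders, and `quality` is assumed to come from your own task-specific evaluation rather than a public benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    quality: float          # task-specific eval score in [0, 1] from your own head-to-head run
    cost_per_1k_req: float  # USD per 1,000 requests, e.g. from a cost model
    p95_latency_s: float    # measured p95 end-to-end latency, seconds
    self_hostable: bool     # open weights and a license that permits self-hosting


def rank(cands, weights, max_latency_s=None, require_self_host=False):
    """Filter out candidates that violate hard constraints, then rank by weighted score.

    All criteria are min-max normalized across the surviving candidates;
    cost and latency are inverted so that lower raw values score higher.
    """
    pool = [c for c in cands
            if (max_latency_s is None or c.p95_latency_s <= max_latency_s)
            and (not require_self_host or c.self_hostable)]
    if not pool:
        return []

    def norm(vals, invert):
        lo, hi = min(vals), max(vals)
        span = (hi - lo) or 1.0  # all values equal: avoid division by zero
        return [((hi - v) if invert else (v - lo)) / span for v in vals]

    q = norm([c.quality for c in pool], invert=False)
    cost = norm([c.cost_per_1k_req for c in pool], invert=True)
    lat = norm([c.p95_latency_s for c in pool], invert=True)
    scored = [(weights["quality"] * qi + weights["cost"] * ci + weights["latency"] * li, c.name)
              for c, qi, ci, li in zip(pool, q, cost, lat)]
    return sorted(scored, reverse=True)  # highest weighted score first


# Placeholder candidates for illustration only.
candidates = [
    Candidate("model-a", quality=0.86, cost_per_1k_req=6.75, p95_latency_s=4.2, self_hostable=False),
    Candidate("model-b", quality=0.82, cost_per_1k_req=1.10, p95_latency_s=1.9, self_hostable=False),
    Candidate("model-c", quality=0.78, cost_per_1k_req=0.60, p95_latency_s=1.5, self_hostable=True),
]
weights = {"quality": 0.6, "cost": 0.25, "latency": 0.15}

for score, name in rank(candidates, weights, max_latency_s=3.0):
    print(f"{name}: {score:.3f}")
```

Min-max normalization stretches even a small raw gap (0.82 vs 0.78 here) across the full 0 to 1 range, so check that quality differences exceed your evaluation's noise before trusting the ranking. Recording the weights and constraints alongside the result also gives stakeholders a transparent rationale for the choice.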
When NOT to Use
- Building evaluation infrastructure from scratch (use the eval-designer skill)
- Fine-tuning or training a model on custom data (use model training skills)
- Comparing internal model versions (use the eval-designer skill with your own task-specific metrics)
- Choosing between ML frameworks (TensorFlow vs. PyTorch); that is an infrastructure decision, not a model selection