Model Comparator

Overview

This skill helps engineering and product teams make structured, well-informed decisions about which AI model to use for a given task. It compares models across multiple dimensions: benchmark performance, real-world task capability, inference cost per token, latency (time-to-first-token and throughput), context window size, multimodal capabilities, fine-tuning availability, licensing, and data privacy. It provides frameworks for structured comparison, cost modeling at scale, and task-specific head-to-head evaluation, moving decisions beyond marketing benchmarks to production-relevant evidence.
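
For the cost-modeling dimension, a minimal sketch of the kind of calculation involved is shown below. The model names and per-token rates are illustrative placeholders, not published pricing; substitute current rates from each provider's price sheet.

```python
# Cost-modeling sketch. Model names and per-token rates are illustrative
# placeholders, NOT published pricing; substitute current provider rates.

PRICES_PER_1M_TOKENS = {
    # model: (input $ per 1M tokens, output $ per 1M tokens)
    "model-a": (3.00, 15.00),  # placeholder
    "model-b": (0.50, 1.50),   # placeholder
}

def monthly_cost(model: str, requests_per_day: int,
                 avg_input_tokens: int, avg_output_tokens: int) -> float:
    """Estimate monthly API spend for a given traffic profile."""
    in_rate, out_rate = PRICES_PER_1M_TOKENS[model]
    per_request = (avg_input_tokens * in_rate +
                   avg_output_tokens * out_rate) / 1_000_000
    return per_request * requests_per_day * 30  # ~30 billing days

# Example: 50k requests/day with RAG-style, input-heavy prompts
for name in PRICES_PER_1M_TOKENS:
    print(f"{name}: ${monthly_cost(name, 50_000, 1_200, 300):,.2f}/month")
```

Modeling input and output rates separately matters: most providers price them differently, and typical production workloads are heavily input-weighted, so a single blended rate can misrank models by a wide margin.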

When to Use

  • Choosing between frontier models (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, etc.) for a new product feature
  • Deciding whether to use a proprietary API or a self-hosted open-source model
  • Selecting an embedding model for a RAG (retrieval-augmented generation) pipeline
  • Evaluating cost-quality tradeoffs for a high-volume production use case
  • Justifying a model switch to stakeholders with data
  • Comparing models for latency-sensitive applications (real-time chat, autocomplete); see the latency measurement sketch after this list
  • Assessing model capabilities for a specialized domain (medical, legal, code, multilingual)
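
To ground the latency comparison, here is a hedged sketch for measuring time-to-first-token (TTFT) and decode throughput against any OpenAI-compatible streaming endpoint, using the `openai` Python client. The model name and prompt are placeholders, and single wall-clock measurements are noisy, so run several trials per model.

```python
# Measure TTFT and decode throughput for an OpenAI-compatible streaming API.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.

import time
from openai import OpenAI

client = OpenAI()

def measure(model: str, prompt: str) -> tuple[float, float]:
    """Return (time_to_first_token_seconds, tokens_per_second_after_first)."""
    start = time.perf_counter()
    first_token_at = None
    chunks = 0
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks carry no content (e.g. role headers); skip them.
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            chunks += 1  # chunk count approximates token count
    elapsed = time.perf_counter() - start
    ttft = (first_token_at or start) - start
    throughput = chunks / max(elapsed - ttft, 1e-9)
    return ttft, throughput

ttft, tps = measure("model-a", "Summarize the tradeoffs of model selection.")
print(f"TTFT: {ttft:.3f}s, throughput: {tps:.1f} tok/s")
```

TTFT and throughput trade off differently by application: autocomplete lives or dies on TTFT, while long-form generation is dominated by sustained throughput, so measure both before ranking models.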

When NOT to Use

  • Building evaluation infrastructure from scratch (use eval-designer skill)
  • Fine-tuning or training a model on custom data (use model training skills)
  • Comparing internal model versions (use eval-designer skill with your specific metrics)
  • Choosing between ML frameworks (TensorFlow vs PyTorch) — that is an infrastructure decision