model-comparator
# Model Comparator

## Overview

This skill helps engineering and product teams make structured, data-driven decisions about which AI or LLM model to use for a given task. It compares models across the dimensions that matter in production: benchmark performance, real-world task capability, inference cost per token, latency (time-to-first-token and throughput), context window size, multimodal capability, fine-tuning availability, licensing, and data privacy. It also provides frameworks for structured comparison, cost modeling at scale, and task-specific head-to-head evaluation, so decisions rest on production-relevant evidence rather than marketing benchmarks.
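Cost modeling at scale can be sketched in a few lines: multiply per-token prices by the traffic profile. The model names and per-million-token prices below are hypothetical placeholders, not real provider pricing; substitute current rates from each provider's pricing page.

```python
# Hypothetical per-million-token prices (USD); replace with real provider rates.
PRICES = {
    "model-a": {"input": 2.50, "output": 10.00},
    "model-b": {"input": 0.30, "output": 1.20},
}

def monthly_cost(model, requests_per_day, in_tokens, out_tokens, days=30):
    """Estimate monthly API cost in USD for a given traffic profile."""
    p = PRICES[model]
    # Cost of a single request: tokens * price-per-million-tokens.
    per_request = (in_tokens * p["input"] + out_tokens * p["output"]) / 1_000_000
    return per_request * requests_per_day * days

# Compare both models on the same workload: 50k requests/day,
# ~1,200 input tokens and ~300 output tokens per request.
for m in PRICES:
    print(f"{m}: ${monthly_cost(m, 50_000, 1_200, 300):,.2f}/month")
```

Running the numbers side by side often shows that a cheaper model's quality gap is worth closing with prompt engineering rather than paying a 5-10x premium per request.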
## When to Use
- Choosing between frontier models (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro, etc.) for a new product feature
- Deciding whether to use a proprietary API or a self-hosted open-source model
- Selecting an embedding model for a RAG (retrieval-augmented generation) pipeline
- Evaluating cost-quality tradeoffs for a high-volume production use case
- Justifying a model switch to stakeholders with data
- Comparing models for latency-sensitive applications (real-time chat, autocomplete)
- Assessing model capabilities for a specialized domain (medical, legal, code, multilingual)
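Several of the scenarios above reduce to the same mechanics: score each candidate on the dimensions you care about, then combine the scores with explicit weights so the tradeoff is visible to stakeholders. A minimal sketch of such a decision matrix follows; all model names, scores, and weights are illustrative assumptions, not measured results.

```python
# Illustrative weights (must sum to 1) and 0-10 scores; plug in your own data.
WEIGHTS = {"quality": 0.4, "latency": 0.2, "cost": 0.3, "context": 0.1}
SCORES = {
    "model-a": {"quality": 9, "latency": 6, "cost": 4, "context": 8},
    "model-b": {"quality": 7, "latency": 9, "cost": 9, "context": 6},
}

def weighted_score(scores, weights):
    """Combine per-dimension scores into one number using explicit weights."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(scores[dim] * w for dim, w in weights.items())

# Rank candidates from best to worst under the chosen weighting.
ranked = sorted(SCORES, key=lambda m: weighted_score(SCORES[m], WEIGHTS),
                reverse=True)
for m in ranked:
    print(f"{m}: {weighted_score(SCORES[m], WEIGHTS):.2f}")
```

The weights are the decision: a latency-sensitive autocomplete feature and a quality-sensitive legal-drafting feature can use the same scores and reach opposite conclusions, which is exactly what you want to surface when justifying a model choice.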
## When NOT to Use
- Building evaluation infrastructure from scratch (use eval-designer skill)
- Fine-tuning or training a model on custom data (use model training skills)
- Comparing internal model versions (use eval-designer skill with your specific metrics)
- Choosing between ML frameworks (TensorFlow vs PyTorch) — that is an infrastructure decision