Originally from wshobson/agents

LLM Evaluation

Master comprehensive evaluation strategies for LLM applications, from automated metrics to human evaluation and A/B testing.

When to Use This Skill

  • Measuring LLM application performance systematically
  • Comparing different models or prompts
  • Detecting performance regressions before deployment
  • Validating improvements from prompt changes
  • Building confidence in production systems
  • Establishing baselines and tracking progress over time
  • Debugging unexpected model behavior
  • Evaluating RAG pipeline quality (retrieval + generation)
  • Measuring agentic task success rates
  • Testing structured output schema compliance
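As a starting point for several of the cases above (systematic performance measurement, regression detection, schema compliance), a minimal automated eval harness can be sketched as follows. This is an illustrative sketch, not part of the skill itself: the test cases, the `stub` model, and all function names are hypothetical placeholders for your own dataset and LLM call.

```python
import json

def exact_match(prediction: str, reference: str) -> bool:
    """Case-insensitive exact match after whitespace normalization."""
    return prediction.strip().lower() == reference.strip().lower()

def schema_compliant(output: str, required_keys: set[str]) -> bool:
    """Check that a structured output parses as JSON and contains the required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys <= data.keys()

def evaluate(cases: list[dict], predict) -> float:
    """Run a model callable over test cases and return exact-match accuracy."""
    hits = sum(exact_match(predict(c["input"]), c["expected"]) for c in cases)
    return hits / len(cases)

# Toy harness: a lookup-table stub stands in for a real LLM call.
cases = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
stub = {"2+2": "4", "capital of France": "paris"}
accuracy = evaluate(cases, lambda q: stub[q])  # → 1.0 (matching is case-insensitive)
```

Tracking a score like this across prompt or model changes is what makes regressions visible before deployment: a drop below the established baseline fails the check.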

Core Evaluation Types
