promptfoo-evaluation
Promptfoo Evaluation
Overview
This skill provides guidance for configuring and running LLM evaluations using Promptfoo, an open-source CLI tool for testing and comparing LLM outputs.
When to Use
- Validating prompt quality, rubric alignment, or regression behavior across different LLM providers.
- Automating model comparisons for bug bounties, research, or QA before releasing prompts into production.
- Creating custom Python assertions or llm-rubric grades that Claude will execute under pressure tests.
When NOT to Use
- Quickly testing prompts ad-hoc without needing structured test cases or automation.
- Non-LLM evaluation work such as standard unit tests or infrastructure monitoring.
- Requesting only human-readable advice without running CLI-based evaluations.
Quick Start
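As a starting point for the custom Python assertions mentioned under When to Use, here is a minimal sketch of an assertion file. It assumes Promptfoo's Python assertion convention of a `get_assert(output, context)` function that returns a result with `pass`, `score`, and `reason` fields. The file name `json_keys_assert.py` and the `REQUIRED_KEYS` tuple are hypothetical and chosen only for illustration.

```python
# json_keys_assert.py (hypothetical file name)
import json
from typing import Any, Dict

# Hypothetical keys the model's JSON answer is expected to contain.
REQUIRED_KEYS = ("answer", "confidence")


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Grade one model output: pass only if it is a JSON object with all required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return {
            "pass": False,
            "score": 0.0,
            "reason": f"Output is not valid JSON: {exc.msg}",
        }

    if not isinstance(data, dict):
        return {
            "pass": False,
            "score": 0.0,
            "reason": "Output is valid JSON but not an object",
        }

    missing = [key for key in REQUIRED_KEYS if key not in data]
    # Partial credit: the fraction of required keys that are present.
    score = 1.0 - len(missing) / len(REQUIRED_KEYS)
    reason = "All required keys present" if not missing else f"Missing keys: {missing}"
    return {"pass": not missing, "score": score, "reason": reason}
```

A file like this would typically be referenced from a test case in the Promptfoo config through an assertion of type python whose value points at the file (for example `file://json_keys_assert.py`). Check the exact path and config layout against your own project before relying on it.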