Promptfoo Evaluation

Overview

This skill provides guidance for configuring and running LLM evaluations using Promptfoo, an open-source CLI tool for testing and comparing LLM outputs.

When to Use

Validating prompt quality, rubric alignment, or regression behavior across different LLM providers.
Automating model comparisons for bug bounties, research, or QA before releasing prompts into production.
Creating custom Python assertions or llm-rubric checks that Claude will execute during pressure tests.
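
As a rough illustration of the custom Python assertion case above, the sketch below assumes Promptfoo's documented convention that an assertion file referenced from the config exposes a `get_assert(output, context)` function and may return a dict with `pass`, `score`, and `reason`. The `keyword` test variable and the `MAX_WORDS` limit are hypothetical names chosen for this example and do not come from the skill.

```python
from typing import Any, Dict

# Hypothetical limit for this sketch; tune or remove for real evaluations.
MAX_WORDS = 50


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Pass when the output mentions the expected keyword and stays short."""
    # Assumption: context["vars"] carries the test case variables.
    # "keyword" is a made-up variable name for illustration.
    keyword = str(context.get("vars", {}).get("keyword", "")).lower()

    word_count = len(output.split())
    has_keyword = bool(keyword) and keyword in output.lower()
    passed = has_keyword and word_count <= MAX_WORDS

    return {
        "pass": passed,
        "score": 1.0 if passed else 0.0,
        "reason": f"keyword_found={has_keyword}, words={word_count}",
    }
```

Because the function is plain Python, it can be called directly with a sample output and a hand-built `context` dict to check its behavior before it is wired into an evaluation run.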

When NOT to Use

Quick ad hoc prompt testing that does not need structured test cases or automation.
Non-LLM evaluation work such as standard unit tests or infrastructure monitoring.
Requesting only human-readable advice without running CLI-based evaluations.

