dspy-evaluate

Evaluate Your DSPy Program

Guide the user through measuring AI quality with DSPy's Evaluate class. The pattern: pick a metric, prepare a devset, run the evaluator, interpret results, then feed the same metric into an optimizer.

What is dspy.Evaluate

dspy.Evaluate runs your program on every example in the devset, scores each output with a metric, and reports the aggregate score as a percentage (0-100). It handles multithreading and progress display for you.

Built-in metrics

DSPy provides answer_exact_match (normalized string equality between the predicted and gold answer) and answer_passage_match (checks whether the gold answer appears in the prediction's retrieved passages). Both expect an answer field on the example; answer_exact_match also reads pred.answer, while answer_passage_match reads pred.context.

SemanticF1

Scores the predicted answer against the expected one as an F1 over their key ideas: an LM judge estimates how much of the gold answer the prediction recalls and how much of the prediction is relevant (precision), then combines the two. More forgiving than exact match, since answers that are close but not identical earn partial credit:

from dspy.evaluate import SemanticF1