testing-llm


LLM & AI Testing Patterns

Patterns and tools for testing LLM integrations, evaluating AI output quality, mocking responses for deterministic CI, and applying agentic test workflows (planner, generator, healer).

Quick Reference

| Area | File | Purpose |
|------|------|---------|
| Rules | rules/llm-evaluation.md | DeepEval quality metrics, Pydantic schema validation, timeout testing |
| Rules | rules/llm-mocking.md | Mock LLM responses, VCR.py recording, custom request matchers |
| Reference | references/deepeval-ragas-api.md | Full API reference for DeepEval and RAGAS metrics |
| Reference | references/generator-agent.md | Transforms Markdown specs into Playwright tests |
| Reference | references/healer-agent.md | Auto-fixes failing tests (selectors, waits, dynamic content) |
| Reference | references/planner-agent.md | Explores app and produces Markdown test plans |
| Checklist | checklists/llm-test-checklist.md | Complete LLM testing checklist (setup, coverage, CI/CD) |
| Example | examples/llm-test-patterns.md | Full examples: mocking, structured output, DeepEval, VCR, golden datasets |
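
As a rough illustration of the mocking and structured-output rows above, the sketch below replaces the LLM client with a `unittest.mock.Mock` so the test is deterministic in CI, then validates the reply's shape. It is not taken from the referenced files: `client.complete`, `Summary`, `parse_summary`, and `summarize` are hypothetical names chosen for the example, and a standard-library dataclass stands in for the Pydantic schema so the block runs without extra dependencies.

```python
import json
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class Summary:
    # Hypothetical structured output; a Pydantic model could play this role.
    title: str
    score: float


def parse_summary(raw: str) -> Summary:
    """Validate the model's JSON reply against the expected shape."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    if not isinstance(data.get("title"), str):
        raise ValueError("title must be a string")
    score = data.get("score")
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be a number")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return Summary(title=data["title"], score=float(score))


def summarize(client, text: str) -> Summary:
    # `client.complete` is an assumed interface, not a real library call.
    raw = client.complete(prompt=f"Summarize as JSON: {text}")
    return parse_summary(raw)


def test_summarize_with_mocked_client():
    client = Mock()
    # Canned reply: no network call, same result on every run.
    client.complete.return_value = '{"title": "Release notes", "score": 0.9}'
    result = summarize(client, "v2.0 ships today")
    assert result == Summary(title="Release notes", score=0.9)
    client.complete.assert_called_once()


def test_malformed_reply_is_rejected():
    client = Mock()
    client.complete.return_value = '{"title": 42}'
    try:
        summarize(client, "anything")
    except ValueError:
        return
    raise AssertionError("expected ValueError for malformed reply")
```

Both functions follow pytest's `test_` naming convention, so a test runner would collect them as-is. Injecting the client as a parameter is what makes the swap to a mock trivial; recording real responses with VCR.py is the alternative the table lists when canned strings are too far from production behavior.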

When to Use This Skill


Installs: 5
GitHub Stars: 170
First Seen: Mar 11, 2026