Testing Guide

What to test, how to test it, and what NOT to test — for a plugin made of prompt files, Python glue, and configuration.

Philosophy: GenAI-First Testing

Traditional unit tests work for deterministic logic. But most bugs in this project are drift — docs diverge from code, agents contradict commands, component counts go stale. GenAI congruence tests catch these. Unit tests don't.

Decision rule: Can you write assert x == y and it won't break next week? → Unit test. Otherwise → GenAI test or structural test.

Three Test Patterns

1. Judge Pattern (single artifact evaluation)

An LLM evaluates one artifact against criteria. Use for: doc completeness, security posture, architectural intent.

testing-guide

Testing Guide

Philosophy: GenAI-First Testing

Three Test Patterns

1. Judge Pattern (single artifact evaluation)