eval-harness
Eval Harness
A formal evaluation framework that implements eval-driven development (EDD): evals are treated as unit tests for AI development.
When to Activate
- Setting up eval-driven development for AI workflows
- Defining pass/fail criteria for task completion
- Measuring agent reliability with pass@k metrics
- Creating regression test suites for prompt/agent changes
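As a rough illustration of the last three use cases, the sketch below shows one way pass/fail criteria and a regression check could be expressed. The names `EvalCase`, `run_suite`, and `regressions` are hypothetical and not part of this skill; the "agent" is assumed to be any callable that maps a prompt string to an output string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class EvalCase:
    """One eval: a prompt plus a pass/fail criterion (hypothetical structure)."""
    name: str
    prompt: str
    passes: Callable[[str], bool]  # returns True when the output is acceptable


def run_suite(cases: Iterable[EvalCase], agent: Callable[[str], str]) -> Dict[str, bool]:
    """Run every case against the agent and record pass/fail by case name."""
    return {case.name: case.passes(agent(case.prompt)) for case in cases}


def regressions(baseline: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Names of cases that passed in the baseline run but do not pass now."""
    return sorted(
        name for name, ok in baseline.items()
        if ok and not current.get(name, False)
    )


if __name__ == "__main__":
    cases = [
        EvalCase("greets_user", "Say hello", lambda out: "hello" in out.lower()),
        EvalCase("non_empty", "Summarize: EDD", lambda out: bool(out.strip())),
    ]
    # Stub agents stand in for the prompt/agent before and after a change.
    baseline = run_suite(cases, lambda prompt: "Hello there")
    current = run_suite(cases, lambda prompt: "Hi")
    print(regressions(baseline, current))  # prints ['greets_user']
```

Keeping the criterion as a plain predicate means the same case list can be rerun unchanged after each prompt or agent change, which is what makes the comparison a regression test.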
Philosophy
- Define expected behavior BEFORE implementation
- Run evals continuously during development
- Track regressions with each change
- Use pass@k metrics for reliability measurement
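The text does not define pass@k, so the sketch below assumes the common meaning: the probability that at least one of k attempts passes. It uses the standard combinatorial estimate 1 - C(n-c, k) / C(n, k), computed from n recorded attempts of which c passed. The function name `pass_at_k` is illustrative, not an API of this skill.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n recorded attempts, c of which passed.

    Assumes pass@k means "at least one of k sampled attempts passes".
    """
    if not 0 <= c <= n:
        raise ValueError("require 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("require 1 <= k <= n")
    # comb(n - c, k) counts the ways to draw k attempts that all failed;
    # it is 0 when fewer than k attempts failed, giving an estimate of 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 attempts, 3 passes: estimates for a single try and for three tries.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=3))
```

With k = 1 the estimate reduces to the plain pass rate c / n, so tracking a larger k alongside it shows how much retrying improves the odds of success.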
Eval Types