adk-eval-guide

Originally from eliasecchig/adk-docs
Summary

Comprehensive evaluation methodology guide for ADK agents covering metrics, schemas, and iteration workflows.

  • Provides eight evaluation criteria (including tool trajectory, response matching, rubric-based scoring, hallucination detection, and safety) with configurable thresholds and judge model options
  • Includes evalset schema documentation with multi-turn conversation support, tool use trajectory specification, and session state initialization patterns
  • Outlines the eval-fix loop: start small, run evaluation, diagnose failures, fix code or evalset, iterate until threshold met
  • Documents common failure causes (trajectory gaps, state type mismatches, app name mismatches, model thinking mode conflicts) with specific remediation steps
  • References four supplementary guides covering detailed metrics, user simulation, built-in tools behavior, and multimodal evaluation patterns
SKILL.md

ADK Evaluation Guide

Scaffolded project? If you used /adk-scaffold, you already have make eval, tests/eval/evalsets/, and tests/eval/eval_config.json. Start with make eval and iterate from there.
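The eval_config.json mentioned above maps criteria names to pass thresholds. The sketch below uses the two criteria names that ADK documents as defaults (tool_trajectory_avg_score and response_match_score); the threshold values are illustrative, not recommendations:

```python
import json
import pathlib

# Minimal eval_config.json sketch. tool_trajectory_avg_score and
# response_match_score are ADK's documented default criteria; the
# thresholds below are illustrative.
config = {
    "criteria": {
        "tool_trajectory_avg_score": 1.0,  # require an exact tool-call trajectory match
        "response_match_score": 0.8,       # ROUGE-based similarity to the expected response
    }
}

path = pathlib.Path("tests/eval/eval_config.json")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(config, indent=2))
```

Loosening response_match_score (or dropping a criterion entirely) is a common first step when a new evalset fails for wording reasons rather than behavior.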

Non-scaffolded? Use adk eval directly — see Running Evaluations below.
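For a non-scaffolded project, an invocation can be assembled as below. The paths are hypothetical placeholders for your own layout; --config_file_path and --print_detailed_results are flags of the adk eval CLI in recent ADK versions:

```python
import shlex
import subprocess

# Hypothetical project paths -- adjust to your layout.
agent_module = "my_app"                               # directory containing agent.py
evalset = "tests/eval/evalsets/smoke.evalset.json"
config = "tests/eval/eval_config.json"

cmd = [
    "adk", "eval", agent_module, evalset,
    "--config_file_path", config,
    "--print_detailed_results",
]
print(shlex.join(cmd))            # inspect the command line first
# subprocess.run(cmd, check=True)  # uncomment to actually run the eval
```

Keeping the invocation in a small script (or a make eval target) makes the eval-fix loop below a one-keystroke iteration.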

Reference Files

  • references/criteria-guide.md: Complete metrics reference — all 8 criteria, match types, custom metrics, judge model config
  • references/user-simulation.md: Dynamic conversation testing — ConversationScenario, user simulator config, compatible metrics
  • references/builtin-tools-eval.md: google_search and model-internal tools — trajectory behavior, metric compatibility
  • references/multimodal-eval.md: Multimodal inputs — evalset schema, built-in metric limitations, custom evaluator pattern

The Eval-Fix Loop

Installs: 2.6K
Repository: google/adk-docs
GitHub Stars: 1.3K
First Seen: Mar 9, 2026