# adk-eval-guide
Originally from eliasecchig/adk-docs
## Summary
Comprehensive evaluation methodology guide for ADK agents covering metrics, schemas, and iteration workflows.
- Provides eight evaluation criteria (including tool trajectory, response matching, rubric-based scoring, hallucination detection, and safety) with configurable thresholds and judge model options
- Includes evalset schema documentation with multi-turn conversation support, tool use trajectory specification, and session state initialization patterns
- Outlines the eval-fix loop: start small, run evaluation, diagnose failures, fix code or evalset, iterate until threshold met
- Documents common failure causes (trajectory gaps, state type mismatches, app name mismatches, model thinking mode conflicts) with specific remediation steps
- References four supplementary guides covering detailed metrics, user simulation, built-in tools behavior, and multimodal evaluation patterns
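The configurable thresholds mentioned above live in an eval config file (`tests/eval/eval_config.json` in a scaffolded project). A minimal sketch is shown below — the criterion names and exact schema here are illustrative and may differ across ADK versions, so check the criteria guide before copying:

```json
{
  "criteria": {
    "tool_trajectory_avg_score": 1.0,
    "response_match_score": 0.8
  }
}
```

In this sketch, `tool_trajectory_avg_score: 1.0` demands an exact tool-call trajectory match, while `response_match_score: 0.8` allows some wording drift in the final response.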
SKILL.md
# ADK Evaluation Guide
> **Scaffolded project?** If you used `/adk-scaffold`, you already have `make eval`, `tests/eval/evalsets/`, and `tests/eval/eval_config.json`. Start with `make eval` and iterate from there.
>
> **Non-scaffolded?** Use `adk eval` directly — see Running Evaluations below.
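For a non-scaffolded project, the direct invocation looks roughly like the sketch below. The agent and evalset paths are placeholders, and flag availability varies by ADK version — run `adk eval --help` to confirm what your install supports:

```shell
# Evaluate an agent module against an evalset file, reading
# thresholds from a config file (all paths are illustrative).
adk eval path/to/my_agent \
  tests/eval/evalsets/smoke.evalset.json \
  --config_file_path tests/eval/eval_config.json \
  --print_detailed_results
```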
## Reference Files
| File | Contents |
|---|---|
| `references/criteria-guide.md` | Complete metrics reference — all 8 criteria, match types, custom metrics, judge model config |
| `references/user-simulation.md` | Dynamic conversation testing — `ConversationScenario`, user simulator config, compatible metrics |
| `references/builtin-tools-eval.md` | `google_search` and model-internal tools — trajectory behavior, metric compatibility |
| `references/multimodal-eval.md` | Multimodal inputs — evalset schema, built-in metric limitations, custom evaluator pattern |
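For orientation, a single-turn evalset case tends to look roughly like the sketch below. The field names (`eval_cases`, `conversation`, `intermediate_data.tool_uses`, `session_input`) follow recent ADK versions but are not guaranteed for yours — verify against the evalset schema documentation referenced above:

```json
{
  "eval_set_id": "smoke",
  "eval_cases": [
    {
      "eval_id": "weather_basic",
      "conversation": [
        {
          "user_content": {"role": "user", "parts": [{"text": "What's the weather in Paris?"}]},
          "final_response": {"role": "model", "parts": [{"text": "It is sunny in Paris."}]},
          "intermediate_data": {
            "tool_uses": [{"name": "get_weather", "args": {"city": "Paris"}}]
          }
        }
      ],
      "session_input": {"app_name": "weather_app", "user_id": "test_user", "state": {}}
    }
  ]
}
```

`intermediate_data.tool_uses` is what trajectory metrics compare against, and `session_input.state` covers the session state initialization pattern mentioned in the summary.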
## The Eval-Fix Loop
## Related skills