adk-eval-guide

Originally from eliasecchig/adk-docs
Summary

Comprehensive evaluation methodology guide for ADK agents covering metrics, schemas, and iteration workflows.

  • Provides eight evaluation criteria (including tool trajectory, response matching, rubric-based scoring, hallucination detection, and safety) with configurable thresholds and judge model options
  • Includes evalset schema documentation with multi-turn conversation support, tool use trajectory specification, and session state initialization patterns
  • Outlines the eval-fix loop: start small, run evaluation, diagnose failures, fix code or evalset, iterate until threshold met
  • Documents common failure causes (trajectory gaps, state type mismatches, app name mismatches, model thinking mode conflicts) with specific remediation steps
  • References four supplementary guides covering detailed metrics, user simulation, built-in tools behavior, and multimodal evaluation patterns
SKILL.md

ADK Evaluation Guide

Scaffolded project? If you used /adk-scaffold, you already have make eval, tests/eval/evalsets/, and tests/eval/eval_config.json. Start with make eval and iterate from there.
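The eval_config.json mentioned above maps criteria names to pass thresholds. The sketch below uses the two criteria names that ADK documents as defaults (tool_trajectory_avg_score and response_match_score); the threshold values are illustrative, not recommendations:

```python
import json
import pathlib

# Minimal eval_config.json sketch. tool_trajectory_avg_score and
# response_match_score are ADK's documented default criteria; the
# thresholds below are illustrative.
config = {
    "criteria": {
        "tool_trajectory_avg_score": 1.0,  # require an exact tool-call trajectory match
        "response_match_score": 0.8,       # ROUGE-based similarity to the expected response
    }
}

path = pathlib.Path("tests/eval/eval_config.json")
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(config, indent=2))
```

Loosening response_match_score (or dropping a criterion entirely) is a common first step when a new evalset fails for wording reasons rather than behavior.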

Non-scaffolded? Use adk eval directly — see Running Evaluations below.
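For a non-scaffolded project, an invocation can be assembled as below. The paths are hypothetical placeholders for your own layout; --config_file_path and --print_detailed_results are flags of the adk eval CLI in recent ADK versions:

```python
import shlex
import subprocess

# Hypothetical project paths -- adjust to your layout.
agent_module = "my_app"                               # directory containing agent.py
evalset = "tests/eval/evalsets/smoke.evalset.json"
config = "tests/eval/eval_config.json"

cmd = [
    "adk", "eval", agent_module, evalset,
    "--config_file_path", config,
    "--print_detailed_results",
]
print(shlex.join(cmd))            # inspect the command line first
# subprocess.run(cmd, check=True)  # uncomment to actually run the eval
```

Keeping the invocation in a small script (or a make eval target) makes the eval-fix loop below a one-keystroke iteration.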

Reference Files

  • references/criteria-guide.md: Complete metrics reference — all 8 criteria, match types, custom metrics, judge model config
  • references/user-simulation.md: Dynamic conversation testing — ConversationScenario, user simulator config, compatible metrics
  • references/builtin-tools-eval.md: google_search and model-internal tools — trajectory behavior, metric compatibility
  • references/multimodal-eval.md: Multimodal inputs — evalset schema, built-in metric limitations, custom evaluator pattern

The Eval-Fix Loop

Installs: 2.6K
Repository: google/adk-docs
GitHub Stars: 1.3K
First Seen: Mar 9, 2026