Cekura Eval Design

Purpose

Guide the creation of effective Cekura evaluators (test scenarios) that thoroughly exercise AI voice agent capabilities. Evaluators simulate callers to test the main agent — they are NOT metrics (which evaluate transcripts after the fact).

Performing Platform Actions

When this skill suggests creating, listing, updating, or evaluating something on Cekura, prefer using available platform tools over describing API calls or dashboard steps. In Claude Code with the Cekura plugin installed, these tools are auto-configured and handle authentication, parameter validation, and error handling for you. Fall back to direct API endpoints or dashboard guidance only when no tools are available in the current session.

Core Terminology

  • Main agent: The client's AI voice agent being tested
  • Testing agent: Cekura's simulated caller that exercises the main agent
  • Evaluator/Scenario: A test case defining what the simulated caller does and what success looks like
  • Metric: A post-call evaluation that scores a transcript (separate concept — see cekura-metrics plugin)
  • Personality: Voice, language, accent, and behavioral traits for the simulated caller
  • Test Profile: Identity and context data passed to both the testing agent and the main agent (for chat/websocket runs)
  • Conditional Action: Structured, deterministic testing agent behavior with adaptive fallback
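To show how these pieces fit together, here is a minimal sketch of an evaluator definition. The field names are assumptions derived from the terminology above, not the actual Cekura evaluator schema.

```python
# Illustrative only: keys below mirror the terminology in this skill
# (scenario, personality, test profile, conditional actions); they are
# NOT the real Cekura schema.
evaluator = {
    "name": "appointment-reschedule",
    "scenario": "Caller asks to move tomorrow's appointment to next week.",
    "personality": {          # voice, language, and behavioral traits
        "language": "en-US",  # for the simulated caller
        "traits": ["impatient", "speaks quickly"],
    },
    "test_profile": {         # identity/context shared with testing agent
        "caller_name": "Jordan Lee",  # and main agent (chat/websocket runs)
        "account_id": "ACCT-1234",
    },
    "conditional_actions": [  # deterministic behavior, adaptive fallback
        {"if": "agent asks for verification", "then": "provide account_id"},
    ],
    "success_criteria": "Appointment is rescheduled and confirmed verbally.",
}
```

Note that `success_criteria` describes what the *testing agent's call* must achieve; scoring the resulting transcript is the job of metrics, which are a separate concept.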