zeroeval-install
ZeroEval Install and Integrate
Guide users from zero to production-ready ZeroEval integration: tracing, prompt management, and automated judges.
When To Use
- Setting up ZeroEval for the first time in any language.
- Adding tracing/observability to an existing AI app, agent, or pipeline.
- Migrating hardcoded prompts to ze.prompt with staged rollout (Python / TypeScript).
- Choosing and configuring judges for automated evaluation.
- Troubleshooting missing traces, broken feedback loops, or prompt metadata issues.
Execution Sequence
Follow these steps in order. Each step references a specific playbook in references/ for deep details; load only the relevant playbook when needed.
Step 1: Detect Integration Path
Determine which integration path fits the user's setup: the Python or TypeScript SDK when the language is supported, or direct REST/OTLP ingestion (see custom-tracing below) when it is not.
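For the most common path, here is a minimal Python bootstrap sketch. The ze.init() call, the @ze.span decorator, and the ZEROEVAL_API_KEY environment convention are assumptions modeled on typical SDK layouts; confirm the exact names in the install playbook before relying on them.

```python
# Hedged sketch of a first trace with the Python SDK. ze.init(), @ze.span,
# and the ZEROEVAL_API_KEY convention are assumptions -- verify against the
# install playbook.
import zeroeval as ze

ze.init()  # assumed: reads ZEROEVAL_API_KEY and starts exporting spans

@ze.span(name="hello-trace")  # assumed decorator that wraps the call in a span
def hello() -> str:
    return "first traced call"

hello()  # then check the ZeroEval dashboard for the new trace
```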
More from zeroeval/zeroeval-skills
manage-data
Create, load, push, version, and manage benchmark datasets with the ZeroEval Python SDK or git. Use when adding data to a benchmark, creating a dataset from code or CSV, pushing data to the backend, managing subsets, pulling existing benchmarks, converting data to Parquet, or setting up a git-based data workflow. Triggers on "add data", "create dataset", "push dataset", "upload data", "manage benchmark data", "dataset versioning", "subsets", "pull dataset", "parquet", "multimodal dataset".
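As orientation for the dataset workflow described above, a hedged sketch; ze.Dataset and dataset.push() are assumed names, and the manage-data skill has the real constructor and methods.

```python
# Hedged sketch of creating and pushing a benchmark dataset. ze.Dataset and
# dataset.push() are assumed names -- see the manage-data skill for the
# actual API.
import zeroeval as ze

dataset = ze.Dataset(
    name="support-tickets",
    data=[
        {"question": "How do I reset my password?", "expected": "password_reset"},
        {"question": "Please cancel my subscription", "expected": "cancellation"},
    ],
)

dataset.push()  # version the rows on the ZeroEval backend so evals can pull them
```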
run-evals
Write tasks, evaluations, and scoring pipelines with the ZeroEval Python SDK. Covers defining @ze.task functions, running evals with dataset.eval(), writing row/column/run evaluators, scoring with column_map, emitting signals, configuring execution (workers, retries, checkpoints), repeating and resuming runs, and inspecting results. Triggers on "run evals", "write evaluation", "benchmark model", "score results", "evaluation pipeline", "task decorator", "scoring function", "column_map", "emit signal", "resume eval", "repeat eval".
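The description above names @ze.task, dataset.eval(), and column_map; here is a hedged sketch of how they fit together. Exact signatures may differ, and the run-evals skill is authoritative.

```python
# Hedged eval-pipeline sketch built around @ze.task, dataset.eval(), and
# column_map from the description above; argument names are assumptions.
import zeroeval as ze

def stub_model(question: str) -> str:
    # Stand-in for the real model call being benchmarked.
    return "password_reset" if "password" in question.lower() else "cancellation"

@ze.task
def classify(row: dict) -> dict:
    return {"predicted": stub_model(row["question"])}

def exact_match(row: dict) -> float:
    # Row-level evaluator: 1.0 when the prediction matches the label.
    return float(row["predicted"] == row["expected"])

dataset = ze.Dataset.pull("support-tickets")  # assumed: pulls the pushed benchmark
dataset.eval(
    task=classify,
    evaluators=[exact_match],
    column_map={"question": "question", "expected": "expected"},
)
```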
create-judge
This skill should be used when users want to create, design, or configure an automated judge in ZeroEval. It guides through understanding the evaluation goal, choosing binary vs scored evaluation, writing the judge template, designing structured criteria, and creating the judge via dashboard or API. Triggers on "create a judge", "add a judge", "evaluate my LLM output", "set up automated evaluation", "judge template", or "scoring criteria".
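A hedged sketch of the API route for a binary judge follows; the endpoint path, payload fields, and auth header are assumptions, so treat the dashboard flow or the create-judge skill as the source of truth.

```python
# Hypothetical REST call that registers a binary judge. Endpoint, payload
# fields, and auth header are assumptions; use the create-judge skill or the
# dashboard for the real shapes.
import os
import requests

judge_template = """\
You are grading a support-bot reply.
Question: {{question}}
Reply: {{output}}

Answer PASS if the reply resolves the question without inventing policy,
otherwise answer FAIL. Respond with exactly PASS or FAIL.
"""

resp = requests.post(
    "https://api.zeroeval.com/judges",  # assumed endpoint
    headers={"Authorization": f"Bearer {os.environ['ZEROEVAL_API_KEY']}"},
    json={
        "name": "support-reply-pass-fail",
        "type": "binary",  # binary vs scored, per the skill description
        "template": judge_template,
    },
    timeout=30,
)
resp.raise_for_status()
```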
prompt-migration
This skill should be used when users want to migrate hardcoded prompts to ze.prompt for version tracking, feedback collection, judge linkage, and prompt optimization. It covers the full migration workflow for both Python and TypeScript. Triggers on "migrate prompt", "ze.prompt", "hardcoded prompt", "prompt migration", "send feedback", "prompt optimization", "wire feedback", or "connect judges to prompts".
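A before/after sketch of that migration; the exact ze.prompt signature (name/content keywords, how variables interpolate) is an assumption that the prompt-migration skill pins down.

```python
# Before/after sketch of moving a hardcoded prompt behind ze.prompt. The
# keyword names below are assumptions; the prompt-migration skill documents
# the real signature.
import zeroeval as ze

# Before: a plain string, invisible to version tracking and feedback.
HARDCODED_PROMPT = "Summarize the following ticket in one sentence:\n{ticket}"

# After: the same text registered via ze.prompt, so it is versioned, can
# collect feedback, and can be linked to judges.
summarize_prompt = ze.prompt(
    name="ticket-summarizer",
    content="Summarize the following ticket in one sentence:\n{ticket}",
)
```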
custom-tracing
This skill should be used when users want to send traces to ZeroEval without installing the SDK, using the REST API or OpenTelemetry (OTLP) directly. It covers direct HTTP span ingestion, OTLP collector configuration, and first-trace verification for any language. Triggers on "send traces via API", "direct API tracing", "custom tracing", "manual tracing", "without SDK", "unsupported language", "REST API tracing", "OTLP", "OpenTelemetry", or language cues like "Go", "Ruby", "Java", "Rust", "Elixir", or "PHP".
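For the OTLP route, a sketch using the standard OpenTelemetry Python packages (shown in Python for consistency with the other examples; the same exporter configuration exists for Go, Ruby, Java, and the rest). The ZeroEval endpoint URL and auth header are assumptions, so take the real values from the custom-tracing skill.

```python
# SDK-free tracing over OTLP with the stock OpenTelemetry packages. The
# ZeroEval endpoint URL and Authorization header are assumptions -- the
# custom-tracing skill has the real ingestion details.
import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

exporter = OTLPSpanExporter(
    endpoint="https://api.zeroeval.com/v1/traces",  # assumed OTLP ingestion URL
    headers={"Authorization": f"Bearer {os.environ['ZEROEVAL_API_KEY']}"},
)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("my-app")
with tracer.start_as_current_span("llm.call") as span:
    span.set_attribute("model", "gpt-4o")
    span.set_attribute("prompt.name", "ticket-summarizer")
```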