langsmith-evaluator
Summary
Build evaluation pipelines for LangSmith with LLM-as-Judge and custom code evaluators.
- Three core components: creating evaluators (LLM-as-Judge or custom code), defining run functions that capture agent outputs and trajectories, and running evaluations locally or uploading evaluators so they run automatically (see the first sketch below)
- Supports both offline evaluators (comparing run outputs against dataset examples) and online evaluators (real-time quality checks on production runs)
- Requires a LangSmith API key and project configuration; includes Python and TypeScript examples with structured-output support for LLM judges (see the judge sketch below)
- Critical workflow: inspect the actual agent output structure and dataset schema before writing evaluators, and query LangSmith traces to verify that trajectory data and field names match what the evaluator expects (see the trace-inspection sketch below)
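A minimal sketch of the three components run locally, assuming a recent langsmith Python SDK; `my_agent`, the dataset name, and the `answer` field are placeholders, not part of this skill:

```python
# Sketch: run function + custom code evaluator + local evaluation run.
from langsmith import evaluate

def my_agent(question: str) -> str:
    # Stand-in for your real application.
    return "42"

def run_agent(inputs: dict) -> dict:
    # Run function: invoke the agent and return outputs for the evaluators.
    return {"answer": my_agent(inputs["question"])}

def exact_match(run, example) -> dict:
    # Custom code evaluator: compare the run's output to the dataset example.
    predicted = (run.outputs or {}).get("answer", "")
    expected = (example.outputs or {}).get("answer", "")
    return {"key": "exact_match", "score": float(predicted == expected)}

results = evaluate(
    run_agent,
    data="my-eval-dataset",        # dataset name in LangSmith (placeholder)
    evaluators=[exact_match],
    experiment_prefix="baseline",
)
```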
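For the LLM-as-Judge path, a hedged sketch using LangChain's structured output; it assumes `langchain-openai` is installed and OPENAI_API_KEY is set, and the `Grade` schema, model choice, and prompt are illustrative:

```python
# Sketch: LLM-as-Judge evaluator returning a structured verdict.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class Grade(BaseModel):
    """Structured verdict from the judge model."""
    correct: bool = Field(description="Whether the answer is factually correct")
    reasoning: str = Field(description="One-sentence justification")

judge = ChatOpenAI(model="gpt-4o-mini", temperature=0).with_structured_output(Grade)

def llm_judge(run, example) -> dict:
    grade = judge.invoke(
        f"Question: {example.inputs['question']}\n"
        f"Reference answer: {example.outputs['answer']}\n"
        f"Agent answer: {run.outputs['answer']}\n"
        "Is the agent answer correct?"
    )
    return {"key": "llm_judge_correct", "score": float(grade.correct), "comment": grade.reasoning}
```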
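Before writing either kind of evaluator, verify field names against real traces. A trace-inspection sketch using the SDK's `list_runs`; the project name is a placeholder:

```python
# Sketch: print the output shape of recent root runs so evaluator
# field names match what the agent actually produces.
from langsmith import Client

client = Client()  # reads LANGSMITH_API_KEY / LANGSMITH_PROJECT from the env

for run in client.list_runs(project_name="your-project-name", is_root=True, limit=3):
    print(run.name, list((run.outputs or {}).keys()))
```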
SKILL.md
```bash
LANGSMITH_API_KEY=lsv2_pt_your_api_key_here   # REQUIRED
LANGSMITH_PROJECT=your-project-name           # Check this to know which project has traces
LANGSMITH_WORKSPACE_ID=your-workspace-id      # Optional: for org-scoped keys
OPENAI_API_KEY=your_openai_key                # For LLM-as-Judge
```
Authentication is REQUIRED: either set the LANGSMITH_API_KEY environment variable, or pass the --api-key flag to CLI commands (the flag is preferred):
```bash
langsmith evaluator list --api-key $LANGSMITH_API_KEY
```
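The Python SDK accepts the same key explicitly as well; a sketch assuming the `api_key` parameter of `langsmith.Client`:

```python
# Sketch: pass the key directly instead of relying on the environment.
from langsmith import Client

client = Client(api_key="lsv2_pt_your_api_key_here")
```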
Related skills
More from langchain-ai/langsmith-skills
langsmith-trace
INVOKE THIS SKILL when working with LangSmith tracing OR querying traces. Covers adding tracing to applications and querying/exporting trace data. Uses the langsmith CLI tool.
langsmith-dataset
INVOKE THIS SKILL when creating evaluation datasets, uploading datasets to LangSmith, or managing existing datasets. Covers dataset types (final_response, single_step, trajectory, RAG), CLI management commands, SDK-based creation, and example management. Uses the langsmith CLI tool.