mcp-builder
MCP Server Evaluation Guide
Overview
This document provides guidance on creating comprehensive evaluations for MCP servers. Evaluations test whether LLMs can effectively use your MCP server to answer realistic, complex questions using only the tools provided.
Quick Reference
Evaluation Requirements
- Create 10 human-readable questions
- Questions must be READ-ONLY, INDEPENDENT, NON-DESTRUCTIVE
- Each question requires multiple tool calls (potentially dozens)
- Answers must be single, verifiable values
- Answers must be STABLE (won't change over time)
Output Format
<evaluation>
More from bobmatnyc/claude-mpm-skills
drizzle-orm
Type-safe SQL ORM for TypeScript with zero runtime overhead
4.2Kplaywright-e2e-testing
Playwright modern end-to-end testing framework with cross-browser automation, auto-wait, and built-in test runner
2.7Kpydantic
Python data validation using type hints and runtime type checking with Pydantic v2's Rust-powered core for high-performance validation in FastAPI, Django, and configuration management.
2.2Ktailwind-css
Tailwind CSS utility-first framework for rapid UI development with responsive design and dark mode
1.2Ktrpc-type-safety
tRPC end-to-end type-safe APIs for TypeScript with React Query integration and full-stack type safety
1.1Kpytest
pytest - Python's most powerful testing framework with fixtures, parametrization, plugins, and framework integration for FastAPI, Django, Flask
899