arize-evaluator
Arize Evaluator Skill
SPACE— All--spaceflags and theARIZE_SPACEenv var accept a space name (e.g.,my-workspace) or a base64 space ID (e.g.,U3BhY2U6...). Find yours withax spaces list.
This skill covers designing, creating, and running LLM-as-judge evaluators on Arize. An evaluator defines the judge; a task is how you run it against real data.
Prerequisites
Proceed directly with the task — run the ax command you need. Do NOT check versions, env vars, or profiles upfront.
If an ax command fails, troubleshoot based on the error:
command not foundor version error → see references/ax-setup.md401 Unauthorized/ missing API key → runax profiles showto inspect the current profile. If the profile is missing or the API key is wrong, follow references/ax-profiles.md to create/update it. If the user doesn't have their key, direct them to https://app.arize.com/admin > API Keys- Space unknown → run
ax spaces listto pick by name, or ask the user - LLM provider call fails (missing OPENAI_API_KEY / ANTHROPIC_API_KEY) → run
ax ai-integrations list --space SPACEto check for platform-managed credentials. If none exist, ask the user to provide the key or create an integration via the arize-ai-provider-integration skill - Security: Never read
.envfiles or search the filesystem for credentials. Useax profilesfor Arize credentials andax ai-integrationsfor LLM provider keys. If credentials are not available through these channels, ask the user. - CRITICAL — Never fabricate evaluation results: If an evaluation task fails, is cancelled, or produces no scores, report the failure clearly and explain what went wrong. Do NOT perform a "manual evaluation," invent quality scores, estimate percentages, or present any agent-generated analysis as if it came from the Arize evaluation system. Instead suggest: (1) fix the identified issue and retry, (2) try running from the Arize UI, (3) verify integration credentials with
ax ai-integrations list, (4) contact support at https://arize.com/support
More from github/awesome-copilot
git-commit
Execute git commit with conventional commit message analysis, intelligent staging, and message generation. Use when user asks to commit changes, create a git commit, or mentions "/commit". Supports: (1) Auto-detecting type and scope from changes, (2) Generating conventional commit messages from diff, (3) Interactive commit with optional type/scope/description overrides, (4) Intelligent file staging for logical grouping
30.2Kgh-cli
GitHub CLI (gh) comprehensive reference for repositories, issues, pull requests, Actions, projects, releases, gists, codespaces, organizations, extensions, and all GitHub operations from the command line.
21.2Kdocumentation-writer
Diátaxis Documentation Expert. An expert technical writer specializing in creating high-quality software documentation, guided by the principles and structure of the Diátaxis technical documentation authoring framework.
17.4Kprd
Generate high-quality Product Requirements Documents (PRDs) for software systems and AI-powered features. Includes executive summaries, user stories, technical specifications, and risk analysis.
17.4Kexcalidraw-diagram-generator
Generate Excalidraw diagrams from natural language descriptions. Use when asked to "create a diagram", "make a flowchart", "visualize a process", "draw a system architecture", "create a mind map", or "generate an Excalidraw file". Supports flowcharts, relationship diagrams, mind maps, and system architecture diagrams. Outputs .excalidraw JSON files that can be opened directly in Excalidraw.
16.4Krefactor
Surgical code refactoring to improve maintainability without changing behavior. Covers extracting functions, renaming variables, breaking down god functions, improving type safety, eliminating code smells, and applying design patterns. Less drastic than repo-rebuilder; use for gradual improvements.
16.1K