ml-model-evaluation

Installation

SKILL.md

Ml Model Evaluation

Overview

Use this skill to evaluate models with decision-grade evidence across aggregate and high-risk segments.

Scope Boundaries

Use this skill when the task matches the trigger condition described in description.
Do not use this skill when the primary task falls outside this skill's domain.

Shared References

Threshold and segmentation rules:
- references/threshold-and-segmentation-rules.md

Templates And Assets

Evaluation report template:
- assets/evaluation-report-template.md

Inputs To Gather

Dataset splits and baseline/candidate definitions.

Related skills

More from kentoshimizu/sw-agent-skills

graph-algorithms
Graph algorithm workflow for modeling entities/relations and selecting traversal, path, ordering, or flow strategies. Use when correctness or performance depends on graph representation and algorithm choice; do not use for schema-only modeling or deployment topology planning.
14
bash-style-guide
Style, review, and refactoring standards for Bash shell scripting. Trigger when `.sh` files, files with `#!/usr/bin/env bash` or `#!/bin/bash`, or CI workflow blocks with `shell: bash` are created, modified, or reviewed and Bash-specific quality controls (quoting safety, error handling, portability, readability) must be enforced. Do not use for generic POSIX `sh`, PowerShell, or language-specific application style rules. In multi-language pull requests, run together with other applicable `*-style-guide` skills.
11
architecture-clean-architecture
Clean Architecture workflow for enforcing dependency direction, stable domain boundaries, and use-case-centered application design. Use when teams must separate business rules from frameworks and delivery mechanisms; do not use for isolated module cleanup without boundary implications.
11
powershell-style-guide
Style, review, and refactoring standards for PowerShell scripting. Trigger when `.ps1`, `.psm1`, `.psd1` files, or CI workflow blocks with `shell: pwsh` or `shell: powershell` are created, modified, or reviewed and PowerShell-specific quality controls (error handling, parameter validation, readability, operational safety) must be enforced. Do not use for Bash, generic POSIX `sh`, or language-specific application style rules. In multi-language pull requests, run together with other applicable `*-style-guide` skills.
10
github-codeowners-management
Govern CODEOWNERS rules so review routing reflects real ownership and risk boundaries on GitHub. Use when repository ownership mapping or mandatory reviewer rules must be defined, updated, or audited; do not use for non-GitHub runtime architecture or data-layer design.
9
security-authentication
Security workflow for authentication architecture, credential lifecycle, and session/token assurance. Use when login, identity proofing, MFA, or session security decisions are required; do not use for authorization policy design or non-security quality tuning.
9

Installs

Repository

kentoshimizu/sw…t-skills

GitHub Stars

First Seen

Feb 28, 2026

Security Audits

Gen Agent Trust HubPass

SocketPass

SnykPass

ml-model-evaluation

Ml Model Evaluation

Overview

Scope Boundaries

Shared References

Templates And Assets

Inputs To Gather

More from kentoshimizu/sw-agent-skills

graph-algorithms

bash-style-guide

architecture-clean-architecture

powershell-style-guide

github-codeowners-management

security-authentication