sre-runbooks

SRE Runbooks

Overview

Operational runbook templates and SRE practices for production reliability. Covers on-call workflows, incident response, postmortems, capacity planning, and chaos engineering.

SLI/SLO Definition Template

service: payment-api
owner: payments-team
tier: critical

slis:
  availability:
    description: "Ratio of successful HTTP responses (non-5xx) to total responses"
    query: "sum(rate(http_requests_total{service='payment-api', status!~'5..'}[5m])) / sum(rate(http_requests_total{service='payment-api'}[5m]))"
    good_event: "HTTP response with status < 500"
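    valid_event: "All HTTP responses (excluding health checks)"
    # Note: the query above counts every request and does not itself filter out
    # health checks; add a label matcher for them if the metric exposes one.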
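
To make the good_event / valid_event split concrete, here is a minimal Python sketch of the same availability ratio computed over a list of responses. It is an illustration under stated assumptions, not part of the template: the `availability` function, the `(path, status)` tuple shape, and the `/healthz` path are all hypothetical, since the source does not say how health checks are identified.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical health-check path; the template does not name one.
HEALTH_CHECK_PATHS = {"/healthz"}


def availability(responses: Iterable[Tuple[str, int]]) -> Optional[float]:
    """Return good_events / valid_events, or None when there are no valid events."""
    good = 0
    valid = 0
    for path, status in responses:
        if path in HEALTH_CHECK_PATHS:
            continue  # health checks are excluded from valid events
        valid += 1
        if status < 500:  # good event: any non-5xx response
            good += 1
    if valid == 0:
        return None  # no traffic: the ratio is undefined, not 0 or 1
    return good / valid


sample = [("/pay", 200), ("/pay", 503), ("/healthz", 200), ("/pay", 404), ("/pay", 201)]
print(availability(sample))  # 0.75
```

Returning `None` for an empty window is a design choice in this sketch: a period with no valid events says nothing about availability, so it is kept distinct from both 0% and 100%.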

More from pfangueiro/claude-code-agents

First Seen
Apr 12, 2026