site-reliability-engineer

Site Reliability Engineer (SRE)

When to Use

  • Define SLIs, SLOs, and error budgets per service or user journey
  • Configure burn-rate alerts and reliability dashboards
  • Run production readiness reviews before a launch or major change
  • Analyze incidents for reliability gaps and SLO impact
  • Plan capacity for traffic growth and failure scenarios (N+1, regional loss)
  • Measure and reduce toil; prioritize the automation with the highest reliability ROI
  • Map dependencies and failure modes; design graceful degradation
  • Gate releases on SLO/error-budget policy (canary, rollback triggers)
  • Run chaos experiments or game days when organizational maturity supports them
  • Partner with engineering on reliability backlog (timeouts, retries, circuit breakers)
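The SLI/SLO, error-budget, and burn-rate alerting items above can be sketched as follows. This is a minimal illustration, not part of the skill itself: it assumes a request-based availability SLI (bad requests over total requests), and the window pairs and thresholds (14.4 for 1h/5m, 6 for 6h/30m, 1 for 3d/6h) are the multiwindow, multi-burn-rate starting points commonly cited in the Google SRE Workbook for a 30-day SLO window. `WindowCounts`, `burn_rate`, `POLICY`, and `evaluate` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WindowCounts:
    """Bad and total request counts observed over one lookback window."""
    bad: int
    total: int


def burn_rate(w: WindowCounts, slo_target: float) -> float:
    """Observed error ratio divided by the error budget (1 - SLO target).

    1.0 spends the budget exactly over the full SLO window; higher values
    exhaust it early.
    """
    budget = 1.0 - slo_target
    if w.total == 0:
        return 0.0  # no traffic, nothing burned
    return (w.bad / w.total) / budget


# (long window, short window, burn-rate threshold, severity), most severe first.
# Assumed starting points for a 30-day SLO window; tune per service.
POLICY = [
    ("1h", "5m", 14.4, "page"),
    ("6h", "30m", 6.0, "page"),
    ("3d", "6h", 1.0, "ticket"),
]


def evaluate(windows: dict[str, WindowCounts], slo_target: float) -> str | None:
    """Return the first (most severe) alert whose long AND short windows both
    exceed the threshold, or None. `windows` must contain every window name
    used in POLICY."""
    for long_w, short_w, threshold, severity in POLICY:
        if (burn_rate(windows[long_w], slo_target) > threshold
                and burn_rate(windows[short_w], slo_target) > threshold):
            return severity
    return None
```

For example, with a 99.9% target, 50 bad requests out of 10,000 gives a burn rate of about 5, which would exhaust a 30-day budget in roughly 6 days. Requiring both the long and the short window to exceed the threshold keeps the alert from firing on brief blips and lets it clear soon after the error rate recovers.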
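Gating releases on error-budget policy, as listed above, might look like the sketch below. The policy itself is an assumption for illustration: roll back when the canary's error ratio exceeds both twice the baseline's and an absolute noise floor of 0.1%; otherwise freeze non-emergency releases once the remaining budget reaches zero. `Decision`, `budget_remaining`, `gate_release`, and every threshold are hypothetical; real policies are agreed per service.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    FREEZE = "freeze"      # budget spent: only reliability fixes or emergencies ship
    ROLLBACK = "rollback"  # canary is measurably worse than the stable baseline


def budget_remaining(bad: int, total: int, slo_target: float) -> float:
    """Fraction of the window's error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_bad = (1.0 - slo_target) * total
    if allowed_bad == 0:
        return 1.0  # no traffic in the window yet, so nothing has been spent
    return 1.0 - bad / allowed_bad


def gate_release(
    budget_left: float,
    canary_error_ratio: float,
    baseline_error_ratio: float,
    min_budget: float = 0.0,      # hypothetical: freeze once the budget is gone
    max_regression: float = 2.0,  # hypothetical: canary may be at most 2x baseline
    noise_floor: float = 0.001,   # hypothetical: ignore error ratios below 0.1%
) -> Decision:
    """Decide what to do with an in-flight canary release."""
    # Rollback trigger takes precedence: a bad canary should never proceed.
    limit = max(baseline_error_ratio * max_regression, noise_floor)
    if canary_error_ratio > limit:
        return Decision.ROLLBACK
    # Error-budget policy: with no budget left, hold feature releases.
    if budget_left <= min_budget:
        return Decision.FREEZE
    return Decision.PROCEED
```

Checking the rollback trigger before the budget keeps a bad canary from lingering just because budget remains; the absolute noise floor avoids rolling back on a single error when the baseline is near zero.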
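For the capacity-planning item above, an N+1 check across failure domains comes down to simple arithmetic: size the fleet so that losing any one zone (or region) still leaves enough instances to serve the forecast peak. This sketch assumes stateless load that balances evenly across zones, per-instance throughput measured by a load test, and a 70% utilization ceiling; those assumptions and the function names are illustrative only.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float,
                     target_utilization: float = 0.7) -> int:
    """Instances required to serve peak_rps with each instance at or below
    target_utilization of its measured capacity."""
    return math.ceil(peak_rps / (rps_per_instance * target_utilization))


def n_plus_one_per_zone(peak_rps: float, rps_per_instance: float, zones: int,
                        target_utilization: float = 0.7) -> int:
    """Per-zone instance count such that the surviving (zones - 1) zones can
    still carry the full forecast peak after any single zone is lost."""
    if zones < 2:
        raise ValueError("N+1 across zones needs at least two zones")
    surviving_needed = instances_needed(peak_rps, rps_per_instance, target_utilization)
    return math.ceil(surviving_needed / (zones - 1))
```

With a forecast peak of 6,500 requests/s, 100 requests/s per instance, and three zones, `instances_needed` returns 93 and `n_plus_one_per_zone` returns 47: 141 instances in total, and any two surviving zones (94 instances) still cover the peak. Feeding in a growth-adjusted forecast rather than today's peak covers the traffic-growth half of the bullet.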
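The timeouts, retries, and circuit breakers named in the last bullet are often combined as sketched below: capped exponential backoff with full jitter for transient errors, layered over a consecutive-failure circuit breaker that fails fast while a dependency is down. It assumes the wrapped call enforces its own per-attempt timeout and raises `TimeoutError` or `ConnectionError` on transient failure. Thresholds are placeholders, the breaker is not thread-safe, and `CircuitBreaker`, `CircuitOpenError`, and `retry_with_backoff` are hypothetical names, not a specific library's API.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised while the breaker is open and calls are short-circuited."""


class CircuitBreaker:
    """Consecutive-failure breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Reset timeout elapsed: half-open, let this one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Retry transient failures with capped exponential backoff and full jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

Composing them as `retry_with_backoff(lambda: breaker.call(fetch))`, with `fetch` standing in for a hypothetical client call, retries transient errors but stops at once when the breaker is open, because `CircuitOpenError` is not in `retry_on`. That keeps retries from piling extra load onto a dependency that is already failing.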

When NOT to Use

Installs: 18
GitHub Stars: 2
First Seen: May 20, 2026
site-reliability-engineer — daemon-blockint-tech/agentic-enteprises-skill