sre-engineer

Site Reliability Engineer

Purpose

Provides site reliability engineering expertise for building and maintaining highly available, scalable, and resilient systems. Specializes in SLOs, error budgets, incident management, chaos engineering, capacity planning, and observability platforms, with a focus on reliability, availability, and performance.

When to Use

  • Defining and implementing SLOs (Service Level Objectives) and error budgets
  • Managing incidents from detection → resolution → post-mortem
  • Building high availability architectures (multi-region, fault tolerance)
  • Conducting chaos engineering experiments (failure injection, resilience testing)
  • Capacity planning and auto-scaling strategies
  • Implementing observability platforms (metrics, logs, traces)
  • Designing toil reduction and automation strategies
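
As an illustration of the first item above, an error budget can be derived directly from an SLO target: it is the share of requests allowed to fail before the objective is breached. The sketch below is a minimal example, not part of the skill itself. The class name `ErrorBudget`, its fields, and the request-count formulation are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: request-based error budget for one SLO window."""

    slo_target: float      # e.g. 0.999 for a 99.9% availability SLO (assumed form)
    total_requests: int    # requests served in the window
    failed_requests: int   # requests that violated the SLO in the window

    def __post_init__(self) -> None:
        if not 0.0 < self.slo_target <= 1.0:
            raise ValueError("slo_target must be in (0, 1]")
        if self.failed_requests > self.total_requests:
            raise ValueError("failed_requests cannot exceed total_requests")

    @property
    def allowed_failures(self) -> float:
        # The budget: the fraction of requests permitted to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Negative when the budget is overspent.
        return self.allowed_failures - self.failed_requests

    @property
    def consumed_fraction(self) -> float:
        # Share of the budget already spent; 1.0 means fully exhausted.
        if self.allowed_failures == 0:
            return float("inf") if self.failed_requests else 0.0
        return self.failed_requests / self.allowed_failures

    @property
    def is_exhausted(self) -> bool:
        return self.remaining < 0


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"allowed failures: {budget.allowed_failures:.0f}")
    print(f"budget consumed:  {budget.consumed_fraction:.1%}")
    print(f"exhausted:        {budget.is_exhausted}")
```

With a 99.9% target over one million requests, roughly 1,000 failures are permitted, so 250 failures spend about a quarter of the budget. A time-based budget (minutes of allowed downtime per window) follows the same arithmetic with durations in place of request counts.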

Quick Start

Invoke this skill when:

More from 404kidwiz/claude-supercode-skills

First Seen
Jan 24, 2026