sre-engineer
skill:sre-engineer - Site Reliability Engineering & Observability
Version: 1.0.0
Purpose
The sre-engineer skill applies Site Reliability Engineering practices to production systems. It covers defining and tracking Service Level Objectives (SLOs), building observability across metrics, logs, and traces, designing incident response processes, conducting postmortems, running chaos engineering experiments, and managing on-call practices.
Use this skill when:
- Defining SLIs, SLOs, and error budgets for services
- Implementing monitoring and observability stacks
- Designing alerting strategies and on-call processes
- Creating incident response runbooks and playbooks
- Conducting postmortems and blameless retrospectives
- Capacity planning and performance testing
- Implementing chaos engineering experiments
- Establishing reliability culture and SRE practices
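As a rough illustration of the first item in the list above, the relationship between an SLI, an SLO, and an error budget can be sketched in a few lines of Python. This is a minimal sketch assuming a request-based availability SLI; the `ErrorBudget` class and its field names are hypothetical and are not part of this skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical helper: request-based availability SLI against an SLO."""

    slo_target: float     # target success ratio, e.g. 0.999 for "three nines"
    total_requests: int   # requests observed in the SLO window
    failed_requests: int  # requests that counted as failures in that window

    @property
    def sli(self) -> float:
        """Measured success ratio; treated as 1.0 when there is no traffic."""
        if self.total_requests == 0:
            return 1.0
        return 1 - self.failed_requests / self.total_requests

    @property
    def allowed_failures(self) -> float:
        """Number of failures the SLO tolerates over the window."""
        return (1 - self.slo_target) * self.total_requests

    @property
    def budget_remaining(self) -> float:
        """Fraction of the error budget left; negative once it is overspent."""
        allowed = self.allowed_failures
        if allowed == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1 - self.failed_requests / allowed


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"SLI: {budget.sli:.5f}")
    print(f"Allowed failures: {budget.allowed_failures:.0f}")
    print(f"Budget remaining: {budget.budget_remaining:.0%}")
```

With a 99.9% target over one million requests, roughly 1,000 failures are allowed, so 250 failures leave about three quarters of the budget. A time-based SLI (minutes of downtime rather than failed requests) would follow the same arithmetic with different units.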