sre-engineer

Installation

SKILL.md

SRE Engineer

Core Workflow

Assess reliability - Review architecture, SLOs, incidents, toil levels
Define SLOs - Identify meaningful SLIs and set appropriate targets
Verify alignment - Confirm SLO targets reflect user expectations before proceeding
Implement monitoring - Build golden signal dashboards and alerting
Automate toil - Identify repetitive tasks and build automation
Test resilience - Design and execute chaos experiments; validate recovery end-to-end and confirm it meets RTO/RPO targets before marking the experiment complete
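
The "Define SLOs" and "Verify alignment" steps above can be made concrete with a little error-budget arithmetic. The following is a minimal sketch, not part of the skill itself: the `Slo` class, its field names, and the helper functions are hypothetical, and it assumes a simple request-based availability SLO over a rolling window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical availability SLO (illustrative names, not from the skill)."""
    target: float          # e.g. 0.999 means "99.9% of requests succeed"
    window_days: int = 30  # assumed rolling evaluation window


def error_budget_fraction(slo: Slo) -> float:
    """Fraction of requests (or time) that may fail without breaching the SLO."""
    return 1.0 - slo.target


def allowed_downtime_minutes(slo: Slo) -> float:
    """Error budget expressed as minutes of full outage over the window."""
    return error_budget_fraction(slo) * slo.window_days * 24 * 60


def budget_consumed(slo: Slo, total_requests: int, failed_requests: int) -> float:
    """Share of the error budget already spent (1.0 means fully spent)."""
    if total_requests == 0:
        return 0.0  # no traffic, so nothing has been spent
    allowed_failures = error_budget_fraction(slo) * total_requests
    return failed_requests / allowed_failures


if __name__ == "__main__":
    slo = Slo(target=0.999, window_days=30)
    print(f"Allowed downtime: {allowed_downtime_minutes(slo):.1f} min")
    print(f"Budget consumed: {budget_consumed(slo, 1_000_000, 250):.0%}")
```

Under these assumptions, a 99.9% target over 30 days leaves roughly 43.2 minutes of full outage. Translating the target into minutes like this is one way to check whether it matches what users would actually tolerate before committing to it.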

Reference Guide

Load detailed guidance based on context:

Repository

jeffallan/claude-skills

First Seen

Jan 21, 2026
