qe-chaos-engineering-resilience
Chaos Engineering & Resilience Testing
<default_to_action> When testing system resilience or injecting failures:
- DEFINE steady state (normal metrics: error rate, latency, throughput)
- HYPOTHESIZE that the system stays in steady state while the failure is active
- INJECT real-world failures (network, instance, disk, CPU)
- OBSERVE and measure deviation from steady state
- FIX weaknesses discovered, document runbooks, repeat
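The DEFINE, HYPOTHESIZE, and OBSERVE steps above can be sketched as a comparison between baseline and observed metrics. This is a minimal illustration, not a prescribed implementation: the `Metrics` and `Tolerance` types, the `deviations` function, and every tolerance value are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Metrics:
    """Steady-state metrics named in the steps above (field names are assumptions)."""
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float   # 99th percentile latency in milliseconds
    throughput_rps: float   # requests per second


@dataclass(frozen=True)
class Tolerance:
    """How far observed metrics may drift before the hypothesis is rejected."""
    max_error_rate_increase: float  # absolute increase allowed over baseline
    max_latency_ratio: float        # observed / baseline latency ceiling
    min_throughput_ratio: float     # observed / baseline throughput floor


def deviations(baseline: Metrics, observed: Metrics, tol: Tolerance) -> List[str]:
    """Return the names of metrics that left steady state; empty means the hypothesis held."""
    problems = []
    if observed.error_rate - baseline.error_rate > tol.max_error_rate_increase:
        problems.append("error_rate")
    if observed.p99_latency_ms > baseline.p99_latency_ms * tol.max_latency_ratio:
        problems.append("p99_latency_ms")
    if observed.throughput_rps < baseline.throughput_rps * tol.min_throughput_ratio:
        problems.append("throughput_rps")
    return problems


# Example values are illustrative only.
baseline = Metrics(error_rate=0.01, p99_latency_ms=200.0, throughput_rps=1000.0)
tolerance = Tolerance(max_error_rate_increase=0.02,
                      max_latency_ratio=1.5,
                      min_throughput_ratio=0.8)
during_failure = Metrics(error_rate=0.10, p99_latency_ms=450.0, throughput_rps=500.0)
print(deviations(baseline, during_failure, tolerance))
```

Any metric name returned here marks a weakness to fix and document in a runbook before repeating the experiment.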
Quick Chaos Steps:
- Start small: Dev → Staging → 1% prod → gradual rollout
- Define clear rollback triggers (e.g., error_rate > 5%)
- Measure blast radius; never exceed the planned scope
- Document findings → runbooks → improved resilience
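The quick steps above can be sketched as a staged loop that stops at the first rollback trigger. This is a hedged illustration only: the 5% threshold comes from the list above, while `should_roll_back`, `run_staged`, the stage names, and the `probe` callback are hypothetical and stand in for whatever monitoring and injection tooling is actually in use.

```python
from typing import Callable, List, Sequence, Tuple

# Rollback trigger from the steps above: error_rate > 5%.
ERROR_RATE_ROLLBACK_THRESHOLD = 0.05


def should_roll_back(error_rate: float,
                     affected_fraction: float,
                     planned_fraction: float) -> bool:
    """Trigger a rollback on a high error rate or a blast radius beyond the planned scope."""
    return (error_rate > ERROR_RATE_ROLLBACK_THRESHOLD
            or affected_fraction > planned_fraction)


def run_staged(stages: Sequence[Tuple[str, float]],
               probe: Callable[[str], Tuple[float, float]]) -> List[str]:
    """Run stages in order and return the names that completed without a rollback.

    stages: (name, planned_fraction) pairs, smallest environment first.
    probe:  hypothetical callback returning (error_rate, affected_fraction)
            measured while the failure is injected in that stage.
    """
    completed = []
    for name, planned_fraction in stages:
        error_rate, affected_fraction = probe(name)
        if should_roll_back(error_rate, affected_fraction, planned_fraction):
            break  # stop the rollout; later stages never run
        completed.append(name)
    return completed


# Illustrative readings; real values would come from monitoring.
readings = {
    "dev": (0.01, 1.0),
    "staging": (0.02, 1.0),
    "prod-1pct": (0.08, 0.01),
}
stages = [("dev", 1.0), ("staging", 1.0), ("prod-1pct", 0.01)]
print(run_staged(stages, lambda name: readings[name]))
```

Checking blast radius in the same predicate as the error rate keeps "never exceed the planned scope" an enforced condition instead of a guideline.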