chaos-engineering-resilience

SKILL.md

Chaos Engineering & Resilience Testing

<default_to_action> When testing system resilience or injecting failures:

  1. DEFINE steady state (normal metrics: error rate, latency, throughput)
  2. HYPOTHESIZE that the system stays in steady state during the failure
  3. INJECT real-world failures (network, instance, disk, CPU)
  4. OBSERVE and measure deviation from steady state
  5. FIX weaknesses discovered, document runbooks, repeat
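
The five steps above can be sketched as a small experiment loop. This is a minimal illustration, not a definitive implementation: `SteadyState`, `run_experiment`, the threshold values, and the `read_metrics` / `inject` / `revert` callables are all hypothetical names and numbers chosen for the example, and a real setup would wire them to an actual monitoring system and fault-injection tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SteadyState:
    """Thresholds describing 'normal'. All values are illustrative assumptions."""
    max_error_rate: float = 0.01        # fraction of failed requests
    max_p99_latency_ms: float = 500.0
    min_throughput_rps: float = 100.0

    def holds(self, m: Dict[str, float]) -> bool:
        return (m["error_rate"] <= self.max_error_rate
                and m["p99_latency_ms"] <= self.max_p99_latency_ms
                and m["throughput_rps"] >= self.min_throughput_rps)


def run_experiment(steady: SteadyState,
                   read_metrics: Callable[[], Dict[str, float]],
                   inject: Callable[[], None],
                   revert: Callable[[], None]) -> dict:
    # DEFINE: confirm the system is in steady state before touching it.
    baseline = read_metrics()
    if not steady.holds(baseline):
        return {"status": "aborted", "baseline": baseline}

    # INJECT the failure, OBSERVE, and always revert, even if observing raises.
    inject()
    try:
        during = read_metrics()
    finally:
        revert()

    # HYPOTHESIS check: steady state should still hold during the failure.
    status = "passed" if steady.holds(during) else "weakness_found"
    return {"status": status, "baseline": baseline, "during": during}
```

A `weakness_found` result is the input to step 5: fix the weakness, write the runbook, and rerun the same experiment.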

Quick Chaos Steps:

  • Start small: Dev → Staging → 1% prod → gradual rollout
  • Define clear rollback triggers (e.g., error_rate > 5%)
  • Measure blast radius; never exceed the planned scope
  • Document findings → runbooks → improved resilience
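
As a rough sketch of how the rollback trigger and blast-radius rule might be checked while stepping through environments: the stage names, `should_rollback`, `run_stages`, and the `observe` callable are hypothetical, and only the 5% error-rate threshold and the Dev → Staging → 1% prod ordering come from the list above.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Stage order taken from the list above; the names themselves are illustrative.
STAGES = ["dev", "staging", "prod-1pct"]

# Rollback trigger from the list above: error_rate > 5%.
ERROR_RATE_ROLLBACK_THRESHOLD = 0.05


def should_rollback(error_rate: float,
                    affected_hosts: Iterable[str],
                    planned_hosts: Iterable[str]) -> bool:
    """Roll back if errors exceed the threshold or the blast radius leaks."""
    exceeded_scope = not set(affected_hosts) <= set(planned_hosts)
    return error_rate > ERROR_RATE_ROLLBACK_THRESHOLD or exceeded_scope


def run_stages(
    observe: Callable[[str], Tuple[float, Iterable[str], Iterable[str]]],
) -> Tuple[List[str], Optional[str]]:
    """Run stages in order; stop at the first one that trips a rollback.

    `observe(stage)` is assumed to run the experiment in that stage and
    return (error_rate, affected_hosts, planned_hosts).
    Returns (completed_stages, stage_that_rolled_back_or_None).
    """
    completed: List[str] = []
    for stage in STAGES:
        error_rate, affected, planned = observe(stage)
        if should_rollback(error_rate, affected, planned):
            return completed, stage
        completed.append(stage)
    return completed, None
```

Treating any host outside the planned set as an immediate rollback keeps the scope check strict; a looser policy would need to be an explicit decision.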

Critical Success Factors:

  • Controlled experiments with automatic rollback
  • Steady state must be measurable
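
To make "measurable" concrete, the three steady-state metrics named earlier (error rate, latency, throughput) could be computed from a window of request samples along these lines. This is a sketch under assumptions: the `Request` record, `summarize`, and the choice of p99 with the nearest-rank method are illustrative, not prescribed by this skill.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Request:
    latency_ms: float
    ok: bool


def summarize(requests: Sequence[Request], window_seconds: float) -> Dict[str, float]:
    """Reduce one observation window to error rate, p99 latency, and throughput."""
    if not requests or window_seconds <= 0:
        raise ValueError("need at least one request and a positive window")

    n = len(requests)
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank p99: rank = ceil(0.99 * n), computed with integer arithmetic.
    rank = (99 * n + 99) // 100

    return {
        "error_rate": sum(1 for r in requests if not r.ok) / n,
        "p99_latency_ms": latencies[rank - 1],
        "throughput_rps": n / window_seconds,
    }
```

Comparing the same summary before and during an injection gives a numeric deviation from steady state rather than a judgment call.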

Installs: 81
GitHub Stars: 348
First Seen: Jan 24, 2026