Run Chaos Experiment
Inject controlled failures to test and improve system resilience.
When to Use
- Before major product launches (alongside load testing)
- After architecture changes (to validate resilience)
- During GameDays or disaster recovery drills
- To validate assumptions about failure modes
- As part of an SRE maturity program
Inputs
- Required: Kubernetes cluster (for Litmus or Chaos Mesh)
- Required: Steady-state definition (what "normal" looks like)
- Required: Hypothesis to test (e.g., "API stays available if one pod crashes")
- Optional: Observability stack (Prometheus, Grafana) to measure impact
- Optional: Rollback plan
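The inputs above fit together roughly as follows. This is a minimal Python sketch, assuming the steady state can be expressed as error-rate and p99-latency thresholds. `Sample`, `SteadyState`, `run_experiment`, and the threshold values are hypothetical. `observe`, `inject_fault`, and `rollback` stand in for whatever your tooling provides, for example a Prometheus query, a Litmus or Chaos Mesh experiment, and its cleanup. It is not the API of either tool.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sample:
    """One observation of the system (hypothetical shape, e.g. from Prometheus)."""
    error_rate: float      # fraction of failed requests, 0.0 to 1.0
    p99_latency_ms: float  # 99th-percentile latency in milliseconds


@dataclass(frozen=True)
class SteadyState:
    """What "normal" looks like, as thresholds (illustrative values only)."""
    max_error_rate: float = 0.01
    max_p99_latency_ms: float = 500.0

    def holds(self, sample: Sample) -> bool:
        return (sample.error_rate <= self.max_error_rate
                and sample.p99_latency_ms <= self.max_p99_latency_ms)


def run_experiment(
    steady: SteadyState,
    observe: Callable[[], Sample],
    inject_fault: Callable[[], None],
    rollback: Callable[[], None],
    samples: int = 10,
    interval_s: float = 0.0,
) -> bool:
    """Return True if the hypothesis held, i.e. steady state survived the fault."""
    # A system that is not normal beforehand cannot confirm or refute anything.
    if not steady.holds(observe()):
        raise RuntimeError("system not in steady state; aborting before injection")
    inject_fault()
    try:
        for _ in range(samples):
            time.sleep(interval_s)
            if not steady.holds(observe()):
                return False  # hypothesis refuted; stop observing early
        return True
    finally:
        # Always undo the fault, even if observation raises.
        rollback()


if __name__ == "__main__":
    # Toy in-memory "system": requests fail only when no replicas remain.
    state = {"replicas": 3}

    def observe() -> Sample:
        healthy = state["replicas"] > 0
        return Sample(error_rate=0.0 if healthy else 1.0, p99_latency_ms=120.0)

    def kill_one_pod() -> None:
        state["replicas"] -= 1

    def restore() -> None:
        state["replicas"] = 3

    # Hypothesis: "API stays available if one pod crashes".
    print(run_experiment(SteadyState(), observe, kill_one_pod, restore))
```

Checking the steady state before injecting keeps a pre-existing outage from being misread as an experiment result. The `finally` clause runs the rollback whether the hypothesis holds, fails, or observation raises.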