sre

Cluster access (--context patterns) and internal service URLs are in the k8s skill.

Debugging Kubernetes Incidents

Core Principles

5 Whys Analysis — NEVER stop at symptoms. Ask "why" until you reach the root cause.
Multi-Source Correlation — Combine logs, events, metrics for a complete picture.
Zero Alert Tolerance — Every firing alert must be addressed: fix the root cause, or as a last resort, create a declarative Silence CR with justification. Never ignore or defer.
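
Multi-source correlation can be sketched as merging observations from logs, events, and metrics into a single timeline around the moment of interest. The following is a minimal Python sketch, not part of the skill itself: the `Record` type, the `correlate` function, and all sample data are hypothetical, and it assumes each source has already been exported into a common shape.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    """One observation from a single source (hypothetical shape)."""
    timestamp: datetime
    source: str   # "logs", "events", or "metrics"
    message: str


def correlate(records, center, window=timedelta(minutes=5)):
    """Return records within `window` of `center`, oldest first."""
    nearby = [r for r in records if abs(r.timestamp - center) <= window]
    return sorted(nearby, key=lambda r: r.timestamp)


# Illustrative data only; not taken from a real cluster.
t0 = datetime(2026, 1, 1, 12, 0, 0)
records = [
    Record(t0 + timedelta(seconds=30), "logs", "connection refused"),
    Record(t0 - timedelta(seconds=45), "metrics", "memory usage at limit"),
    Record(t0, "events", "container OOMKilled"),
    Record(t0 + timedelta(hours=2), "logs", "unrelated later entry"),
]

timeline = correlate(records, center=t0)
for r in timeline:
    print(f"{r.timestamp.isoformat()} [{r.source}] {r.message}")
```

Reading the three sources in one ordered view makes it easier to see which observation came first, which is usually the next "why" to ask about.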

The 5 Whys Analysis (CRITICAL)

Apply 5 Whys before concluding any investigation. Stopping at symptoms leads to ineffective fixes.
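
One way to make this rule hard to skip is to record the chain of answers explicitly and refuse to produce a conclusion until an answer has been marked as the root cause. The sketch below is a hypothetical helper, not something the skill provides: the `FiveWhys` class, its method names, and the sample chain are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FiveWhys:
    """Tracks a chain of "why" answers for one symptom (hypothetical helper)."""
    symptom: str
    answers: list = field(default_factory=list)
    root_cause_confirmed: bool = False

    def ask_why(self, answer, is_root_cause=False):
        """Record the answer to the next "why"."""
        if self.root_cause_confirmed:
            raise ValueError("root cause already confirmed")
        self.answers.append(answer)
        self.root_cause_confirmed = is_root_cause

    def conclusion(self):
        """Return the root cause, refusing to conclude at a symptom."""
        if not self.root_cause_confirmed:
            raise RuntimeError("investigation incomplete: keep asking why")
        return self.answers[-1]


# Illustrative chain only; the answers are invented for this sketch.
analysis = FiveWhys(symptom="pod restarts repeatedly")
analysis.ask_why("container is OOMKilled")
analysis.ask_why("memory limit is lower than the working set")
analysis.ask_why("limit was copied from a template without load testing",
                 is_root_cause=True)
print(analysis.conclusion())
```

The number of "whys" is not fixed at five; the chain ends when an answer is something that can be fixed directly rather than another symptom.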

Example:

Symptom: Helm install failed with "context deadline exceeded"

Repository: ionfury/homelab

First Seen: Feb 25, 2026