# ClickHouse Incident Runbook

## Overview
Step-by-step procedures for triaging and resolving ClickHouse incidents using built-in system tables and SQL commands.
## Severity Levels
| Level | Definition | Target response time | Examples |
|---|---|---|---|
| P1 | ClickHouse unreachable / all queries failing | < 15 min | Server down, OOM, disk full |
| P2 | Degraded performance / partial failures | < 1 hour | Slow queries, merge backlog |
| P3 | Minor impact / non-critical errors | < 4 hours | Single-table issue, warnings |
| P4 | No user impact | Next business day | Monitoring gaps, optimization |
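Several of the example conditions in the table can be confirmed from system tables before assigning a level. The queries below are a sketch, not an official checklist. They check disk headroom (disk full, P1), memory tracked by the server (OOM risk, P1), and merge activity plus active part counts (merge backlog, P2). Column names follow the `system.disks`, `system.metrics`, `system.merges`, and `system.parts` tables. Verify them against your server version before relying on them.

```sql
-- Disk headroom per configured disk; a disk close to 100% used is a P1 candidate.
SELECT
    name,
    path,
    formatReadableSize(free_space)  AS free,
    formatReadableSize(total_space) AS total,
    round(100 * (1 - free_space / total_space), 1) AS used_pct
FROM system.disks
ORDER BY used_pct DESC;

-- Memory currently tracked by the server; compare it with your configured memory limit.
SELECT metric, formatReadableSize(value) AS tracked
FROM system.metrics
WHERE metric = 'MemoryTracking';

-- In-flight merges and mutations, longest-running first (merge backlog, P2).
SELECT
    database,
    table,
    round(elapsed, 1)        AS elapsed_s,
    round(progress * 100, 1) AS progress_pct,
    num_parts,
    is_mutation
FROM system.merges
ORDER BY elapsed DESC;

-- Partitions with the most active parts; a steadily growing count suggests merges are falling behind.
SELECT database, table, partition_id, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table, partition_id
ORDER BY active_parts DESC
LIMIT 10;
```

When a partition's active part count exceeds the table's `parts_to_delay_insert` MergeTree setting, inserts are slowed. Above `parts_to_throw_insert`, they fail with a "too many parts" error, which typically escalates a P2 merge backlog to user-visible failures.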
## Quick Triage (Run First)
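The following is one possible first pass, assuming the server still accepts connections (for example through `clickhouse-client`). It confirms that the server responds, lists the longest-running queries, reviews recent errors, and checks replication health. The row limits and the replica thresholds (60 seconds of delay, 100 queued entries) are illustrative values, not ClickHouse defaults. Some `system.errors` columns may be missing on older releases.

```sql
-- 1. Does the server respond, and how long has it been up (seconds)?
SELECT version(), uptime();

-- 2. What is running right now, longest first?
SELECT
    query_id,
    user,
    round(elapsed, 1)                AS elapsed_s,
    formatReadableSize(memory_usage) AS memory,
    substring(query, 1, 120)         AS query_head
FROM system.processes
ORDER BY elapsed DESC
LIMIT 10;

-- 3. Which error codes have fired since startup, most recent first?
SELECT name, code, value AS occurrences, last_error_time, last_error_message
FROM system.errors
ORDER BY last_error_time DESC
LIMIT 10;

-- 4. Are any replicated tables read-only or lagging?
--    The thresholds 60 s and 100 entries are illustrative; tune them per cluster.
SELECT database, table, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE is_readonly OR absolute_delay > 60 OR queue_size > 100;
```

`system.replicas` only lists `Replicated*MergeTree` tables, so the last query returns no rows on a non-replicated deployment. If step 2 reveals a runaway query, `KILL QUERY WHERE query_id = '...'` stops it.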