Runbooks
A good runbook lets a competent responder do the right thing without improvising under stress.
Context
Runbooks convert operational knowledge into executable procedures. They are not long theory documents but concise operational guides for situations where time, clarity, and safety matter more than polished prose.
In a lifecycle-aware system, runbooks should encode the same release boundaries the team reviewed earlier. For brownfield systems, that often means explicit rollback, fail-closed behavior, and coexistence-safe fallback steps.
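One way to make those boundaries hard to omit is to keep each runbook as structured data rather than free text. The sketch below is only an illustration under assumptions: the `Runbook` and `Step` types, their field names, and the sample trigger and owner are hypothetical and do not come from any prodcraft format. It shows a check that lists steps with no recorded rollback.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    action: str          # what the responder does
    verify: str          # how the responder confirms it worked
    rollback: str = ""   # how to undo it; empty means none recorded


@dataclass
class Runbook:
    trigger: str                 # alert or condition that starts the runbook
    owner: str                   # team or role accountable for it
    steps: list[Step] = field(default_factory=list)
    fail_closed: bool = True     # when in doubt, stop and escalate

    def missing_rollbacks(self) -> list[str]:
        """Return the actions that have no rollback recorded."""
        return [s.action for s in self.steps if not s.rollback.strip()]


# Hypothetical example values, for illustration only.
runbook = Runbook(
    trigger="checkout error rate above agreed threshold",
    owner="payments on-call",
    steps=[
        Step("Disable new checkout flow flag", "Error rate drops", "Re-enable flag"),
        Step("Drain retry queue", "Queue depth reaches zero"),
    ],
)

print(runbook.missing_rollbacks())
```

A review gate could refuse to publish a runbook while `missing_rollbacks()` is non-empty, which keeps rollback coverage a reviewed property instead of something the responder discovers mid-incident.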
Inputs
- incident-playbook -- produced by the preceding skill in the lifecycle
- monitoring-config -- produced by the preceding skill in the lifecycle
- ci-cd-pipeline -- produced by the preceding skill in the lifecycle
Process
Step 1: Define the Trigger and Owner