monitoring-observability
Monitoring & Observability
Use this skill when the main question is "what packet do we have, what should this system notice, and what should interrupt a human?"
The job is not to dump a Prometheus / Grafana / Datadog tutorial. The job is to normalize the packet, pick one primary observability mode, define the smallest useful signal plan, and route adjacent work elsewhere before the skill drifts into debugging, performance tuning, rollout execution, or analytics reporting.
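One way to make that output concrete is a small, checkable plan record. This is a minimal sketch that assumes a Python representation the skill does not itself prescribe. The `Signal` and `SignalPlan` classes, the `page` / `ticket` / `dashboard` action levels, and every example signal name are hypothetical. The route-out set reuses the four kinds of adjacent work named above.

```python
from dataclasses import dataclass, field

# Adjacent work this skill hands off rather than absorbs (taken from the text above).
ROUTE_OUTS = {"debugging", "performance tuning", "rollout execution", "analytics reporting"}

# Hypothetical action levels: only "page" interrupts a human immediately.
ACTIONS = {"page", "ticket", "dashboard"}


@dataclass
class Signal:
    name: str        # hypothetical signal name, e.g. "api_5xx_rate"
    question: str    # what the system should notice
    action: str      # one of ACTIONS
    owner: str = ""  # responder; required for "page" and "ticket"


@dataclass
class SignalPlan:
    packet: str                                   # normalized one-line summary of the request
    primary_mode: str                             # exactly one observability mode
    signals: list[Signal] = field(default_factory=list)
    route_outs: set[str] = field(default_factory=set)

    def problems(self) -> list[str]:
        """Return plan gaps; an empty list means the plan passes these checks."""
        issues = []
        if not self.primary_mode:
            issues.append("no primary mode chosen")
        for s in self.signals:
            if s.action not in ACTIONS:
                issues.append(f"{s.name}: unknown action {s.action!r}")
            elif s.action in {"page", "ticket"} and not s.owner:
                issues.append(f"{s.name}: {s.action} has no owner")
        unknown = self.route_outs - ROUTE_OUTS
        if unknown:
            issues.append(f"unknown route-outs: {sorted(unknown)}")
        return issues


if __name__ == "__main__":
    plan = SignalPlan(
        packet="new payments API needs launch coverage",
        primary_mode="launch coverage",
        signals=[
            Signal("api_5xx_rate", "are requests failing?", "page", owner="payments-oncall"),
            Signal("p95_latency", "is the API slowing down?", "dashboard"),
            Signal("queue_backlog", "is work piling up?", "ticket"),
        ],
        route_outs={"performance tuning"},
    )
    print(plan.problems())  # → ['queue_backlog: ticket has no owner']
```

Keeping the plan this small is deliberate: every signal answers one question, and anything that interrupts someone must name who that someone is.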
Read the matching reference before acting:
- references/intake-packets-and-route-outs.md before handling an unfamiliar packet
- references/modes-and-boundaries.md before handling mixed requests that blur telemetry setup, incident diagnosis, or product analytics
- references/alert-dashboard-checklist.md when reviewing dashboards, alerts, and ownership gaps
- references/telemetry-rollout-matrix.md when choosing the smallest rollout slice
When to use this skill
- New service, worker, API, or multi-service system needs health signals, alerts, dashboards, or SLO-style coverage before launch
- Existing stack has dashboards / alerts / telemetry, but trust is low and a keep/fix/delete/add audit is needed
- Team needs to decide what to instrument, correlate, retain, or sample before choosing vendors or backend specifics
- Data, marketing, analytics, or pipeline work needs freshness / schema / volume / lineage monitoring rather than another manual trust check
- Game or live-ops work needs crash, session, build, or launch-event visibility without turning into engine-profiler interpretation
- Cross-functional reliability asks span backend, product/ops, marketing pipelines, and game live-ops, and the next owner is still unclear
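For the data and pipeline case, an automated freshness and volume check is the kind of signal that replaces a manual trust check. The sketch below is hypothetical: `check_table`, its parameters, and the default ±50% volume tolerance are illustrative assumptions, not part of this skill or any specific tool. Schema and lineage checks are deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_table(
    last_loaded_at: datetime,
    row_count: int,
    *,
    max_staleness: timedelta,
    expected_rows: int,
    tolerance: float = 0.5,          # hypothetical: allow ±50% around expected_rows
    now: Optional[datetime] = None,  # injectable clock for testing
) -> list[str]:
    """Return human-readable findings; an empty list means the table looks healthy."""
    now = now or datetime.now(timezone.utc)
    findings = []

    # Freshness: has the latest load landed recently enough?
    age = now - last_loaded_at
    if age > max_staleness:
        findings.append(f"stale: last load {age} ago (limit {max_staleness})")

    # Volume: is the row count within the expected band?
    low = expected_rows * (1 - tolerance)
    high = expected_rows * (1 + tolerance)
    if not low <= row_count <= high:
        findings.append(f"volume: {row_count} rows outside [{low:.0f}, {high:.0f}]")

    return findings


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    print(check_table(
        datetime(2024, 5, 1, 3, 0, tzinfo=timezone.utc),
        400,
        max_staleness=timedelta(hours=6),
        expected_rows=1000,
        now=now,
    ))
    # → ['stale: last load 9:00:00 ago (limit 6:00:00)', 'volume: 400 rows outside [500, 1500]']
```

Whether a finding like this pages someone or opens a ticket is a separate decision; a stale marketing table rarely justifies waking anyone up.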