Monitoring & Observability

Use this skill when the main question is "what packet do we have, what should this system notice, and what should interrupt a human?"

The job is not to dump a Prometheus / Grafana / Datadog tutorial. The job is to normalize the packet, pick one primary observability mode, define the smallest useful signal plan, and route adjacent work away before the skill turns into debugging, performance tuning, rollout execution, or analytics reporting.

Read references/intake-packets-and-route-outs.md before handling an unfamiliar packet. Read references/modes-and-boundaries.md before handling mixed requests that blur telemetry setup, incident diagnosis, or product analytics. Read references/alert-dashboard-checklist.md when reviewing dashboards, alerts, and ownership gaps. Read references/telemetry-rollout-matrix.md when choosing the smallest rollout slice.

When to use this skill

New service, worker, API, or multi-service system needs health signals, alerts, dashboards, or SLO-style coverage before launch
Existing stack has dashboards / alerts / telemetry, but trust is low and a keep/fix/delete/add audit is needed
Team needs to decide what to instrument, correlate, retain, or sample before choosing vendors or backend specifics
Data, marketing, analytics, or pipeline work needs freshness / schema / volume / lineage monitoring rather than another manual trust check
Game or live-ops work needs crash, session, build, or launch-event visibility without turning into engine-profiler interpretation
Cross-functional reliability requests span backend, product/ops, marketing pipelines, and game live-ops, and the next owner is still unclear
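
As one illustration of the freshness / schema / volume case in the list above, a manual trust check can be replaced by a small automated one. The sketch below is a minimal example under stated assumptions, not part of this skill's references: `TableSnapshot`, `check_table`, the table name, and all thresholds are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableSnapshot:
    """Hypothetical summary of one pipeline output (not a real library type)."""
    name: str
    last_loaded_at: datetime
    row_count: int
    columns: frozenset


def check_table(snapshot, *, now, max_age, min_rows, expected_columns):
    """Return a list of human-readable problems; an empty list means healthy."""
    problems = []

    # Freshness: how long since the last successful load?
    age = now - snapshot.last_loaded_at
    if age > max_age:
        problems.append(f"{snapshot.name}: stale by {age - max_age}")

    # Volume: did the load produce at least the expected floor of rows?
    if snapshot.row_count < min_rows:
        problems.append(
            f"{snapshot.name}: volume {snapshot.row_count} below floor {min_rows}"
        )

    # Schema: are any expected columns missing?
    missing = expected_columns - snapshot.columns
    if missing:
        problems.append(f"{snapshot.name}: missing columns {sorted(missing)}")

    return problems


# Example with assumed values: a daily table loaded 30 hours ago.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
snap = TableSnapshot(
    name="daily_orders",
    last_loaded_at=now - timedelta(hours=30),
    row_count=120,
    columns=frozenset({"order_id", "amount"}),
)
problems = check_table(
    snap,
    now=now,
    max_age=timedelta(hours=24),
    min_rows=1000,
    expected_columns=frozenset({"order_id", "amount", "currency"}),
)
for line in problems:
    print(line)
```

Whether a non-empty result should interrupt a human or only appear on a dashboard is a separate decision from the check itself.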
