Observability and Instrumentation
Overview
Code you can't observe is code you can't operate. Observability is the ability to answer "what is the system doing and why?" from the outside, using the telemetry the code emits. Instrumentation is not a post-launch add-on — it's written alongside the feature, the same way tests are. If a feature ships without telemetry, the first user-reported bug becomes archaeology instead of a query.
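As a minimal sketch of what "written alongside the feature" can look like, the example below emits one structured event per call using only the Python standard library. The function `charge_order`, the logger name `orders`, and the event field names are all hypothetical, chosen for illustration; a real service would likely route this through whatever logging or metrics library it already uses.

```python
import json
import logging
import time

# Hypothetical logger name; use whatever naming scheme the service already has.
logger = logging.getLogger("orders")


def charge_order(order_id: str, amount_cents: int) -> dict:
    """Hypothetical feature function, instrumented as it is written."""
    start = time.perf_counter()
    outcome = "ok"
    try:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        return {"order_id": order_id, "charged": amount_cents}
    except Exception:
        outcome = "error"
        raise
    finally:
        # One structured event per call: what ran, on what, how it ended,
        # and how long it took. Emitted on both success and failure.
        logger.info(json.dumps({
            "event": "charge_order",
            "order_id": order_id,
            "outcome": outcome,
            "duration_ms": round((time.perf_counter() - start) * 1000, 3),
        }))
```

Because the event is emitted in `finally`, a failed call leaves the same queryable record as a successful one, which is what lets a bug report be answered by filtering on `order_id` and `outcome` rather than by reconstructing what happened.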
When to Use
- Building any feature that will run in production
- Adding a new service, endpoint, background job, or external integration
- A production incident took too long to diagnose ("we couldn't tell what happened")
- Setting up or reviewing alerting rules
- Reviewing a PR that adds I/O, retries, queues, or cross-service calls
NOT for:
- Diagnosing a failure happening right now: use the `debugging-and-error-recovery` skill (observability is what makes that skill fast next time)
- Profiling and optimizing measured slowness: use the `performance-optimization` skill
- Launch-day monitoring checklists and rollback triggers: see the `shipping-and-launch` skill; this skill covers the instrumentation that feeds them