Observability

Overview

Code that runs fine in dev and goes inert in production is the dominant operational failure mode for modern services. When you add code that will run for users, you also own its diagnosability: structured logs, trace context propagated across process boundaries, metrics with bounded cardinality, and signals an operator can read without your help.
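As a rough illustration of three of those signals, here is a minimal sketch using only the Python standard library. Every name in it (`KNOWN_ROUTES`, `JsonFormatter`, `route_label`, `handle_request`, `make_logger`, and the `x-trace-id` header) is a hypothetical chosen for this example, not something this skill prescribes. A real service would more likely use an established tracing and metrics library.

```python
import io
import json
import logging
import uuid

# Hypothetical allow-list: labels may only take these values (plus "other"),
# which keeps metric and log cardinality bounded.
KNOWN_ROUTES = {"/checkout", "/login"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so operators can filter by field."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
            "route": getattr(record, "route", None),
        })


def route_label(path: str) -> str:
    """Collapse unknown paths into one bucket so label values stay bounded."""
    return path if path in KNOWN_ROUTES else "other"


def handle_request(path: str, headers: dict, log: logging.Logger) -> dict:
    """Reuse the caller's trace id if present, otherwise start a new one."""
    trace_id = headers.get("x-trace-id") or uuid.uuid4().hex
    log.info(
        "request handled",
        extra={"trace_id": trace_id, "route": route_label(path)},
    )
    # Return the headers to attach to any downstream call, so the trace
    # context crosses the process boundary (header name is an assumption).
    return {"x-trace-id": trace_id}


def make_logger(stream) -> logging.Logger:
    """Build a logger that writes one JSON line per record to `stream`."""
    log = logging.getLogger("observability_sketch")
    log.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.propagate = False
    return log
```

Calling `handle_request("/users/12345", {"x-trace-id": "abc123"}, make_logger(io.StringIO()))` logs the route as `other` rather than the raw path, and returns the same trace id for the next hop. Collapsing unknown paths is one way to bound cardinality; it trades per-path detail for a label set that cannot grow with user input.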

This is a rigid skill. Jump to the sub-section that matches what you're writing and run that sub-section's checks.

These checks matter most when adding a request handler, RPC, or background job that will run in production, where users depend on the code and operators need to diagnose it. In MVPs, prototypes, internal dev tools, and one-off scripts, structured logging, tracing, and SLO discipline are premature; prefer the simplest thing that works.

When to invoke

Invoke when you're about to:
