production-monitoring


Production Monitoring and Observability

This skill encodes battle-tested observability patterns for production services. Every recommendation comes from real incidents — the ones where you stared at a dashboard that showed nothing useful while users were screaming. Observability is not a feature you bolt on after launch. It is the foundation you build on from day one.


1. The Three Pillars of Observability

Observability is not "having logs." It is the ability to ask arbitrary questions about your system's behavior without deploying new code. The three pillars work together — none is sufficient alone.

| Pillar | What It Tells You | Example |
| --- | --- | --- |
| Logs | What happened: discrete events with context | "User X login failed: expired token" |
| Metrics | How the system behaves now: aggregated numbers over time | "p99 latency is 450ms and rising" |
| Traces | Why something is slow: a request's journey across services | "Postgres query in user-service took 2.3s" |
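To make the "discrete events with context" idea concrete, here is a minimal sketch of a structured log line that carries a trace ID, using only Python's standard `logging` module. The `JsonFormatter` class, the `trace_id` field name, and the `user-service` logger name are illustrative assumptions, not something this skill prescribes.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # trace_id is an assumed field passed via `extra=`; it is the key
            # that lets a log line be joined to a trace later.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("user-service")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Capture output in memory so the example has no side effects.
stream = io.StringIO()
log = build_logger(stream)
log.warning("User X login failed: expired token", extra={"trace_id": "abc123"})

# Each line is machine-parseable, so it can be filtered by level or trace_id.
line = json.loads(stream.getvalue())
```

Because every line is JSON with a stable set of keys, a log backend can index `trace_id` and `level` instead of relying on text search.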

How they connect: An alert fires on a metric (error rate > 1%). You filter logs by the time window to see what errors occurred. You grab a trace ID from the logs and follow the trace to the slow service. You fix it and verify the metric recovers. Without all three, you are flying blind.
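The workflow above can be sketched with in-memory data. This is a toy model under stated assumptions, not a real monitoring backend: the `LogEvent` and `Span` types, the helper names, and all sample values are made up for illustration. Only the 1% threshold comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

ERROR_RATE_THRESHOLD = 0.01  # the "error rate > 1%" alert condition


@dataclass
class LogEvent:
    ts: int        # seconds, arbitrary epoch (assumed)
    level: str
    message: str
    trace_id: str


@dataclass
class Span:
    trace_id: str
    service: str
    operation: str
    duration_ms: float


def alert_fires(requests: int, errors: int) -> bool:
    """Step 1 (metric): is the error rate above the threshold?"""
    return requests > 0 and errors / requests > ERROR_RATE_THRESHOLD


def errors_in_window(events: list[LogEvent], start: int, end: int) -> list[LogEvent]:
    """Step 2 (logs): error events with start <= ts < end."""
    return [e for e in events if e.level == "ERROR" and start <= e.ts < end]


def slowest_span(spans: list[Span], trace_id: str) -> Span | None:
    """Step 3 (trace): the longest span recorded for one trace.

    Real traces are nested, so a real tool would compare self time;
    the sample spans here are deliberately flat.
    """
    candidates = [s for s in spans if s.trace_id == trace_id]
    return max(candidates, key=lambda s: s.duration_ms, default=None)


# Made-up sample data.
events = [
    LogEvent(100, "INFO", "login ok", "t1"),
    LogEvent(105, "ERROR", "upstream timeout", "t2"),
    LogEvent(110, "INFO", "login ok", "t3"),
]
spans = [
    Span("t2", "auth-service", "verify token", 40.0),
    Span("t2", "user-service", "Postgres query", 2300.0),
    Span("t2", "session-service", "write session", 15.0),
]

fired = alert_fires(requests=200, errors=5)             # metric says something is wrong
error_logs = errors_in_window(events, start=100, end=120)  # logs say what went wrong
culprit = slowest_span(spans, error_logs[0].trace_id)      # trace says where it was slow
```

The trace ID is the only join key between the log and the trace here, which is why a log line without one leaves you stuck at the second step.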

