production-monitoring
Production Monitoring and Observability
This skill encodes battle-tested observability patterns for production services. Every recommendation comes from real incidents — the ones where you stared at a dashboard that showed nothing useful while users were screaming. Observability is not a feature you bolt on after launch. It is the foundation you build on from day one.
1. The Three Pillars of Observability
Observability is not "having logs." It is the ability to ask arbitrary questions about your system's behavior without deploying new code. The three pillars work together — none is sufficient alone.
| Pillar | What It Tells You | Example |
|---|---|---|
| Logs | What happened — discrete events with context | "User X login failed: expired token" |
| Metrics | How the system behaves now — aggregated numbers over time | "p99 latency is 450ms and rising" |
| Traces | Why something is slow — a request's journey across services | "Postgres query in user-service took 2.3s" |
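As a rough illustration of the table above, the sketch below has one hypothetical login handler emit all three signals for a single request. It uses only the Python standard library, and the in-memory `LOGS`, `LATENCIES_MS`, and `SPANS` containers are stand-ins (assumptions, not from the original skill) for a log pipeline, a metrics backend, and a tracing backend.

```python
import json
import time
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-ins for real backends.
LOGS = []                          # log pipeline
LATENCIES_MS = defaultdict(list)   # metrics backend, keyed by operation name
SPANS = []                         # tracing backend


def handle_login(user_id, token_expired):
    """Handle one hypothetical login request and emit a log, a metric, and a span."""
    trace_id = uuid.uuid4().hex
    start = time.monotonic()

    ok = not token_expired  # placeholder for the real authentication logic
    duration_ms = (time.monotonic() - start) * 1000.0

    # Log: a discrete event with context, including the trace ID.
    LOGS.append(json.dumps({
        "ts": time.time(),
        "level": "info" if ok else "warning",
        "event": "login_succeeded" if ok else "login_failed",
        "reason": None if ok else "expired token",
        "user_id": user_id,
        "trace_id": trace_id,
    }))

    # Metric: a number that gets aggregated over time, with no per-request identity.
    LATENCIES_MS["login"].append(duration_ms)

    # Trace: a timed span that shares the trace ID with the log line.
    SPANS.append({
        "trace_id": trace_id,
        "service": "user-service",
        "name": "login",
        "duration_ms": duration_ms,
    })
    return trace_id
```

The shared `trace_id` is what lets a log line lead to the matching span later. The metric deliberately carries no request identity, which is why it is cheap to aggregate but cannot explain an individual failure by itself.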
How they connect: An alert fires on a metric (error rate > 1%). You filter logs to the alert's time window to see which errors occurred. You grab a trace ID from one of those log lines and follow the trace to the slow service. You fix it and verify that the metric recovers. Without all three, you are flying blind.
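The workflow above can be sketched as a small triage function. This is a minimal illustration under stated assumptions, not a real alerting pipeline: the record shapes, the `self_ms` field (time spent in a span excluding its children), and all function names are hypothetical, and the 1% threshold is the one quoted in the text.

```python
def should_alert(total_requests, error_count, threshold=0.01):
    """Metric step: fire when the error rate exceeds the threshold (1% by default)."""
    if total_requests == 0:
        return False
    return error_count / total_requests > threshold


def errors_in_window(logs, start, end):
    """Log step: keep only error records inside the alert's time window."""
    return [r for r in logs if r["level"] == "error" and start <= r["ts"] <= end]


def slowest_span(spans, trace_id):
    """Trace step: return the span with the most self time for one trace, or None."""
    candidates = [s for s in spans if s["trace_id"] == trace_id]
    return max(candidates, key=lambda s: s["self_ms"]) if candidates else None


def triage(total_requests, error_count, logs, spans, window):
    """Walk metric -> logs -> trace and return the suspect span, or None."""
    if not should_alert(total_requests, error_count):
        return None
    start, end = window
    error_logs = errors_in_window(logs, start, end)
    if not error_logs:
        return None
    # Follow the trace ID of the first error in the window.
    return slowest_span(spans, error_logs[0]["trace_id"])


# Hypothetical sample data for illustration only.
LOGS = [
    {"ts": 100.0, "level": "info", "msg": "login ok", "trace_id": "a1"},
    {"ts": 130.0, "level": "error", "msg": "upstream timeout", "trace_id": "b2"},
    {"ts": 500.0, "level": "error", "msg": "outside the window", "trace_id": "c3"},
]
SPANS = [
    {"trace_id": "b2", "service": "api-gateway", "name": "POST /login", "self_ms": 40.0},
    {"trace_id": "b2", "service": "user-service", "name": "postgres query", "self_ms": 2300.0},
]

suspect = triage(1000, 25, LOGS, SPANS, window=(90.0, 200.0))
```

With this sample data, `suspect` is the `postgres query` span in `user-service`. Removing any one input breaks the chain: without the metric nothing fires, without logs there is no trace ID to follow, and without spans the slow service stays unknown.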