sota-observability


SOTA Observability & Reliability

Purpose

Make every production system answerable. Two questions define success:

  1. "Why is this request slow/failing?" — answerable for any single request from a trace ID, without adding new instrumentation.
  2. "What broke at 3am?" — answerable from symptom-based alerts that page only when users are hurt, each linked to a runbook and a dashboard that narrows cause in minutes.
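As a rough illustration of the first question, the sketch below attaches a trace ID to every structured log line so that one request's events can be pulled out later without new instrumentation. It uses only the Python standard library. The logger name `checkout`, the `trace_id` field name, and the helper functions are hypothetical choices for this example, not part of the skill itself; a real system would more likely take the ID from its tracing library.

```python
import contextvars
import io
import json
import logging
import uuid

# Hypothetical context variable holding the current request's trace ID.
trace_id_var = contextvars.ContextVar("trace_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that includes the trace ID."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": trace_id_var.get(),
        }
        return json.dumps(payload)


def build_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("checkout")  # hypothetical service name
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def handle_request(logger, trace_id=None):
    """Log two events for one request, all tagged with the same trace ID."""
    # Reuse the caller's trace ID when present; otherwise start a new one.
    token = trace_id_var.set(trace_id or uuid.uuid4().hex)
    try:
        logger.info("payment authorized")
        logger.warning("inventory lookup slow")
    finally:
        trace_id_var.reset(token)


def lines_for_trace(stream, trace_id):
    """Return the parsed log entries that belong to one trace ID."""
    entries = [json.loads(line) for line in stream.getvalue().splitlines()]
    return [e for e in entries if e["trace_id"] == trace_id]
```

Calling `handle_request` twice with different IDs and then `lines_for_trace` with one of them returns only that request's entries, which is the property the first question depends on.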

This skill covers structured logging, metrics, distributed tracing, SLOs and alerting, and operational readiness: both how to build them correctly and how to audit them adversarially. Telemetry is a product with users (on-call engineers) and costs (storage, cardinality, attention). Design for both.
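One common way to make an alert symptom-based is to page on how fast the SLO's error budget is being consumed, rather than on a raw error count. The sketch below is a minimal version of that arithmetic, assuming a request-based availability SLO. The function names, the two-window check, and the threshold passed in are illustrative assumptions for this example, not values taken from the skill.

```python
def burn_rate(bad_events, total_events, slo_target):
    """Ratio of the observed error rate to the error rate the SLO allows.

    A value of 1.0 means the error budget is being spent exactly as fast
    as the SLO permits; higher values mean it will run out early.
    """
    if total_events == 0:
        return 0.0  # no traffic, so no users were hurt in this window
    error_budget = 1.0 - slo_target
    observed_error_rate = bad_events / total_events
    return observed_error_rate / error_budget


def should_page(short_window, long_window, slo_target, threshold):
    """Page only when both a short and a long window exceed the threshold.

    Each window is a (bad_events, total_events) pair. Requiring both
    windows is an assumption of this sketch: the long window filters out
    brief blips, and the short window confirms the problem is still live.
    """
    short = burn_rate(short_window[0], short_window[1], slo_target)
    long_ = burn_rate(long_window[0], long_window[1], slo_target)
    return short > threshold and long_ > threshold
```

For example, with a hypothetical 99.9% target, 50 failures in 10,000 requests gives a burn rate of about 5: the budget is being spent roughly five times faster than the SLO allows.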

BUILD mode

sota-observability — martinholovsky/sota-skills