Observability Engineer
Expert observability engineer specializing in production-grade monitoring, logging, tracing, and reliability systems.
When to Use This Skill
- Designing observability stacks (Prometheus, Grafana, ELK)
- Implementing distributed tracing (OpenTelemetry, Jaeger, Datadog)
- Defining SLIs and SLOs (Service Level Indicators and Objectives)
- Setting up alerting and notification routing (PagerDuty, Slack)
- Investigating incidents and writing post-mortems
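Distributed tracing works by propagating trace context between services in request headers. OpenTelemetry's default propagators include the W3C Trace Context `traceparent` header, laid out as `version-traceid-parentid-flags`. The sketch below generates and validates a version-00 header using only the standard library. The helper names `new_traceparent` and `parse_traceparent` are hypothetical; in a real service the OpenTelemetry SDK handles this.

```python
import re
import secrets
from typing import Optional

# Version-00 layout: 2-hex version, 32-hex trace-id, 16-hex parent-id, 2-hex flags.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled: bool = True) -> str:
    """Build a version-00 traceparent header with random trace and span IDs."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"  # lowest bit is the "sampled" flag
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[dict]:
    """Return the header's fields, or None if it is malformed or invalid."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec treats version ff as invalid and forbids all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


if __name__ == "__main__":
    # Example header taken from the W3C Trace Context specification.
    print(parse_traceparent(
        "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
    print(new_traceparent())
```

A service forwards the same trace ID downstream but replaces the parent ID with the ID of its own span. That is how a backend such as Jaeger stitches the spans into a single trace.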
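An SLI is commonly expressed as the ratio of good events to total events. The SLO sets a target for that ratio over a window, and the error budget is the share of events allowed to fail, `1 - target`. The sketch below shows that arithmetic for an availability SLI built from request counts. The `slo_report` helper and the numbers in the example are illustrative, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float              # observed fraction of good events
    budget_events: float    # number of failures the SLO allows in the window
    budget_consumed: float  # fraction of the error budget already spent


def slo_report(good: int, total: int, target: float) -> SloReport:
    """Compute an availability SLI and error-budget usage from event counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    bad = total - good
    allowed_bad = (1.0 - target) * total  # error budget expressed in events
    return SloReport(
        sli=good / total,
        budget_events=allowed_bad,
        budget_consumed=bad / allowed_bad,
    )


if __name__ == "__main__":
    # Illustrative numbers: 1,000,000 requests, 400 failures, 99.9% SLO.
    report = slo_report(good=999_600, total=1_000_000, target=0.999)
    print(f"SLI={report.sli:.4%} budget={report.budget_events:.0f} "
          f"consumed={report.budget_consumed:.0%}")
```

With 400 failures out of 1,000,000 requests against a 99.9% target, 1,000 failures are allowed, so 40% of the budget is spent.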
Workflow
- Define signals: the "three pillars" (logs, metrics, traces).
- Instrument: add OpenTelemetry auto-instrumentation plus custom metrics.
- Store: choose a backend per signal (Prometheus for metrics, Loki for logs, Tempo for traces).
- Visualize: build actionable Grafana dashboards using the RED method (Rate, Errors, Duration).
- Alert: define alerts on the four golden signals (latency, traffic, errors, saturation).
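The three pillars are most useful when they can be joined. A log line that carries the active trace ID lets you pivot from a log search straight to the matching trace. The sketch below uses only the standard `logging` module. It assumes a hypothetical `current_trace_id` context variable that your tracing layer sets per request; with OpenTelemetry you would read the ID from the current span instead.

```python
import contextvars
import json
import logging
import sys

# Hypothetical context variable; a tracing layer would normally populate it per request.
current_trace_id = contextvars.ContextVar("current_trace_id", default="")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line, including the trace ID."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": current_trace_id.get(),
        }
        return json.dumps(payload)


def make_logger(stream=sys.stdout) -> logging.Logger:
    """Build a logger (hypothetical name "checkout") that writes JSON lines to stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("checkout")
    logger.handlers = [handler]  # replace, so repeated calls don't duplicate output
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger()
    current_trace_id.set("4bf92f3577b34da6a3ce929d0e0e4736")
    log.info("payment authorized")
```

One JSON object per line is easy for a log shipper to parse. The field names here are assumptions, so align them with whatever your log pipeline expects.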
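For the custom metrics in the Instrument step, Prometheus scrapes a plain-text exposition format over HTTP. A client library normally renders it for you, for example the official `prometheus_client` package or the OpenTelemetry Prometheus exporter. The stdlib-only sketch below renders a counter and a histogram by hand, purely to show the shape of the data. That includes the cumulative `le` buckets that latency percentiles are later computed from. The classes, metric names and bucket bounds are illustrative assumptions, and label-value escaping is omitted.

```python
import bisect


def _labels(names, values):
    # Render {name="value",...}; real clients also escape quotes, backslashes, newlines.
    pairs = ",".join(f'{n}="{v}"' for n, v in zip(names, values))
    return "{" + pairs + "}" if pairs else ""


class Counter:
    """A monotonically increasing count per label combination."""

    def __init__(self, name, help_text, label_names=()):
        self.name, self.help, self.label_names = name, help_text, tuple(label_names)
        self.values = {}

    def inc(self, *label_values, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.values[label_values] = self.values.get(label_values, 0.0) + amount

    def expose(self):
        lines = [f"# HELP {self.name} {self.help}", f"# TYPE {self.name} counter"]
        for label_values, value in sorted(self.values.items()):
            lines.append(f"{self.name}{_labels(self.label_names, label_values)} {value}")
        return lines


class Histogram:
    """Observation counts in cumulative buckets, plus _sum and _count series."""

    def __init__(self, name, help_text, buckets):
        self.name, self.help = name, help_text
        self.bounds = sorted(float(b) for b in buckets)
        self.counts = [0] * len(self.bounds)  # per-bucket, non-cumulative counts
        self.total = 0.0
        self.n = 0

    def observe(self, value):
        # First bucket whose upper bound is >= value; larger values land only in +Inf.
        i = bisect.bisect_left(self.bounds, value)
        if i < len(self.counts):
            self.counts[i] += 1
        self.total += value
        self.n += 1

    def expose(self):
        lines = [f"# HELP {self.name} {self.help}", f"# TYPE {self.name} histogram"]
        cumulative = 0
        for bound, count in zip(self.bounds, self.counts):
            cumulative += count
            lines.append(f'{self.name}_bucket{{le="{bound}"}} {cumulative}')
        lines.append(f'{self.name}_bucket{{le="+Inf"}} {self.n}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.n}")
        return lines


if __name__ == "__main__":
    requests = Counter("http_requests_total", "Total HTTP requests.", ("method", "code"))
    latency = Histogram("http_request_duration_seconds", "Request latency.", [0.1, 0.5, 1.0])
    for seconds, code in [(0.05, "200"), (0.3, "200"), (2.0, "500")]:
        requests.inc("GET", code)
        latency.observe(seconds)
    print("\n".join(requests.expose() + latency.expose()))
```

Each `_bucket` line counts every observation at or below its `le` bound, so the `+Inf` bucket always equals `_count`.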
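The RED method reduces each service to three numbers: request Rate, Error ratio and Duration (usually a high percentile). A Grafana panel would normally compute these with PromQL over exported metrics. To show what each panel represents, the sketch below does the same calculation over an in-memory list of requests. It assumes a hypothetical `Request` record, counts only 5xx responses as errors, and uses the nearest-rank percentile.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_s: float  # hypothetical record: request latency in seconds
    status: int        # HTTP status code


def percentile(values, q):
    """Nearest-rank percentile (q in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def red_summary(requests, window_s):
    """Return (requests per second, error ratio, p99 duration) for one window."""
    if not requests:
        return 0.0, 0.0, None
    errors = sum(1 for r in requests if r.status >= 500)  # server errors only
    return (
        len(requests) / window_s,
        errors / len(requests),
        percentile([r.duration_s for r in requests], 99),
    )


if __name__ == "__main__":
    sample = [Request(0.12, 200), Request(0.30, 200), Request(1.50, 503)]
    rate, error_ratio, p99 = red_summary(sample, window_s=60)
    print(f"rate={rate:.3f}/s errors={error_ratio:.1%} p99={p99}s")
```

Reporting a high percentile rather than the mean keeps slow outliers visible on the Duration panel.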
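For the Alert step, multi-window burn-rate alerting is one widely used technique for the errors signal. It comes from Google's SRE Workbook, not from this skill. Burn rate is the observed error ratio divided by the error budget (`1 - SLO`); at a burn rate of 1, the budget lasts exactly the SLO window. The workbook's example paging rule for a 30-day, 99.9% SLO fires when both the 1-hour and 5-minute burn rates exceed 14.4. At that rate, 2% of the 30-day budget would be gone in an hour. The sketch below assumes the windowed error ratios are already queried from your metrics backend.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are arriving."""
    return error_ratio / (1.0 - slo_target)


def should_page(err_1h: float, err_5m: float, slo_target: float = 0.999,
                threshold: float = 14.4) -> bool:
    """Multi-window check: the long window shows significant budget spend, and
    the short window confirms the problem is still happening now."""
    return (burn_rate(err_1h, slo_target) > threshold
            and burn_rate(err_5m, slo_target) > threshold)


def budget_spent(burn: float, hours: float, window_days: float = 30) -> float:
    """Fraction of the SLO window's error budget consumed at a constant burn rate."""
    return burn * hours / (window_days * 24)


if __name__ == "__main__":
    print(should_page(err_1h=0.02, err_5m=0.03))    # both windows burning fast
    print(should_page(err_1h=0.02, err_5m=0.0005))  # short window has recovered
```

Requiring the short window as well means the alert stops firing soon after errors stop, without waiting for the 1-hour average to drain.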