Observability
If a boundary matters to delivery, debugging, cost, or safety, instrument it deliberately and name the signal contract explicitly.
Context
This skill covers cross-cutting instrumentation design, not just runtime dashboards. Use it when the system needs durable telemetry contracts for:
- application events and structured logs
- metrics and traces at important boundaries
- AI execution telemetry such as skill invocation, runner execution, model usage, and token accounting
- workflow-level observability that must survive handoff across phases
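One way to make a telemetry contract explicit is to declare each signal as data and check records against it. The sketch below is a minimal illustration under stated assumptions: `SignalContract`, `SIGNAL_KINDS`, the `skill.invocation` signal name, and every field name are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass

# Hypothetical signal kinds that loosely mirror the list above;
# this is not a schema defined by the repository.
SIGNAL_KINDS = {"event", "log", "metric", "trace", "ai_execution", "workflow"}


@dataclass(frozen=True)
class SignalContract:
    """Names one signal, the boundary it instruments, and its required fields."""

    name: str
    kind: str
    boundary: str
    required_fields: tuple = ()

    def __post_init__(self):
        # Reject kinds that the contract vocabulary does not know about.
        if self.kind not in SIGNAL_KINDS:
            raise ValueError(f"unknown signal kind: {self.kind}")

    def validate(self, record: dict) -> list:
        """Return the required fields missing from record (empty means valid)."""
        return [f for f in self.required_fields if f not in record]


# Hypothetical contract for AI execution telemetry (assumed field names).
skill_invocation = SignalContract(
    name="skill.invocation",
    kind="ai_execution",
    boundary="runner -> model",
    required_fields=("skill", "model", "input_tokens", "output_tokens"),
)

# A record that omits token accounting fails the contract check.
missing = skill_invocation.validate(
    {"skill": "observability", "model": "example-model"}
)
```

Declaring the contract as a frozen value keeps the signal name and required fields reviewable in one place, so a handoff between phases can check records against it instead of relying on convention.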
This skill is intentionally distinct from monitoring-observability:
- observability defines what signals should exist and how they are structured
- monitoring-observability turns important production signals into dashboards, alerts, and responder workflows
The repository's current execution contract uses append-only execution-observability.jsonl artifacts plus periodic summaries from scripts/summarize_execution_observability.py. That keeps the runtime loop concrete without committing to a heavier backend too early.
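The append-only pattern described above can be sketched as follows. This is an illustration only, not the contents of scripts/summarize_execution_observability.py: the helper names `append_record` and `summarize` and the record fields (`skill`, `input_tokens`, `output_tokens`) are assumptions, and the file is written to a temporary directory rather than a real repository path.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def append_record(path: Path, record: dict) -> None:
    """Append one JSON object as a single line; existing lines are never rewritten."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def summarize(path: Path) -> dict:
    """Count records per skill and sum token usage, skipping blank lines."""
    runs = Counter()
    total_tokens = 0
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            rec = json.loads(line)
            # Field names here are assumed for illustration.
            runs[rec.get("skill", "unknown")] += 1
            total_tokens += rec.get("input_tokens", 0) + rec.get("output_tokens", 0)
    return {"runs_by_skill": dict(runs), "total_tokens": total_tokens}


# Usage: write three hypothetical execution records, then summarize them.
log_path = Path(tempfile.mkdtemp()) / "execution-observability.jsonl"
append_record(log_path, {"skill": "observability", "input_tokens": 120, "output_tokens": 30})
append_record(log_path, {"skill": "intake", "input_tokens": 80, "output_tokens": 20})
append_record(log_path, {"skill": "observability", "input_tokens": 50, "output_tokens": 10})
summary = summarize(log_path)
```

Because each record is one self-contained line, a periodic summary can be recomputed from the file at any time, which is what lets the heavier backend decision wait.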