groq-incident-runbook


Groq Incident Runbook

Overview

Rapid incident response procedures for Groq API failures. Groq is a third-party inference provider, so when it goes down you have three mitigation options: wait, fall back to a different model, or fall back to a different provider.
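The "fall back to a different model, or a different provider" options can be sketched as an ordered fallback chain. This is a minimal illustration, not part of the original runbook: `run_with_fallback` is a hypothetical helper, and the candidate names in the usage note are placeholders for whatever wrappers your application already has.

```shell
# Hypothetical helper: run each candidate command in order and print the
# output of the first one that exits 0. Output from failed candidates is
# discarded so callers never see a partial response.
run_with_fallback() {
  local candidate out
  for candidate in "$@"; do
    if out=$("$candidate"); then
      printf '%s\n' "$out"
      return 0
    fi
    echo "fallback: '$candidate' failed, trying next option" >&2
  done
  # Every option failed: the only remaining mitigation is to wait.
  return 1
}
```

A call might look like `run_with_fallback call_groq_primary call_groq_smaller_model call_other_provider`, where each argument is a shell function or script (all three names are assumed here) that prints a completion and returns non-zero on failure.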

Severity Levels

| Level | Definition | Response Time | Examples |
|-------|------------|---------------|----------|
| P1 | Complete API failure | < 15 min | Groq API returns 5xx on all models |
| P2 | Degraded performance | < 1 hour | High latency, partial 429s, one model down |
| P3 | Minor impact | < 4 hours | Intermittent errors, non-critical feature affected |
| P4 | No user impact | Next business day | Monitoring gap, cost anomaly |
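If alerting or paging scripts need these targets, the table can be encoded as a small lookup. This is a sketch only: `response_target` is a hypothetical function name, and the values are copied directly from the severity table.

```shell
# Hypothetical lookup: print the response-time target for a severity level.
# Returns non-zero for anything other than P1..P4.
response_target() {
  case "$1" in
    P1) echo "< 15 min" ;;
    P2) echo "< 1 hour" ;;
    P3) echo "< 4 hours" ;;
    P4) echo "Next business day" ;;
    *)  echo "unknown severity: $1" >&2; return 1 ;;
  esac
}
```

For example, an alert message could include `"Respond within: $(response_target P2)"`.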

Quick Triage (Run First)

set -euo pipefail
echo "=== 1. Groq API Status ==="
# --max-time keeps the check from hanging if the status page is slow to respond
curl -sf --max-time 10 https://status.groq.com > /dev/null && echo "status.groq.com: REACHABLE" || echo "status.groq.com: UNREACHABLE"
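A reachable status page does not by itself show that the API is serving requests, so a direct probe can be a useful companion check. The sketch below is an assumption-laden illustration rather than the runbook's own next step: it assumes the OpenAI-compatible models endpoint at `https://api.groq.com/openai/v1/models` and a `GROQ_API_KEY` environment variable, and both function names are hypothetical. The hints map onto the symptoms named in the severity table (5xx, 429s).

```shell
# Hypothetical helper: turn an HTTP status code into a triage hint.
interpret_status() {
  case "$1" in
    000)     echo "NO_RESPONSE" ;;   # curl could not connect or timed out
    2??)     echo "OK" ;;            # API reachable and key accepted
    401|403) echo "AUTH" ;;          # likely a key problem, not an outage
    429)     echo "RATE_LIMITED" ;;  # quota or throttling
    5??)     echo "SERVER_ERROR" ;;  # provider-side failure
    *)       echo "UNEXPECTED" ;;
  esac
}

# Hypothetical probe: assumes GROQ_API_KEY is set and that the endpoint
# below is the one your integration uses.
probe_groq_api() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    -H "Authorization: Bearer ${GROQ_API_KEY:-}" \
    https://api.groq.com/openai/v1/models) || true
  echo "api.groq.com: HTTP ${code} ($(interpret_status "$code"))"
}
```

Calling `probe_groq_api` after the status-page check prints a single line; `SERVER_ERROR` or `NO_RESPONSE` points toward a provider-side incident, while `AUTH` suggests checking your own configuration first.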