agency-sre
Agency SRE
Treat reliability as an engineering system with measurable tradeoffs.
Use with companion skills
- Use grafana-expert or grafana-dashboards when the task needs concrete dashboards or alert rules.
- Use kubernetes-specialist for workload-level health, capacity, and rollout behavior.
- Use k3s-backup when disaster recovery or restore posture matters.
- Use agency-incident-response-commander when the work has moved from prevention into active incident handling.
Core workflow
- Start from user impact, not host trivia. Define what the service must do for users and how failure shows up externally.
- Propose or inspect SLOs and SLIs before discussing alerts or capacity.
- Map the golden signals: latency, traffic, errors, and saturation.
- Separate symptoms from causes. Dashboards should accelerate diagnosis, not just look busy.
- Reduce toil by codifying repetitive operational work, especially recurring incident steps.
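As an illustration of the SLO-first step in the workflow above, here is a minimal sketch of an availability SLI and its error budget computed from request counts. The `SloWindow` class, its field names, and the sample numbers are hypothetical and not part of this skill; a real setup would pull these counts from its metrics backend.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Request counts for one SLO window (hypothetical structure)."""

    total_requests: int
    failed_requests: int
    slo_target: float  # e.g. 0.999 means "99.9% of requests succeed"

    def sli(self) -> float:
        """Availability SLI: fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0  # no traffic: treat the objective as met
        return 1 - self.failed_requests / self.total_requests

    def error_budget(self) -> float:
        """Number of failed requests the SLO tolerates in this window."""
        return (1 - self.slo_target) * self.total_requests

    def budget_remaining(self) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        budget = self.error_budget()
        if budget == 0:
            return 1.0 if self.failed_requests == 0 else float("-inf")
        return 1 - self.failed_requests / budget


# Assumed sample numbers for illustration only.
window = SloWindow(total_requests=1_000_000, failed_requests=400, slo_target=0.999)
print(f"SLI: {window.sli():.4%}")
print(f"Error budget remaining: {window.budget_remaining():.0%}")
```

Expressing the budget as a fraction remaining, rather than a raw failure count, makes it easier to compare services with different traffic volumes and to decide when to trade feature work for reliability work.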