it-operations

IT Operations Expert

A skill for managing IT infrastructure operations: keeping services reliable, implementing monitoring and alerting, managing incidents, and maintaining operational excellence through automation and established best practices.

Core Principles

1. Service Reliability First

  • Proactive Monitoring: Implement comprehensive observability before incidents occur
  • Incident Management: Structured response processes with clear escalation paths
  • SLA/SLO Management: Define and maintain service level objectives aligned with business needs
  • Continuous Improvement: Learn from incidents through blameless post-mortems
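
As an illustration of the SLA/SLO point above, the sketch below turns an availability objective into an error budget and reports how much of it is left. The `SLO` class, its field names, and the 99.9% over 30 days figures are assumptions chosen for the example, not something the skill prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical availability objective, e.g. 99.9% over 30 days."""

    target: float        # required fraction of good minutes, e.g. 0.999
    window_minutes: int  # length of the evaluation window in minutes

    def __post_init__(self) -> None:
        # A target of exactly 1.0 leaves no budget and would divide by zero below.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")

    def error_budget_minutes(self) -> float:
        # Minutes of downtime the objective tolerates within one window.
        return (1.0 - self.target) * self.window_minutes

    def budget_remaining(self, downtime_minutes: float) -> float:
        # Fraction of the budget still unspent; a negative value means a breach.
        budget = self.error_budget_minutes()
        return (budget - downtime_minutes) / budget


# Assumed example: 99.9% availability over a 30-day window.
slo = SLO(target=0.999, window_minutes=30 * 24 * 60)
print(round(slo.error_budget_minutes(), 1))  # 43.2
print(round(slo.budget_remaining(10.8), 2))  # 0.75
```

A remaining fraction near zero is a common signal to slow down risky changes until the window rolls over; the exact policy is a team decision.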

2. Automation Over Manual Processes

  • Infrastructure as Code: Manage infrastructure configuration through version-controlled code
  • Runbook Automation: Convert manual procedures into automated workflows
  • Self-Healing Systems: Implement automated remediation for common issues
  • Configuration Management: Maintain consistency across environments
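
One way to picture the runbook-automation and self-healing items above is a bounded remediation loop: check health, attempt a fix a limited number of times, and hand off to a human if the fix does not hold. This is a minimal sketch; `remediate`, `FakeService`, and the three-attempt limit are hypothetical names and values, and a production version would normally add delays between attempts and logging.

```python
from typing import Callable


def remediate(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    max_attempts: int = 3,
) -> str:
    """Return 'healthy', 'recovered', or 'escalate'."""
    if is_healthy():
        return "healthy"
    for _ in range(max_attempts):
        restart()
        if is_healthy():
            return "recovered"
    # Automated remediation failed; page the on-call engineer instead.
    return "escalate"


class FakeService:
    """Stand-in for a real service: recovers after a set number of restarts."""

    def __init__(self, restarts_needed: int) -> None:
        self.restarts_needed = restarts_needed
        self.restarts = 0

    def is_healthy(self) -> bool:
        return self.restarts >= self.restarts_needed

    def restart(self) -> None:
        self.restarts += 1


flaky = FakeService(restarts_needed=2)
print(remediate(flaky.is_healthy, flaky.restart))    # recovered

broken = FakeService(restarts_needed=5)
print(remediate(broken.is_healthy, broken.restart))  # escalate
```

Capping the attempts keeps the automation from masking a persistent fault: after the limit is reached the problem goes through the incident process with its escalation paths.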

3. ITIL Service Management

More from davila7/claude-code-templates

First Seen
Jan 21, 2026