enterprise-agent-ops

Summary

Operational controls for long-lived agent workloads, covering observability, security, and lifecycle management.

  • Covers four operational domains: runtime lifecycle (start, pause, stop, restart), observability (logs, metrics, traces), safety controls (scopes, permissions, kill switches), and change management (rollout, rollback, audit)
  • Enforces baseline controls including immutable deployment artifacts, least-privilege credentials, environment-level secret injection, and hard timeouts with retry budgets
  • Tracks key metrics: success rate, mean retries per task, time to recovery, cost per task, and failure class distribution
  • Includes structured incident response: freeze rollouts, capture traces, isolate failures, patch safely, run regression and security checks, then resume gradually
  • Integrates with PM2, systemd, container orchestrators, and CI/CD gates
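The metrics listed above can be derived from per-task records. The following is a minimal sketch, not part of the skill itself: the `TaskRecord` fields, the `summarize` function, and the `"unclassified"` fallback label are all hypothetical names chosen for illustration. Time to recovery is left out because it needs incident start and end timestamps rather than per-task data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    """One finished agent task (hypothetical schema, for illustration only)."""
    succeeded: bool
    retries: int
    cost_usd: float
    failure_class: Optional[str] = None  # e.g. "timeout"; None on success


def summarize(records):
    """Aggregate task records into the headline operational metrics."""
    n = len(records)
    if n == 0:
        raise ValueError("no task records to summarize")
    # Count failures by class; failed tasks without a class are grouped together.
    failures = Counter(
        r.failure_class or "unclassified" for r in records if not r.succeeded
    )
    return {
        "success_rate": sum(1 for r in records if r.succeeded) / n,
        "mean_retries_per_task": sum(r.retries for r in records) / n,
        "cost_per_task": sum(r.cost_usd for r in records) / n,
        "failure_class_distribution": dict(failures),
    }
```

Averaging over all tasks, including failed ones, keeps cost per task and mean retries honest when failures are expensive; a team that prefers cost per successful task would divide by the success count instead.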
SKILL.md

Enterprise Agent Ops

Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.

Operational Domains

  1. runtime lifecycle (start, pause, stop, restart)
  2. observability (logs, metrics, traces)
  3. safety controls (scopes, permissions, kill switches)
  4. change management (rollout, rollback, audit)
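As an illustration of how the lifecycle and safety domains can interact, here is a minimal sketch of a lifecycle state machine with a kill switch and an in-memory audit trail. The class name `AgentLifecycle`, the state names, and the transition table are assumptions made for this example; the skill does not prescribe a specific implementation, and a real deployment would usually delegate these transitions to a process manager or orchestrator.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


# Allowed transitions: (current state, command) -> next state.
TRANSITIONS = {
    (State.STOPPED, "start"): State.RUNNING,
    (State.PAUSED, "start"): State.RUNNING,  # "start" on a paused agent resumes it
    (State.RUNNING, "pause"): State.PAUSED,
    (State.RUNNING, "stop"): State.STOPPED,
    (State.PAUSED, "stop"): State.STOPPED,
    (State.RUNNING, "restart"): State.RUNNING,
}


class AgentLifecycle:
    """Hypothetical lifecycle controller with a one-way kill switch."""

    def __init__(self):
        self.state = State.STOPPED
        self.killed = False
        self.audit = []  # (command, previous state, new state) tuples

    def apply(self, command):
        """Apply a lifecycle command, rejecting anything not in TRANSITIONS."""
        if self.killed:
            raise RuntimeError(f"kill switch engaged; refusing '{command}'")
        key = (self.state, command)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot '{command}' while {self.state.value}")
        previous = self.state
        self.state = TRANSITIONS[key]
        self.audit.append((command, previous.value, self.state.value))
        return self.state

    def kill(self):
        """Force the agent to STOPPED and block all further commands."""
        previous = self.state
        self.state = State.STOPPED
        self.killed = True
        self.audit.append(("kill", previous.value, self.state.value))
```

Making the kill switch one-way (no `unkill` method) is a deliberate choice in this sketch: re-enabling a killed agent would then require constructing a new controller, which is a natural place to require human review.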

Baseline Controls

  • immutable deployment artifacts
  • least-privilege credentials
  • environment-level secret injection
  • hard timeouts and retry budgets
  • audit log for high-risk actions
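Two of the controls above, hard timeouts with retry budgets and audit logging, can be sketched with the Python standard library. This is an assumption-laden illustration rather than the skill's own tooling: `run_with_budget`, `audit_entry`, and the audit record fields are hypothetical names. The timeout is "hard" in the sense that `subprocess.run` kills the child process when `timeout` expires.

```python
import json
import subprocess
import time


def run_with_budget(cmd, timeout_s, max_attempts):
    """Run cmd with a hard per-attempt timeout and a bounded retry budget.

    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child and raises if timeout_s elapses.
            result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
        except subprocess.TimeoutExpired:
            continue  # this attempt is spent; try again if budget remains
        if result.returncode == 0:
            return True, attempt
    return False, max_attempts


def audit_entry(actor, action, outcome):
    """Build one audit record as a JSON line (hypothetical schema)."""
    record = {"ts": time.time(), "actor": actor, "action": action, "outcome": outcome}
    return json.dumps(record, sort_keys=True)
```

A caller might wrap a high-risk command like this, appending each returned line to a write-once log:

```python
import sys

ok, attempts = run_with_budget([sys.executable, "-c", "pass"], timeout_s=30, max_attempts=3)
line = audit_entry("deploy-bot", "rotate-credentials", "success" if ok else "failed")
```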
First Seen
Mar 5, 2026