prometheus
Purpose
Prometheus monitors and alerts on metrics from various targets. It collects time-series data by scraping targets over HTTP, stores it, and lets you query it with PromQL to drive dashboards and alerts.
When to Use
Use this skill when monitoring infrastructure, applications, or services in a DevOps/SRE environment. Apply it for real-time metrics collection, anomaly detection, or scaling decisions, such as tracking server health in Kubernetes clusters or alerting on high error rates in microservices.
Key Capabilities
- Metrics Collection: Scrapes HTTP endpoints using configurable jobs; specify targets in the YAML config, e.g.,
      scrape_configs:
        - job_name: 'node'
          static_configs:
            - targets: ['localhost:9100']
- Querying: Use PromQL for data retrieval; for example, rate(node_cpu_seconds_total{mode="idle"}[5m]) returns the per-second idle time of each CPU over the last 5 minutes, and 1 minus its per-instance average gives CPU usage.
- Alerting: Define rules in YAML files to fire alerts; e.g.,
      groups:
        - name: example
          rules:
            - alert: HighCPU
              expr: avg by(instance) (rate(node_cpu_seconds_total{mode="system"}[5m])) > 0.8
              for: 1m
- Storage and Retention: Handles time-series data with configurable retention; set via the --storage.tsdb.retention.time=15d flag.
- Federation: Aggregate metrics from multiple Prometheus instances for larger setups.
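To make the querying and alerting capabilities above more concrete, here is a minimal Python sketch of the ideas behind a rate over a counter and an alert rule with a hold duration. It is an illustration only, not Prometheus's implementation: the function names simple_rate and alert_state are hypothetical, and the rate calculation deliberately skips the extrapolation to the window edges that PromQL's rate() performs.

```python
def simple_rate(samples):
    """Per-second increase of a counter across a window.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    Simplification: no extrapolation to the window edges, unlike PromQL rate().
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        # A drop in value is treated as a counter reset (restart from zero).
        increase += value if value < previous else value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


def alert_state(evaluations, threshold, for_seconds):
    """State of a threshold alert after a series of rule evaluations.

    evaluations: list of (timestamp_seconds, value), oldest first.
    The alert is "pending" while the condition holds but has not yet held
    for for_seconds, and "firing" once it has. Any evaluation below the
    threshold resets it to "inactive".
    """
    active_since = None
    state = "inactive"
    for timestamp, value in evaluations:
        if value > threshold:
            if active_since is None:
                active_since = timestamp
            held = timestamp - active_since
            state = "firing" if held >= for_seconds else "pending"
        else:
            active_since = None
            state = "inactive"
    return state


# Counter grew by 60 over 120 seconds.
print(simple_rate([(0, 100.0), (60, 130.0), (120, 160.0)]))  # 0.5
# Condition held at every evaluation for the full 60 seconds.
print(alert_state([(0, 0.9), (30, 0.85), (60, 0.95)], 0.8, 60))  # firing
```

The second call mirrors the HighCPU rule: the expression stays above 0.8 for the whole hold duration, so the alert moves from pending to firing. A single evaluation below the threshold would restart the clock.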
Usage Patterns
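To monitor a target, create a YAML config file (e.g., prometheus.yml) that defines the scrape jobs, then start the server with it: prometheus --config.file=prometheus.yml. Query data through the built-in HTTP API (e.g., /api/v1/query) or connect a tool such as Grafana. Set up alerting rules early, and list the rule files under rule_files in the config so they are loaded. When running in a container, mount the config file as a volume and expose the web port (default 9090). For production, note that --web.external-url only sets the URL under which Prometheus is externally reachable; it does not enable authentication. Basic auth is configured in a web configuration file passed with --web.config.file, which lists users and bcrypt-hashed passwords under basic_auth_users. Prometheus itself does not read credentials from environment variables; variables like $PROMETHEUS_AUTH_USER and $PROMETHEUS_AUTH_PASS are a convention for supplying them to clients and deployment scripts.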
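The querying step can be sketched as a small Python client for the instant-query endpoint (/api/v1/query) using only the standard library. Treat it as an outline under stated assumptions: the helper names are hypothetical, the server is assumed to sit at http://localhost:9090, and reading $PROMETHEUS_AUTH_USER and $PROMETHEUS_AUTH_PASS on the client side is this example's convention, not something Prometheus does on its own.

```python
import base64
import json
import os
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def basic_auth_header(user, password):
    """Return an HTTP Basic Authorization header for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": "Basic " + token}


def parse_instant_vector(payload):
    """Map each series' instance label to its float value.

    payload: decoded JSON body of an /api/v1/query response.
    """
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is [unix_timestamp, "value as string"].
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in data["result"]
    }


def query(base_url, promql):
    """Run an instant query, adding basic auth if the env vars are set."""
    headers = {}
    user = os.environ.get("PROMETHEUS_AUTH_USER")  # assumed client-side convention
    password = os.environ.get("PROMETHEUS_AUTH_PASS")
    if user and password:
        headers = basic_auth_header(user, password)
    request = urllib.request.Request(build_query_url(base_url, promql), headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_instant_vector(json.load(response))
```

With a server running, a call such as query("http://localhost:9090", 'avg by(instance) (rate(node_cpu_seconds_total{mode="system"}[5m]))') would return a dictionary keyed by instance. Keeping URL building and response parsing in separate functions makes both easy to check without a live server.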