databricks-autonomous-operations
Databricks Autonomous Operations
1. Overview
This skill is both an SDK/CLI/Connect reference and an autonomous operations playbook. It teaches the AI agent to operate as an SRE — independently deploying, monitoring, diagnosing failures, applying fixes, redeploying, and verifying results across all Databricks resource types.
Core Loop: Deploy → Poll → Diagnose → Fix → Redeploy → Verify (max 3 iterations before escalation to user).
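The core loop above can be sketched as a small driver, assuming each phase is passed in as a callable. Every name here (`run_core_loop`, `LoopResult`, `deploy`, `poll`, `diagnose`, `apply_fix`, `verify`) is hypothetical and chosen for illustration; none of them come from the Databricks SDK or CLI. The sketch only shows the control flow and the 3-iteration cap, not how any phase is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_ITERATIONS = 3  # from the core loop: escalate to the user after 3 attempts


@dataclass
class LoopResult:
    """Outcome of the loop (hypothetical type, for illustration only)."""
    succeeded: bool
    iterations: int
    escalate: bool
    diagnoses: List[str] = field(default_factory=list)


def run_core_loop(
    deploy: Callable[[], None],
    poll: Callable[[], bool],          # True if the run finished successfully
    diagnose: Callable[[], str],       # returns a description of the failure
    apply_fix: Callable[[str], None],  # applies a fix for that description
    verify: Callable[[], bool],        # True if results look correct
    max_iterations: int = MAX_ITERATIONS,
) -> LoopResult:
    diagnoses: List[str] = []
    for attempt in range(1, max_iterations + 1):
        deploy()                       # Deploy (Redeploy on later attempts)
        if poll() and verify():        # Poll, then Verify only on success
            return LoopResult(True, attempt, False, diagnoses)
        cause = diagnose()             # Diagnose the failed attempt
        diagnoses.append(cause)
        if attempt < max_iterations:
            apply_fix(cause)           # Fix, then loop back to Redeploy
    # Attempts exhausted: hand the collected diagnoses to the user.
    return LoopResult(False, max_iterations, True, diagnoses)
```

On the final failed attempt the sketch still diagnoses but skips the fix, so the escalation carries a diagnosis for every attempt without leaving an unverified change behind.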
When to Activate This Skill
- Deploying or running Databricks Asset Bundles (bundle deploy, bundle run)
- Monitoring job or pipeline runs for completion
- Troubleshooting ANY failure: jobs, DLT pipelines, monitors, alerts, clusters, Genie Spaces
- Using the Databricks Python SDK, CLI, Connect, or REST API
- Encountering error messages from Databricks services
- Operating in a self-healing deploy-fix-redeploy cycle
- Checking job/task/pipeline status or retrieving run output
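For the monitoring and status-checking cases in the list above, a minimal polling sketch might look like the following. The helper name `wait_for_terminal_state` and its parameters are hypothetical. The state lookup is injected as a callable so the sketch runs without a workspace; the commented-out SDK call is an assumption about how it could be wired to `databricks-sdk` and should be checked against the installed version.

```python
import time
from typing import Callable

# Life-cycle states after which a Databricks job run will not change again.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_terminal_state(
    get_state: Callable[[], str],
    timeout_s: float = 3600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout hits."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in state {state!r} after {timeout_s}s")
        sleep(interval_s)


# Assumed wiring with databricks-sdk (not executed here):
#   from databricks.sdk import WorkspaceClient
#   w = WorkspaceClient()
#   wait_for_terminal_state(
#       lambda: w.jobs.get_run(run_id=run_id).state.life_cycle_state.value
#   )
```

A terminal life-cycle state does not by itself mean success: a run that ends in TERMINATED still needs its result state checked before the Verify step.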