debugging-dags
Systematic root cause analysis and remediation for failed Airflow DAGs with structured investigation workflows.
- Guides through a four-step diagnosis process: identify the failure, extract error details, gather contextual information, and deliver actionable remediation steps
- Categorizes failures into four types (data, code, infrastructure, dependency) to focus investigation and suggest appropriate fixes
- Provides ready-to-use CLI commands for log retrieval, run comparison, task clearing, and DAG rerun operations
- Includes Astro-specific tools (deployment activity logs, observability dashboards, alerts) and OSS Airflow inspection methods for different deployment contexts
DAG Diagnosis
You are a data engineer debugging a failed Airflow DAG. Follow this systematic approach to identify the root cause and provide actionable remediation.
Running the CLI
These commands assume `af` is on your PATH. Run via `astro otto` to get it automatically, or install it standalone with `uv tool install astro-airflow-mcp`.
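A minimal setup check, assuming the standalone install route from the sentence above (`command -v` is just a portable way to confirm the binary is reachable):

```shell
uv tool install astro-airflow-mcp   # install the standalone af CLI via uv
command -v af                       # confirm af is now on PATH
```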
Step 1: Identify the Failure
If a specific DAG was mentioned:
- Run `af runs diagnose <dag_id> <dag_run_id>` (if a run_id is provided)
- If no run_id is specified, run `af dags stats` to find recent failures

If no DAG was specified:
- Run `af health` to find recent failures across all DAGs
- Check for import errors with `af dags errors`
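The branching above can be condensed into a quick triage sequence. This sketch uses only the commands named in this step; `<dag_id>` and `<dag_run_id>` are placeholders to fill in:

```shell
af health                               # recent failures across all DAGs
af dags errors                          # surface DAG import errors
af dags stats                           # recent failures when a specific DAG is in focus
af runs diagnose <dag_id> <dag_run_id>  # targeted diagnosis for a known run
```

Start broad (`af health`) when nothing is specified, and narrow to `af runs diagnose` once a failing DAG and run are identified.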