tracing-upstream-lineage

Summary

Trace upstream data lineage to identify sources, DAGs, and dependencies feeding a table or column.

  • Supports tracing three target types: tables, columns, and DAGs; uses Airflow DAG source code and task inspection to find producing pipelines
  • Handles SQL sources (FROM clauses), external systems (S3, Postgres, Salesforce, HTTP APIs), and file-based sources; recursively traces upstream chains
  • Includes column-level tracing through direct mappings, transformations, and aggregations in DAG code
  • Generates lineage reports with diagrams, source details, transformation chains, and data quality implications
  • Leverages Astro's visual Lineage tab for quick exploration; falls back to manual DAG inspection for OSS Airflow

SKILL.md

Upstream Lineage: Sources

Trace the origins of data to answer the question "Where does this data come from?"

Lineage Investigation

Step 1: Identify the Target Type

Determine what we're tracing:

  • Table: Trace what populates this table
  • Column: Trace where this specific column comes from
  • DAG: Trace what data sources this DAG reads from

Step 2: Find the Producing DAG

Tables are typically populated by Airflow DAGs. Find the connection:

  1. Search DAGs by name: Run af dags list and look for DAG names that contain the table name, for example:
    • load_customers -> customers table
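
The name-matching heuristic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the skill itself: the function names, the affix list, and the ranking rule are all hypothetical, and the DAG IDs are assumed to have been collected already (for instance, copied from the output of af dags list).

```python
import re

# Hypothetical affixes that pipeline authors often wrap around a table name.
# This set is an assumption for illustration; adjust it to local naming habits.
COMMON_AFFIXES = {"load", "etl", "ingest", "sync", "build", "refresh", "daily", "hourly", "dag"}


def tokenize(name: str) -> set[str]:
    """Split an identifier into lowercase tokens on non-alphanumeric characters."""
    return {token for token in re.split(r"[^a-z0-9]+", name.lower()) if token}


def candidate_dags(table: str, dag_ids: list[str]) -> list[str]:
    """Return DAG IDs whose name contains every token of the table name.

    Candidates are ordered by how many unexplained extra tokens they carry,
    so load_customers ranks ahead of customers_to_crm_export.
    """
    # Drop any schema qualifier, e.g. "analytics.customers" -> "customers".
    table_tokens = tokenize(table.split(".")[-1])
    scored = []
    for dag_id in dag_ids:
        dag_tokens = tokenize(dag_id)
        if table_tokens and table_tokens <= dag_tokens:
            extra = dag_tokens - table_tokens - COMMON_AFFIXES
            scored.append((len(extra), dag_id))
    return [dag_id for _, dag_id in sorted(scored)]


if __name__ == "__main__":
    dag_ids = ["load_customers", "load_orders", "customers_to_crm_export"]
    print(candidate_dags("analytics.customers", dag_ids))
```

A name match is only a lead: a DAG named after a table may read from it rather than write to it, so each candidate still needs to be confirmed by inspecting the DAG's source code and tasks.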
Installs: 667
GitHub Stars: 361
First Seen: Jan 23, 2026