Senior Data Engineer
Generate pipeline configurations (Airflow, Prefect, Dagster), validate data quality with profiling and anomaly detection, and optimize SQL/Spark performance with actionable recommendations.
Core Capabilities
- Pipeline generation — Airflow/Prefect/Dagster DAG code for batch and incremental loads, with DAG validation.
- Data quality — schema validation, profiling, anomaly detection, data contracts, and Great Expectations suite generation.
- ETL/ELT optimization — SQL and Spark analysis, partition strategy, and query cost estimation per warehouse.
- Architecture decisions — batch vs streaming and warehouse vs lakehouse trade-off frameworks.
- Reliability patterns — incremental watermarks, dead letter queues, freshness checks, and schema-drift detection.
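As an illustration of two of the reliability patterns listed above, the following is a minimal sketch of an incremental load driven by a watermark, with failed rows routed to a dead letter queue. It is not taken from the skill itself: `incremental_load`, `LoadResult`, `to_cents`, and the `updated_at` column are hypothetical names chosen for the example, and the in-memory lists stand in for a real source table and queue.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Optional


@dataclass
class LoadResult:
    """Outcome of one incremental run (hypothetical structure)."""
    loaded: list[dict[str, Any]] = field(default_factory=list)
    dead_letters: list[dict[str, Any]] = field(default_factory=list)
    new_watermark: Optional[datetime] = None


def incremental_load(
    rows: list[dict[str, Any]],
    watermark: datetime,
    transform: Callable[[dict[str, Any]], dict[str, Any]],
) -> LoadResult:
    """Process only rows newer than `watermark`; send failures to a dead letter list."""
    result = LoadResult(new_watermark=watermark)
    for row in rows:
        if row["updated_at"] <= watermark:
            continue  # already handled by an earlier run
        try:
            result.loaded.append(transform(row))
        except (KeyError, ValueError, TypeError) as exc:
            # Keep the bad row and the reason so it can be inspected and replayed.
            result.dead_letters.append({"row": row, "error": repr(exc)})
        # Assumption: the watermark advances past dead-lettered rows too,
        # so they are retried only by replaying the dead letter queue.
        if row["updated_at"] > result.new_watermark:
            result.new_watermark = row["updated_at"]
    return result


def to_cents(row: dict[str, Any]) -> dict[str, Any]:
    """Example transform: parse a decimal string amount into integer cents."""
    return {"id": row["id"], "amount_cents": round(float(row["amount"]) * 100)}


if __name__ == "__main__":
    utc = timezone.utc
    source = [
        {"id": 1, "amount": "10.50", "updated_at": datetime(2024, 1, 1, tzinfo=utc)},
        {"id": 2, "amount": "oops", "updated_at": datetime(2024, 1, 2, tzinfo=utc)},
        {"id": 3, "amount": "3.25", "updated_at": datetime(2024, 1, 3, tzinfo=utc)},
    ]
    run = incremental_load(source, datetime(2024, 1, 1, tzinfo=utc), to_cents)
    print(run.loaded, len(run.dead_letters), run.new_watermark)
```

Advancing the watermark past failed rows keeps one bad record from blocking the pipeline, at the cost of requiring an explicit replay step for the dead letter queue. A pipeline that must never skip data could instead stop advancing at the first failure.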
When to Use
- Designing a data architecture or choosing batch vs streaming / warehouse vs lakehouse.
- Building or generating Airflow/Spark/dbt pipelines.
- Adding data-quality checks or data contracts.
- Optimizing slow ETL/ELT queries or troubleshooting pipeline failures.
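For the data-contract use case above, a contract check can be as small as comparing each incoming row against an expected set of columns and types. The sketch below is illustrative only: `ORDERS_CONTRACT` and `check_contract` are hypothetical names, the contract is a plain dict rather than a Great Expectations suite, and a real pipeline would likely validate at the DataFrame or warehouse level.

```python
from typing import Any

# Hypothetical contract: column name -> expected Python type.
ORDERS_CONTRACT: dict[str, type] = {
    "order_id": int,
    "customer_id": int,
    "amount": float,
}


def check_contract(rows: list[dict[str, Any]], contract: dict[str, type]) -> list[str]:
    """Return readable violations; an empty list means the batch conforms."""
    violations: list[str] = []
    for i, row in enumerate(rows):
        # Columns the contract requires but the row lacks.
        for col in sorted(contract.keys() - row.keys()):
            violations.append(f"row {i}: missing column '{col}'")
        # Columns the contract does not know about: a sign of schema drift.
        for col in sorted(row.keys() - contract.keys()):
            violations.append(f"row {i}: unexpected column '{col}'")
        # Type mismatches on columns that are present.
        for col, expected in contract.items():
            if col in row and not isinstance(row[col], expected):
                violations.append(
                    f"row {i}: column '{col}' expected {expected.__name__}, "
                    f"got {type(row[col]).__name__}"
                )
    return violations


if __name__ == "__main__":
    batch = [
        {"order_id": 1, "customer_id": 2, "amount": 9.5},
        {"order_id": "2", "customer_id": 3, "amount": 1.0, "coupon": "X"},
    ]
    for problem in check_contract(batch, ORDERS_CONTRACT):
        print(problem)
```

Returning a list of violations rather than raising on the first one lets the caller decide whether to fail the run, quarantine the batch, or only alert.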