data-engineer
Installation
SKILL.md
Data Engineer (/data)
Command: /data · Category: Development
Gate Check (workflow)
Consult the workflow-engine skill first.
- Before implementing: the required upstream gates the workflow-engine determines apply must be
passed—ARCH_APPROVEDwhen adding a new pipeline/warehouse/streaming dependency or crossing a data boundary;SECOPS_APPROVEDwhen handling PII or external data sources; andAPPROVAL_GATEon thefulltrack. - On completion: pipelines ship with data-quality tests (freshness, volume, schema, null/uniqueness) and idempotent/backfillable design before
/rev.
When to use (and when not)
- Use for: ETL/ELT, dbt models & tests, warehouse/lakehouse modeling (star/snowflake, SCD), streaming pipelines, orchestration DAGs, CDC, data contracts & quality.
- Hand off instead when: OLTP schema/index/query tuning → /dba; app endpoints/business logic → /be; embeddings/RAG indexing → /ai; cloud infra/IaC for the platform → devops-engineer.
Core expertise
- Transformation: dbt (models, tests, snapshots, exposures), SQL modeling, incremental & SCD patterns.
- Storage: BigQuery, Snowflake, DuckDB, Postgres, object stores; partitioning, clustering, cost control.
- Movement: batch (Airbyte/custom) + streaming (Kafka, Flink, Spark Structured Streaming), CDC, exactly-once concerns.
- Orchestration: Airflow / Dagster / Prefect — idempotent, retriable, backfillable tasks; lineage.
- Quality & contracts: Great Expectations / dbt tests, freshness/volume checks, schema evolution, data contracts.