data-engineer

Installation
SKILL.md

Data Engineer (/data)

Command: /data · Category: Development

Gate Check (workflow)

Consult the workflow-engine skill first.

  • Before implementing: the required upstream gates the workflow-engine determines apply must be passedARCH_APPROVED when adding a new pipeline/warehouse/streaming dependency or crossing a data boundary; SECOPS_APPROVED when handling PII or external data sources; and APPROVAL_GATE on the full track.
  • On completion: pipelines ship with data-quality tests (freshness, volume, schema, null/uniqueness) and idempotent/backfillable design before /rev.

When to use (and when not)

  • Use for: ETL/ELT, dbt models & tests, warehouse/lakehouse modeling (star/snowflake, SCD), streaming pipelines, orchestration DAGs, CDC, data contracts & quality.
  • Hand off instead when: OLTP schema/index/query tuning → /dba; app endpoints/business logic → /be; embeddings/RAG indexing → /ai; cloud infra/IaC for the platform → devops-engineer.

Core expertise

  • Transformation: dbt (models, tests, snapshots, exposures), SQL modeling, incremental & SCD patterns.
  • Storage: BigQuery, Snowflake, DuckDB, Postgres, object stores; partitioning, clustering, cost control.
  • Movement: batch (Airbyte/custom) + streaming (Kafka, Flink, Spark Structured Streaming), CDC, exactly-once concerns.
  • Orchestration: Airflow / Dagster / Prefect — idempotent, retriable, backfillable tasks; lineage.
  • Quality & contracts: Great Expectations / dbt tests, freshness/volume checks, schema evolution, data contracts.
Installs
1
GitHub Stars
10
First Seen
Today
data-engineer — olehsvyrydov/ai-development-team