Data Engineering
Data Pipeline Patterns
Batch Processing
- Scheduled Jobs: Run data processing at fixed intervals (hourly, daily, weekly)
- Use Cases: Historical analysis, reporting, data warehousing
- Tools: Apache Spark, Apache Hadoop, Apache Airflow, dbt
- Design Considerations: Latency tolerance, resource efficiency, cost optimization
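
As a rough illustration of the batch pattern above, the sketch below processes one fixed window (a single calendar day) in one pass. The function name `run_daily_batch`, the event fields (`user`, `amount`, `ts`), and the sample data are all hypothetical; in practice a scheduler such as Airflow would pass in the run date, and an engine such as Spark would do the aggregation at scale.

```python
from collections import defaultdict
from datetime import date, datetime


def run_daily_batch(events, run_date):
    """Sum the amount per user for one calendar day (one batch window)."""
    totals = defaultdict(float)
    for event in events:
        # Keep only events whose timestamp falls on the day being processed.
        if datetime.fromisoformat(event["ts"]).date() == run_date:
            totals[event["user"]] += event["amount"]
    return dict(totals)


# Hypothetical input: in a real pipeline this would be read from storage.
events = [
    {"user": "a", "amount": 10.0, "ts": "2024-01-01T09:00:00"},
    {"user": "a", "amount": 5.0, "ts": "2024-01-01T17:30:00"},
    {"user": "b", "amount": 7.5, "ts": "2024-01-01T12:00:00"},
    {"user": "a", "amount": 99.0, "ts": "2024-01-02T08:00:00"},
]

daily_totals = run_daily_batch(events, date(2024, 1, 1))
```

Because the output depends only on the input data and the run date, rerunning the job for the same day produces the same result, which makes retries and backfills safe.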
Streaming Processing
- Real-time Ingestion: Process data as it arrives with low latency
- Use Cases: Real-time analytics, monitoring, fraud detection
- Tools: Apache Kafka, Apache Flink, Apache Storm, Apache Beam
- Design Considerations: Event ordering, exactly-once semantics, backpressure
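
The streaming considerations above can be sketched in plain Python, without any of the listed frameworks. The example assumes events carry a unique `id` and an event-time `ts` in seconds (both hypothetical fields), counts events per tumbling window, and skips replayed events so that at-least-once delivery still yields each event being counted once. It does not demonstrate backpressure.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # hypothetical tumbling-window size


def process_stream(events):
    """Count events per event-time window, ignoring duplicate deliveries."""
    seen_ids = set()
    counts = defaultdict(int)
    for event in events:  # events arrive one at a time, possibly out of order
        if event["id"] in seen_ids:
            continue  # a replayed event must not be counted twice
        seen_ids.add(event["id"])
        # Assign the event to a window by its own timestamp, not arrival order.
        window_start = event["ts"] - event["ts"] % WINDOW_SECONDS
        counts[window_start] += 1
    return dict(counts)


# Hypothetical stream: one duplicate delivery and one late arrival.
stream = [
    {"id": "e1", "ts": 5},
    {"id": "e2", "ts": 61},
    {"id": "e1", "ts": 5},   # duplicate delivery of e1
    {"id": "e3", "ts": 59},  # arrives late, still belongs to the first window
]

window_counts = process_stream(stream)
```

A production system would also have to bound the deduplication state and decide when a window is final, which is what watermarks in engines such as Flink and Beam are for.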