data-engineering-streaming
# Streaming Data Systems
Real-time data ingestion and stream processing with Apache Kafka, MQTT, and NATS JetStream. Covers producers, consumers, and stream processing patterns for data engineering pipelines.
## Quick Comparison
| Feature | Apache Kafka | MQTT | NATS JetStream |
|---|---|---|---|
| Use Case | High-throughput event streaming | IoT, mobile, constrained devices | Cloud-native, microservices |
| Throughput | Millions/sec | Thousands/sec | Hundreds of thousands/sec |
| Durability | Disk-based log, replayable | Ephemeral (configurable) | Disk-based persistence |
| Ordering | Per-partition | N/A (topic-based) | Per-subject |
| Python Client | confluent-kafka | paho-mqtt | nats-py |
| Best For | Event sourcing, CDC, log aggregation | Sensor data, telemetry | Service-to-service messaging |
## When to Use Which?
- Kafka: High-volume event streams, log aggregation, CDC, data lake ingestion
- MQTT: IoT devices, mobile push, constrained networks
- NATS JetStream: Cloud-native service-to-service messaging with disk-based persistence