data-engineering-streaming
# Streaming Data Systems
Real-time data ingestion and stream processing with Apache Kafka, MQTT, and NATS JetStream. Covers producers, consumers, and stream processing patterns for data engineering pipelines.
## Quick Comparison
| Feature | Apache Kafka | MQTT | NATS JetStream |
|---|---|---|---|
| Use Case | High-throughput event streaming | IoT, mobile, constrained devices | Cloud-native, microservices |
| Throughput | Millions/sec | Thousands/sec | Hundreds of thousands/sec |
| Durability | Disk-based log, replayable | Ephemeral (configurable) | Disk-based persistence |
| Ordering | Per-partition | N/A (topic-based) | Per-subject |
| Python Client | confluent-kafka | paho-mqtt | nats-py |
| Best For | Event sourcing, CDC, log aggregation | Sensor data, telemetry | Service-to-service messaging |
## When to Use Which?
- Kafka: High-volume event streams, log aggregation, CDC, data lake ingestion
- MQTT: IoT devices, mobile push, constrained networks
- NATS JetStream: Cloud-native service-to-service messaging with disk-based persistence