Data Engineering

Build scalable data pipelines and infrastructure for big data processing.

Quick Start with Apache Spark

from pyspark.sql import SparkSession
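# Import the functions module under an alias (F.col, F.avg, F.sum, F.count)
# so that Python's built-in sum is not shadowed.
from pyspark.sql import functions as F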

# Initialize Spark
spark = SparkSession.builder \
    .appName("DataProcessing") \
    .config("spark.executor.memory", "4g") \
    .getOrCreate()

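# Read Parquet data. On open-source Spark with the hadoop-aws connector the
# scheme is usually s3a://; plain s3:// is what Amazon EMR uses.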
df = spark.read.parquet("s3://bucket/data/")
First Seen
Jan 24, 2026