Data Engineering
Build scalable data pipelines and infrastructure for big data processing.
Quick Start with Apache Spark
from pyspark.sql import SparkSession
# Note: importing sum here shadows Python's built-in sum() in this module
from pyspark.sql.functions import col, avg, sum, count
# Initialize Spark
spark = SparkSession.builder \
    .appName("DataProcessing") \
    .config("spark.executor.memory", "4g") \
    .getOrCreate()
# Read data
df = spark.read.parquet("s3://bucket/data/")