# Big Data & Distributed Computing

Production-grade big data processing with Apache Spark, distributed systems patterns, and petabyte-scale data engineering.
## Quick Start

```python
# PySpark 3.5+ modern DataFrame API
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Initialize Spark with adaptive query execution and Kryo serialization
spark = (SparkSession.builder
         .appName("ProductionETL")
         .config("spark.sql.adaptive.enabled", "true")
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .getOrCreate())
```
## Related skills

More from pluginagentmarketplace/custom-plugin-data-engineer:
- **statistics-math**: Statistics, probability, linear algebra, and mathematical foundations for data science
- **deep-learning**: PyTorch, TensorFlow, neural networks, CNNs, transformers, and deep learning for production
- **python-programming**: Master Python fundamentals, OOP, data structures, async programming, and production-grade scripting for data engineering
- **data-engineering**: Data pipeline architecture, ETL/ELT patterns, data modeling, and production data platform design
- **etl-tools**: Apache Airflow, dbt, Prefect, Dagster, and modern data orchestration for production data pipelines
- **git-version-control**: Git workflows, branching strategies, collaboration, and code management