# Apache Spark

## Overview
Apache Spark is the de facto standard for large-scale distributed data processing. It handles batch processing, streaming, SQL, machine learning, and graph processing, and PySpark provides its Python API. Spark runs on standalone clusters, YARN, Kubernetes, or managed services (Databricks, EMR, Dataproc).
## Instructions

### Step 1: PySpark Setup
```bash
pip install pyspark
```
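PySpark also needs a Java runtime available on the machine, since Spark itself runs on the JVM. To confirm the install works, a quick smoke test can start a throwaway local session and print the Spark version. This is a minimal sketch; the `local[*]` master URL (all local cores) and the app name are arbitrary choices:

```python
# verify_spark.py: smoke test for a local PySpark install
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
print(spark.version)  # prints the installed Spark version
spark.stop()
```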
### Step 2: DataFrame Operations
```python
# etl/process.py — PySpark data processing
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("etl-process").getOrCreate()
```
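With a session in hand, a typical job reads data into a DataFrame, applies transformations, and writes the result. The sketch below is illustrative, assuming a hypothetical CSV of events with `event_type` and `ts` columns and made-up paths (`data/events.csv`, `out/daily_clicks`):

```python
# etl/process.py: read, transform, aggregate, write (illustrative sketch)
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-process").getOrCreate()

# Hypothetical input: a CSV of events with event_type and ts columns
df = spark.read.csv("data/events.csv", header=True, inferSchema=True)

# Keep click events, bucket by day, and count clicks per day
daily_clicks = (
    df.filter(F.col("event_type") == "click")
      .withColumn("day", F.to_date("ts"))
      .groupBy("day")
      .agg(F.count("*").alias("clicks"))
)

# Write results as Parquet, replacing any previous output
daily_clicks.write.mode("overwrite").parquet("out/daily_clicks")
spark.stop()
```

Transformations like `filter` and `groupBy` are lazy; Spark only executes the plan when an action such as the `write` runs, which lets it optimize the whole pipeline at once.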
## Related skills