spark

Apache Spark

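Apache Spark is one of the most widely used engines for large-scale data processing. Spark 4.0 (previewed in 2024, released in 2025) makes Spark Connect a first-class way to use Spark, so thin clients (a notebook, or an editor such as VS Code) can connect to a remote cluster without running a full Spark driver locally.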

When to Use

  • Data Engineering: batch ETL on datasets up to petabyte scale.
  • Streaming: Structured Streaming for near-real-time analytics.
  • Legacy ML: spark.ml, Spark's built-in machine learning library (many teams now train models with XGBoost or PyTorch instead).
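
As a rough illustration of the streaming use case above, the following is a minimal sketch, assuming PySpark is installed and a session has already been created. It uses Spark's built-in `rate` test source and `console` sink; the helper names `rate_stream_options` and `run_windowed_counts` are hypothetical and exist only for this example.

```python
def rate_stream_options(rows_per_second: int = 5) -> dict:
    """Options for Spark's built-in 'rate' test source (hypothetical helper)."""
    return {"rowsPerSecond": str(rows_per_second)}


def run_windowed_counts(spark, seconds: int = 30) -> None:
    """Count generated events per 10-second window and print them to the console."""
    # Imported lazily so the module loads even where PySpark is not installed.
    from pyspark.sql import functions as F

    # The rate source emits rows with 'timestamp' and 'value' columns.
    events = (
        spark.readStream.format("rate")
        .options(**rate_stream_options())
        .load()
    )

    # Group events into tumbling 10-second windows.
    counts = events.groupBy(F.window("timestamp", "10 seconds")).count()

    # 'complete' mode re-emits the full aggregate table on every trigger.
    query = counts.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination(seconds)  # timeout is given in seconds
    query.stop()
```

Calling `run_windowed_counts(spark)` with an active session should print window counts for about 30 seconds. A production job would read from a real source such as Kafka and write to a durable sink.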

Core Concepts

Spark Connect

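Decouples the client (your laptop) from the server (the cluster). The client builds a query plan and sends it to the server over gRPC, and the server executes it and streams the results back. Because the protocol is language-neutral, clients are not limited to Python and Scala: clients for languages such as Go, Rust, and TypeScript are possible, though their maturity varies.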
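
To make the client/server split concrete, here is a minimal sketch of connecting from Python, assuming PySpark with Spark Connect support is installed on the client and a Spark Connect server is reachable at the given host. The helper names `connect_url` and `row_count` are hypothetical; `SparkSession.builder.remote` and the `sc://` URL scheme come from PySpark itself.

```python
def connect_url(host: str, port: int = 15002) -> str:
    """Build a Spark Connect URL; 15002 is the server's default port."""
    return f"sc://{host}:{port}"


def row_count(url: str) -> int:
    """Open a remote session and run a trivial query on the cluster."""
    # Imported lazily so the module loads even where PySpark is not installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        # The plan is built on the client; execution happens on the server.
        return spark.range(1000).filter("id % 2 = 0").count()
    finally:
        spark.stop()
```

For example, `row_count(connect_url("localhost"))` would run the query against a server started locally. Only the plan and the final count cross the network, which is what keeps the client thin.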

Catalyst Optimizer

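Optimizes your SQL and DataFrame queries before execution. Catalyst turns a query into a logical plan, rewrites it with optimization rules (for example, pushing filters closer to the data source and pruning unused columns), and then selects a physical plan to run.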
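
One way to see what Catalyst does is to print the plans for a small query. The sketch below assumes PySpark is installed and a session already exists; the helper name `show_plans` and the constant `EXPLAIN_MODES` are hypothetical, while `DataFrame.explain(mode=...)` is PySpark's own method.

```python
# Modes accepted by DataFrame.explain(mode=...).
EXPLAIN_MODES = ("simple", "extended", "codegen", "cost", "formatted")


def show_plans(spark, mode: str = "extended") -> None:
    """Print the plans Catalyst produces for a small example query."""
    if mode not in EXPLAIN_MODES:
        raise ValueError(f"mode must be one of {EXPLAIN_MODES}, got {mode!r}")

    # Imported lazily so the module loads even where PySpark is not installed.
    from pyspark.sql import functions as F

    df = (
        spark.range(1_000_000)
        .withColumn("bucket", F.col("id") % 10)
        .filter(F.col("bucket") == 3)
        .select("id")
    )

    # 'extended' prints the parsed, analyzed, and optimized logical plans
    # followed by the physical plan; nothing is executed.
    df.explain(mode=mode)
```

Comparing the analyzed and optimized logical plans in the output shows which rewrites were applied before any data was read.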

First Seen
Feb 10, 2026