spark
Apache Spark
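Apache Spark is one of the most widely used distributed engines for large-scale data processing. Spark 4.0 (released in 2025) pushes Spark Connect forward with a lightweight Python client and an API mode switch, so thin clients (such as an editor like VS Code) can talk to a large remote cluster without a full local Spark installation.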
When to Use
- Data Engineering: ETL at petabyte scale.
- Streaming: Structured Streaming for real-time analytics.
- Legacy ML: spark.ml (though mostly replaced by XGBoost/Torch).
Core Concepts
Spark Connect
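Decouples the client (your laptop) from the server (the cluster). The client builds a query plan and sends it to the server over gRPC, so it only needs a thin library rather than a full Spark runtime. This also makes it possible to use Spark from languages such as Go, Rust, and TypeScript through separate client libraries, which vary in maturity.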
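A minimal sketch of what connecting through Spark Connect can look like from Python. It assumes a PySpark installation with Connect support and a Spark Connect server already running; the helper names `build_remote_url` and `connect` are hypothetical and exist only for this illustration. The `sc://` scheme, the default port 15002, and `SparkSession.builder.remote(...)` are part of PySpark.

```python
DEFAULT_CONNECT_PORT = 15002  # default port of a Spark Connect server


def build_remote_url(host: str, port: int = DEFAULT_CONNECT_PORT) -> str:
    """Build an sc:// connection string (hypothetical helper for this sketch)."""
    return f"sc://{host}:{port}"


def connect(url: str):
    """Open a remote session against a running Spark Connect server.

    pyspark is imported lazily so this module loads even where
    PySpark is not installed.
    """
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()


# Example usage (assumes a server is listening on localhost:15002):
#   spark = connect(build_remote_url("localhost"))
#   spark.range(5).filter("id % 2 = 0").show()
#   spark.stop()
```

The DataFrame calls look the same as in classic PySpark; the difference is that the plan is executed on the server, not in a local JVM.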
Catalyst Optimizer
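Optimizes your SQL/DataFrame queries before execution. It rewrites the logical plan with rules such as predicate pushdown, column pruning, and constant folding, then selects a physical plan to run.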
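To give a feel for rule-based plan rewriting, here is a toy constant-folding pass in plain Python. It is not Catalyst's implementation (Catalyst is written in Scala and works on full query plans); the `Lit`, `Col`, `Add`, and `fold_constants` names are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Lit:
    value: int  # a literal constant


@dataclass(frozen=True)
class Col:
    name: str  # a reference to a column


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add]


def fold_constants(expr: Expr) -> Expr:
    """Bottom-up rewrite: collapse Add(Lit, Lit) into a single Lit."""
    if isinstance(expr, Add):
        left = fold_constants(expr.left)
        right = fold_constants(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return expr  # literals and columns are already in simplest form


# price + (1 + 2) is rewritten to price + 3 before "execution"
plan = Add(Col("price"), Add(Lit(1), Lit(2)))
optimized = fold_constants(plan)
```

In real Spark, calling `df.explain(True)` on a DataFrame prints the parsed, analyzed, and optimized logical plans along with the physical plan, which is a practical way to see what the optimizer did to a query.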