spark-optimization

Summary

Apache Spark job optimization through partitioning, memory tuning, shuffle reduction, and join strategies.

  • Covers partitioning strategies, broadcast joins, bucketed joins, and skew handling with salting techniques to minimize shuffle overhead
  • Includes caching and persistence patterns with storage level selection, checkpointing for complex lineages, and memory configuration breakdown
  • Provides data format optimization for Parquet and Delta Lake, column pruning, predicate pushdown, and Z-ordering for multi-dimensional queries
  • Features monitoring and debugging patterns including query plan analysis, stage metrics tracking, and partition skew detection
  • Details Adaptive Query Execution (AQE) configuration and production-ready settings for executor memory, parallelism, serialization, and compression
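The salting technique mentioned above can be illustrated without a cluster: hot keys get a random suffix so their rows spread across several shuffle partitions, and the small (dimension) side of the join is replicated once per salt value so matches are preserved. This is a minimal plain-Python sketch; the key names, `SALT_BUCKETS` value, and row counts are illustrative assumptions.

```python
import random
from collections import Counter

SALT_BUCKETS = 8  # assumed salt factor; tune to the observed skew

def salt_key(key, hot_keys):
    """Append a random salt suffix to known-hot keys so their rows
    spread across SALT_BUCKETS partitions instead of landing on one."""
    if key in hot_keys:
        return f"{key}_{random.randrange(SALT_BUCKETS)}"
    return key

# A skewed fact table: one key dominates the row count.
fact_keys = ["user_1"] * 8000 + ["user_2"] * 100 + ["user_3"] * 100
hot_keys = {"user_1"}

random.seed(42)
salted = [salt_key(k, hot_keys) for k in fact_keys]

# After salting, no single key carries the bulk of the rows.
counts = Counter(salted)
assert max(counts.values()) < 8000

# The dimension side is replicated once per salt value so the
# salted join keys still find their match.
dim_row = ("user_1", "some_attrs")
replicated = [(f"{dim_row[0]}_{i}", dim_row[1]) for i in range(SALT_BUCKETS)]
```

In Spark itself the same idea is applied with a column expression (e.g. `concat` of the key and a random bucket) on the fact side and an `explode` of the salt range on the dimension side; AQE's skew-join handling can make manual salting unnecessary on Spark 3.x.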
SKILL.md

Apache Spark Optimization

Production patterns for optimizing Apache Spark jobs including partitioning strategies, memory management, shuffle optimization, and performance tuning.

When to Use This Skill

  • Optimizing slow Spark jobs
  • Tuning memory and executor configuration
  • Implementing efficient partitioning strategies
  • Debugging Spark performance issues
  • Scaling Spark pipelines for large datasets
  • Reducing shuffle and data skew

Core Concepts

1. Spark Execution Model

Driver Program — runs the application's main function, converts transformations into a DAG of stages, and schedules tasks onto executors. Executors run those tasks in parallel on worker nodes and hold cached partitions in memory.
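The driver/executor split is what the standard resource settings control. A hedged sketch of a production-leaning submission follows; every value is an illustrative assumption to be sized against the actual cluster, and `etl_job.py` is a hypothetical application name.

```shell
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --executor-memory 8g \
  --executor-cores 4 \
  --num-executors 10 \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.sql.adaptive.skewJoin.enabled=true \
  --conf spark.sql.shuffle.partitions=200 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  etl_job.py
```

With AQE enabled (the default since Spark 3.2), Spark can coalesce small shuffle partitions and split skewed join partitions at runtime, which reduces the pressure to hand-tune `spark.sql.shuffle.partitions`.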
Installs: 6.1K
Repository: wshobson/agents
GitHub Stars: 35.2K
First Seen: Jan 20, 2026