spark-engineer


Spark Engineer

Senior Apache Spark engineer specializing in high-performance distributed data processing, optimizing large-scale ETL pipelines, and building production-grade Spark applications.

Core Workflow

  1. Analyze requirements - Understand data volume, required transformations, latency targets, and available cluster resources
  2. Design pipeline - Choose between DataFrames and RDDs, plan the partitioning strategy, and identify broadcast-join opportunities
  3. Implement - Write Spark code with optimized transformations, appropriate caching, and proper error handling
  4. Optimize - Analyze the Spark UI, tune the number of shuffle partitions (spark.sql.shuffle.partitions), eliminate data skew, and optimize joins and aggregations
  5. Validate - Check the Spark UI for shuffle spill before proceeding, and verify the partition count with df.rdd.getNumPartitions(); if spill or skew is detected, return to step 4; then test with production-scale data, monitor resource usage, and verify performance targets
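The tuning in steps 4 and 5 often starts from back-of-the-envelope numbers. The sketch below is plain Python, so it runs without a Spark session, and it rests on assumptions: a target of about 128 MB per shuffle partition, a partition count rounded up to a multiple of the total executor cores, and a skew cutoff of five times the median partition size. All three are hypothetical defaults to adjust per workload, and the function names are illustrative, not part of any Spark API. The per-partition sizes it checks could come from the shuffle stage's task metrics in the Spark UI, or from row counts collected with df.rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect().

```python
import math
from statistics import median

# Assumed target bytes per shuffle partition. 128 MB matches the default of
# spark.sql.files.maxPartitionBytes; treat it as a starting point, not a rule.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def suggest_shuffle_partitions(shuffle_bytes: int, total_cores: int,
                               target_bytes: int = TARGET_PARTITION_BYTES) -> int:
    """Heuristic starting value for spark.sql.shuffle.partitions (hypothetical helper).

    Sizes partitions to about target_bytes each, then rounds up to a whole
    number of task waves so every wave can keep all cores busy.
    """
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")
    by_size = max(1, math.ceil(shuffle_bytes / target_bytes))
    waves = math.ceil(by_size / total_cores)
    return waves * total_cores


def skewed_partitions(partition_sizes: list[int], ratio: float = 5.0) -> list[int]:
    """Return indices of partitions larger than ratio x the median size (hypothetical helper).

    partition_sizes can be row counts or bytes, one entry per partition.
    """
    if not partition_sizes:
        return []
    # max(..., 1) keeps the cutoff positive when most partitions are empty.
    threshold = ratio * max(median(partition_sizes), 1)
    return [i for i, size in enumerate(partition_sizes) if size > threshold]


if __name__ == "__main__":
    # 50 GiB of shuffle data on 64 cores: 400 size-based partitions -> 7 waves -> 448.
    print(suggest_shuffle_partitions(50 * 1024**3, 64))
    # One partition holds far more rows than the others -> flagged as skewed.
    print(skewed_partitions([100, 110, 95, 105, 2000]))
```

A suggested value could then be applied with spark.conf.set("spark.sql.shuffle.partitions", n). On Spark 3.x, adaptive query execution (spark.sql.adaptive.enabled, with spark.sql.adaptive.skewJoin.enabled) can also coalesce shuffle partitions and split skewed join partitions at runtime, so a static estimate like this is mainly a starting point for step 4.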

Reference Guide

Load detailed guidance based on context:


More from farmage/opencode-skills

Installs: 12
GitHub Stars: 23
First Seen: Mar 24, 2026