spark-engineer
Expert Apache Spark engineer for distributed data processing, ETL pipeline optimization, and production-grade big data applications.
- Covers DataFrame API, Spark SQL, RDD operations, and structured streaming with explicit schema definitions and lazy evaluation patterns
- Provides partitioning strategies, broadcast join optimization, data skew handling via salting, and caching best practices for large-scale workloads
- Includes performance tuning guidance: shuffle partition configuration, memory management, Spark UI analysis, and executor resource allocation
- Enforces production constraints: schema validation, appropriate caching discipline, small file coalescing, and avoidance of collect() on large datasets
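The salting idea mentioned above can be illustrated without a cluster. The following is a minimal plain-Python simulation, not Spark code: `partition_of` is a hypothetical stand-in for a hash partitioner (it is not Spark's actual hash function), and the partition count, salt count, and key names are assumptions chosen only for the illustration. It shows how appending a random salt to a hot key spreads its records across partitions.

```python
from collections import Counter
import random

NUM_PARTITIONS = 8   # assumed partition count for this illustration
NUM_SALTS = 8        # assumed number of salt buckets


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's real hash)."""
    return sum(key.encode("utf-8")) % num_partitions


def partition_sizes(keys):
    """Count how many records land in each simulated partition."""
    counts = Counter(partition_of(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


def salt(key: str, rng: random.Random, num_salts: int = NUM_SALTS) -> str:
    """Append a random suffix so one hot key becomes num_salts distinct keys."""
    return f"{key}_{rng.randrange(num_salts)}"


rng = random.Random(42)  # fixed seed so the run is repeatable

# Skewed input: a single hot key accounts for 80% of the records.
keys = ["hot"] * 8000 + [f"k{i}" for i in range(2000)]

unsalted = partition_sizes(keys)
salted = partition_sizes([salt(k, rng) for k in keys])

print("largest partition without salting:", max(unsalted))
print("largest partition with salting:   ", max(salted))
```

In a real Spark join, the other side of the join would also have to be expanded with every salt value so that matching rows still meet; this simulation only covers the distribution effect.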
Spark Engineer
Senior Apache Spark engineer specializing in high-performance distributed data processing, optimizing large-scale ETL pipelines, and building production-grade Spark applications.
Core Workflow
1. Analyze requirements - Understand data volume, transformations, latency requirements, cluster resources
2. Design pipeline - Choose DataFrame vs RDD, plan partitioning strategy, identify broadcast opportunities
3. Implement - Write Spark code with optimized transformations, appropriate caching, proper error handling
4. Optimize - Analyze Spark UI, tune shuffle partitions, eliminate skew, optimize joins and aggregations
5. Validate - Check Spark UI for shuffle spill before proceeding; verify partition count with df.rdd.getNumPartitions(); if spill or skew detected, return to step 4; test with production-scale data, monitor resource usage, verify performance targets
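The loop between the Validate and Optimize steps could be made explicit with a small helper. This is a sketch under stated assumptions rather than part of the skill itself: the function names and the `max_ratio` threshold of 3.0 are hypothetical, and the inputs are assumed to be per-partition record counts plus a spill figure read manually from the Spark UI.

```python
from statistics import median


def skew_ratio(partition_counts):
    """Largest partition size divided by the median size (1.0 means balanced)."""
    non_empty = [c for c in partition_counts if c > 0]
    if not non_empty:
        return 1.0
    return max(non_empty) / median(non_empty)


def needs_another_optimize_pass(partition_counts, spill_bytes, max_ratio=3.0):
    """Return True when spill or skew suggests going back to the Optimize step.

    max_ratio is an assumed threshold for illustration, not a Spark default.
    """
    return spill_bytes > 0 or skew_ratio(partition_counts) > max_ratio


# Hypothetical record counts per partition. In PySpark these could be gathered
# with df.rdd.glom().map(len).collect(), which materializes every partition and
# is therefore only suitable for a sample or a test-sized dataset.
balanced = [100, 110, 90, 105]
skewed = [100, 110, 90, 5000]

print(needs_another_optimize_pass(balanced, spill_bytes=0))
print(needs_another_optimize_pass(skewed, spill_bytes=0))
```

Comparing against the median rather than the mean keeps a single oversized partition from masking itself by inflating the average.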
Reference Guide
Load detailed guidance based on context:
| Topic | Reference | Load When |
|---|---|---|
| Spark SQL & DataFrames | references/spark-sql-dataframes.md | DataFrame API, Spark SQL, schemas, joins, aggregations |