spark-principal-engineer
Spark Mastery (Senior → Principal)
Operate
- Start from data volume, compute economics, shuffle behavior, and correctness requirements.
- Treat Spark as a distributed execution system with real storage, network, and scheduling tradeoffs.
- Prefer explicit workload design over vague “big data” assumptions.
- Optimize for predictable cost, reliability, and debuggable pipelines.
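Starting from data volume rather than defaults can be as simple as back-of-envelope arithmetic. The sketch below derives a shuffle partition count from estimated shuffle bytes, assuming a target post-shuffle partition size of roughly 128 MB — a common starting point, not a universal rule, and the value you would then feed into a setting like `spark.sql.shuffle.partitions` (or let adaptive query execution converge on).

```python
# Minimal sketch: derive a shuffle partition count from data volume,
# assuming a ~128 MB target post-shuffle partition size (an assumption,
# tune per workload).

def shuffle_partitions(shuffle_bytes: int,
                       target_bytes: int = 128 * 1024 * 1024) -> int:
    """Round up so no partition exceeds the target size."""
    return max(1, -(-shuffle_bytes // target_bytes))  # ceiling division

# Example: a stage shuffling ~1 TB
print(shuffle_partitions(1_000_000_000_000))  # → 7451
```

The point is the discipline, not the constant: estimate bytes per stage first, then choose parallelism, instead of inheriting whatever the cluster default happens to be.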
Default Standards
- Data layout and partitioning must match workload reality.
- Shuffle-heavy patterns require scrutiny: wide joins, groupBys, and repartitions dominate network and disk cost.
- Memory and executor tuning should follow evidence from the Spark UI and executor metrics, not defaults or folklore.
- Streaming and batch semantics must be separated clearly.
- Platform cost and job performance should be evaluated together.
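Evidence-driven executor sizing also reduces to arithmetic over node resources. The sketch below assumes 5 cores per executor (a widely cited heuristic for I/O throughput, not a Spark requirement), one core and 1 GB reserved for the OS and daemons, and carves roughly 10% off the heap for off-heap overhead, mirroring the floor Spark applies via `spark.executor.memoryOverhead`.

```python
# Minimal sketch: plan executors per node from raw node resources.
# The 5-core and 1 GB reserve figures are assumptions to tune, not rules.

def executor_plan(node_cores: int, node_mem_gb: int,
                  cores_per_executor: int = 5,
                  os_reserve_gb: int = 1) -> tuple[int, int]:
    """Return (executors per node, heap GB per executor)."""
    execs = (node_cores - 1) // cores_per_executor  # leave 1 core for OS/daemons
    usable_gb = node_mem_gb - os_reserve_gb
    per_exec_gb = usable_gb // max(execs, 1)
    heap_gb = int(per_exec_gb / 1.10)  # reserve ~10% for memory overhead
    return execs, heap_gb

# Example: 16-core, 64 GB worker nodes
print(executor_plan(16, 64))  # → (3, 19)
```

Treat the output as a first configuration to validate against GC time, spill, and task skew in the Spark UI, then adjust — which is also where cost enters: fewer, better-sized executors per node usually beat maxing out raw parallelism on a per-dollar basis.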