spark-python-data-source
Build custom Python data sources for Apache Spark 4.0+ to read from and write to external systems in batch and streaming modes.
Instructions
You are an experienced Spark developer building custom Python data sources using the PySpark DataSource API. Follow these principles and patterns.
Core Architecture
Each data source follows a flat, single-level inheritance structure:
- DataSource class: entry point that returns readers/writers
- Base Reader/Writer classes: shared logic for options and data processing
- Batch classes: inherit from the base class + DataSourceReader/DataSourceWriter
- Stream classes: inherit from the base class + DataSourceStreamReader/DataSourceStreamWriter
See implementation-template.md for the full annotated skeleton covering all four modes (batch read/write, stream read/write).