spark-python-data-source

Build custom Python data sources for Apache Spark 4.0+ to read from and write to external systems in batch and streaming modes.

Instructions

You are an experienced Spark developer building custom Python data sources using the PySpark DataSource API. Follow these principles and patterns.

Core Architecture

Each data source follows a flat, single-level inheritance structure:

  1. DataSource class — entry point that returns readers/writers
  2. Base Reader/Writer classes — shared logic for options and data processing
  3. Batch classes — inherit from base + DataSourceReader/DataSourceWriter
  4. Stream classes — inherit from base + DataSourceStreamReader/DataSourceStreamWriter
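
The four layers above could be laid out roughly as follows. This is a minimal sketch, not the template from this skill: every `Example*` class, the `batch_size` option, the `example` source name, and the row-generating helpers are hypothetical placeholders. The `pyspark.sql.datasource` imports assume PySpark 4.0+; the fallback stand-ins exist only so the class layout can be exercised without Spark installed.

```python
from dataclasses import dataclass

try:  # Real base classes, available in PySpark 4.0+
    from pyspark.sql.datasource import (
        DataSource, DataSourceReader, DataSourceWriter,
        DataSourceStreamReader, DataSourceStreamWriter,
        InputPartition, WriterCommitMessage,
    )
except ImportError:
    # Stand-ins (NOT part of PySpark) so the layout can be tried without Spark.
    class DataSource:
        def __init__(self, options):
            self.options = options
    class DataSourceReader: pass
    class DataSourceWriter: pass
    class DataSourceStreamReader: pass
    class DataSourceStreamWriter: pass
    class WriterCommitMessage: pass
    class InputPartition:
        def __init__(self, value):
            self.value = value

# 2. Base classes: shared option handling and data processing.
class ExampleReader:
    def __init__(self, options, schema):
        self.schema = schema
        self.batch_size = int(options.get("batch_size", "3"))  # hypothetical option

    def _rows(self, start, end):
        # Placeholder for fetching rows from the external system.
        for i in range(start, end):
            yield (i, f"row-{i}")

@dataclass
class ExampleCommitMessage(WriterCommitMessage):
    count: int

class ExampleWriter:
    def __init__(self, options, schema):
        self.schema = schema

    def _send(self, iterator):
        # Placeholder: a real writer would push rows to the external system.
        return ExampleCommitMessage(count=sum(1 for _ in iterator))

# 3. Batch classes: base + DataSourceReader / DataSourceWriter.
class ExampleBatchReader(ExampleReader, DataSourceReader):
    def read(self, partition):
        return self._rows(0, self.batch_size)

class ExampleBatchWriter(ExampleWriter, DataSourceWriter):
    def write(self, iterator):
        return self._send(iterator)

# 4. Stream classes: base + DataSourceStreamReader / DataSourceStreamWriter.
class ExampleStreamReader(ExampleReader, DataSourceStreamReader):
    def __init__(self, options, schema):
        super().__init__(options, schema)
        self._latest = 0

    def initialOffset(self):
        return {"offset": 0}

    def latestOffset(self):
        self._latest += self.batch_size
        return {"offset": self._latest}

    def partitions(self, start, end):
        return [InputPartition((start["offset"], end["offset"]))]

    def read(self, partition):
        start, end = partition.value
        return self._rows(start, end)

class ExampleStreamWriter(ExampleWriter, DataSourceStreamWriter):
    def write(self, iterator):
        return self._send(iterator)

    def commit(self, messages, batchId):
        pass  # Placeholder: finalize the micro-batch in the external system.

    def abort(self, messages, batchId):
        pass  # Placeholder: roll back partial writes.

# 1. DataSource class: entry point that returns readers/writers.
class ExampleDataSource(DataSource):
    @classmethod
    def name(cls):
        return "example"

    def schema(self):
        return "id int, value string"

    def reader(self, schema):
        return ExampleBatchReader(self.options, schema)

    def writer(self, schema, overwrite):
        return ExampleBatchWriter(self.options, schema)

    def streamReader(self, schema):
        return ExampleStreamReader(self.options, schema)

    def streamWriter(self, schema, overwrite):
        return ExampleStreamWriter(self.options, schema)
```

With a live session, a source like this would typically be registered via `spark.dataSource.register(ExampleDataSource)` and then used through `spark.read.format("example")` or `spark.readStream.format("example")`. Keeping each concrete class exactly one level below its base means option parsing lives in one place per direction, and no class needs to reason about a deeper hierarchy.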

See implementation-template.md for the full annotated skeleton covering all four modes (batch read/write, stream read/write).

First seen: Feb 19, 2026
Source: databricks-solutions/ai-dev-kit (spark-python-data-source)