faker-data-generation

Faker Data Generation Patterns

Overview

When generating synthetic data for Databricks Bronze layer tables, create the records with Faker and inject a configurable rate of corrupted values, so that Silver layer data quality expectations have bad rows to catch.
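A minimal sketch of this pattern, assuming a hypothetical order schema (`order_id`, `customer_email`, `amount`, `order_ts`) and three illustrative corruption types. The function names, the schema, and the corruption choices are assumptions for illustration, not part of the skill itself.

```python
import random


def generate_clean_records(n: int, seed: int = 42) -> list[dict]:
    """Generate n clean order records with Faker (hypothetical schema)."""
    # Imported lazily so corrupt_records can be used without Faker installed.
    from faker import Faker

    fake = Faker()
    fake.seed_instance(seed)  # reproducible output for a given seed
    return [
        {
            "order_id": fake.uuid4(),
            "customer_email": fake.email(),
            "amount": fake.random_int(min=100, max=50_000) / 100,
            "order_ts": fake.date_time_this_year().isoformat(),
        }
        for _ in range(n)
    ]


def _null_email(rec: dict) -> None:
    rec["customer_email"] = None  # missing required value


def _negative_amount(rec: dict) -> None:
    rec["amount"] = -abs(rec["amount"])  # out-of-range value


def _malformed_email(rec: dict) -> None:
    rec["customer_email"] = rec["customer_email"].replace("@", "")  # bad format


# Illustrative corruption types; extend to match the expectations under test.
CORRUPTIONS = [_null_email, _negative_amount, _malformed_email]


def corrupt_records(records: list[dict], rate: float, seed: int = 0) -> list[dict]:
    """Return copies of records, corrupting each one with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the clean input is left untouched
        if rng.random() < rate:
            rng.choice(CORRUPTIONS)(rec)
        out.append(rec)
    return out
```

Calling `corrupt_records(generate_clean_records(1000), rate=0.05)` would yield a Bronze-style batch in which roughly 5% of rows violate one rule. Keeping the rate and seed as parameters makes it possible to rerun the same dirty batch while tuning Silver expectations.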

Upstream: Synthetic Data Generation Workflow

The upstream databricks-synthetic-data-generation skill in AI-Dev-Kit introduces a file-based workflow:

File-Based Execution

  1. Write Python code to a local file (e.g., scripts/generate_data.py)
  2. Execute on Databricks using the run_python_file_on_databricks MCP tool
  3. If execution fails, edit the local file and re-execute

Context Reuse

The first execution auto-selects a running cluster and creates an execution context. Pass the returned cluster_id and context_id on follow-up calls to reuse that context: a follow-up call then takes about 1s instead of about 15s.
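The reuse logic can be sketched as a small wrapper that remembers the identifiers between runs. This is a hedged illustration only: `run_tool` is a hypothetical callable standing in for however `run_python_file_on_databricks` is invoked, and the `file_path` argument and the response keys are assumptions, not the tool's documented interface.

```python
from typing import Any, Callable, Optional


class DatabricksRunSession:
    """Remembers cluster_id and context_id between runs (hypothetical helper)."""

    def __init__(self, run_tool: Callable[..., dict[str, Any]]):
        # run_tool is a stand-in for the MCP tool call; its keyword
        # arguments and response keys are assumptions for this sketch.
        self._run_tool = run_tool
        self.cluster_id: Optional[str] = None
        self.context_id: Optional[str] = None

    def run(self, file_path: str) -> dict[str, Any]:
        kwargs: dict[str, Any] = {"file_path": file_path}
        # After the first run, pass both ids so the existing context is reused.
        if self.cluster_id and self.context_id:
            kwargs["cluster_id"] = self.cluster_id
            kwargs["context_id"] = self.context_id
        result = self._run_tool(**kwargs)
        # Keep whatever ids the call reports for the next invocation.
        self.cluster_id = result.get("cluster_id", self.cluster_id)
        self.context_id = result.get("context_id", self.context_id)
        return result
```

With this shape, the edit-and-re-execute loop from the file-based workflow calls `session.run("scripts/generate_data.py")` repeatedly, and only the first call pays the context creation cost.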

Raw Data Only

By default, generate raw transactional data only, with no pre-aggregated fields such as total_x, sum_x, or avg_x. SDP pipelines compute aggregations downstream.
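One way to enforce this rule before writing a table is a simple column-name check. The sketch below assumes the three prefixes named above are the ones to reject; the function name and the case-insensitive matching are illustrative choices.

```python
# Prefixes that signal a pre-aggregated column (taken from the rule above).
AGGREGATE_PREFIXES = ("total_", "sum_", "avg_")


def find_aggregate_fields(columns: list[str]) -> list[str]:
    """Return the column names that look like pre-computed aggregates."""
    return [c for c in columns if c.lower().startswith(AGGREGATE_PREFIXES)]


def assert_raw_only(columns: list[str]) -> None:
    """Raise ValueError if any column violates the raw-data-only rule."""
    offenders = find_aggregate_fields(columns)
    if offenders:
        raise ValueError(f"Aggregate fields not allowed in raw data: {offenders}")
```

Calling `assert_raw_only` on the generated record keys (or on `df.columns` for a DataFrame) just before the write step fails fast when an aggregate column slips into the generator.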

First seen: Mar 8, 2026
faker-data-generation — databricks-solutions/vibe-coding-workshop-template