Synthetic Data Generation

Generate realistic, story-driven synthetic data for Databricks using Python with Faker and Spark.

Common Libraries

These libraries are useful for generating realistic synthetic data:

  • faker: Generates realistic names, addresses, emails, companies, dates, etc.
  • holidays: Provides country-specific holiday calendars for realistic date patterns

These libraries are typically NOT pre-installed on Databricks. Install them using the execute_databricks_command tool:

  • code: "%pip install faker holidays"

Save the returned cluster_id and context_id for subsequent calls.

Workflow

  1. Write Python code to a local file in the project (e.g., scripts/generate_data.py)