databricks-parsing


Databricks Document Parsing

Parse unstructured documents into structured text using ai_parse_document — the foundation for document processing and custom RAG pipelines on Databricks.

When to Use

Use this skill when:

  • Parsing PDFs, DOCX, PPTX, or images into text
  • Extracting structured data from unstructured documents
  • Building a custom RAG pipeline (parse → chunk → index → query)
  • Ingesting documents from Unity Catalog Volumes for search or analysis

Overview

ai_parse_document is a SQL AI function that extracts content from binary documents. It runs on serverless SQL warehouses and supports PDF, DOC/DOCX, PPT/PPTX, JPG/JPEG, and PNG.
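As a minimal sketch of how the function might be invoked, the Python helper below builds a SQL statement that reads files from a Volume as binary and passes each one to ai_parse_document. It assumes the read_files table function with format => 'binaryFile' exposes path and content columns. The helper name build_parse_query, the parsed alias, and the example Volume path are hypothetical and used only for illustration.

```python
def build_parse_query(volume_path: str) -> str:
    """Return a SQL statement that parses every file under `volume_path`.

    Assumption: read_files(..., format => 'binaryFile') yields `path` and
    `content` columns, and `content` is the binary input that
    ai_parse_document expects.
    """
    # Refuse quotes rather than guess at an escaping rule for SQL literals.
    if "'" in volume_path:
        raise ValueError("volume_path must not contain single quotes")
    return (
        "SELECT path, ai_parse_document(content) AS parsed\n"
        f"FROM read_files('{volume_path}', format => 'binaryFile')"
    )


if __name__ == "__main__":
    # Hypothetical Volume path; substitute your own catalog, schema, and volume.
    print(build_parse_query("/Volumes/main/default/docs/"))
```

The resulting string can be pasted into a SQL editor attached to a supported warehouse, or passed to spark.sql(...) in a notebook where a Spark session is available. Building the query separately keeps the example runnable without a Databricks connection.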

First Seen: Mar 5, 2026
databricks-parsing — databricks-solutions/ai-dev-kit