databricks-parsing
Databricks Document Parsing
Parse unstructured documents into structured text using ai_parse_document — the foundation for document processing and custom RAG pipelines on Databricks.
When to Use
Use this skill when:
- Parsing PDF, DOC/DOCX, PPT/PPTX, or image (JPG/JPEG, PNG) files into text
- Extracting structured data from unstructured documents
- Building a custom RAG pipeline (parse → chunk → index → query)
- Ingesting documents from Unity Catalog Volumes for search or analysis
Overview
ai_parse_document is a SQL AI function that extracts content from binary documents. It runs on serverless SQL warehouses and supports PDF, DOC/DOCX, PPT/PPTX, JPG/JPEG, and PNG.
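As a rough illustration of the call pattern, here is a minimal sketch that assembles a SQL statement applying ai_parse_document to every file under a Unity Catalog Volume. It assumes the files are loaded with read_files in binaryFile format, which exposes each file's bytes in a content column. The helper name build_parse_query and the path /Volumes/main/default/docs are hypothetical, chosen only for the example.

```python
def build_parse_query(volume_path: str) -> str:
    """Return a SQL statement that parses every file under volume_path.

    Assumption: files are read with read_files(..., format => 'binaryFile'),
    which yields `path` and `content` columns.
    """
    # Reject quotes so the path cannot break out of the SQL string literal.
    if "'" in volume_path:
        raise ValueError("volume_path must not contain single quotes")
    return (
        "SELECT path, ai_parse_document(content) AS parsed\n"
        f"FROM read_files('{volume_path}', format => 'binaryFile')"
    )


# Hypothetical Volume path, used only for illustration.
query = build_parse_query("/Volumes/main/default/docs")
print(query)
```

The resulting statement could be run in the SQL editor, or from a notebook with spark.sql(query). Keeping the original path column next to the parsed output makes it easier to trace each result back to its source file in later chunking and indexing steps.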