document-rag-pipeline


Document RAG Pipeline Skill

Overview

This skill creates a complete Retrieval-Augmented Generation (RAG) system from a folder of documents. It handles:

  • Regular PDF text extraction
  • OCR for scanned/image-based PDFs
  • DRM-protected file detection
  • Text chunking with overlap
  • Vector embedding generation
  • SQLite storage with full-text search
  • Semantic similarity search
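
As a rough illustration of two of the steps listed above, chunking with overlap and similarity search, here is a minimal standard-library sketch. It is not the skill's actual implementation: the function names (`chunk_text`, `toy_embed`, `cosine_similarity`, `search`), the character-based window, and the default sizes are all assumptions made for illustration. In particular, `toy_embed` is a bag-of-words stand-in for a real embedding model such as one loaded through sentence-transformers.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows; neighbours share `overlap` characters.

    Hypothetical helper: the window is measured in characters here, but a
    real pipeline might count words or tokens instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, chunks, top_k=3):
    """Rank chunks by similarity to the query and return (score, chunk) pairs."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
    print(search("cat", ["the cat sat", "dogs bark loudly", "a cat and a dog"], top_k=1))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice. Swapping `toy_embed` for dense vectors from a real model would leave the ranking logic in `search` unchanged in spirit, though dense vectors would normally be compared with NumPy rather than `Counter` arithmetic.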

Quick Start

# Install dependencies
pip install PyMuPDF pytesseract Pillow sentence-transformers numpy tqdm
First Seen
Jan 24, 2026