document-rag-pipeline
Document RAG Pipeline Skill
Overview
This skill creates a complete Retrieval-Augmented Generation (RAG) system from a folder of documents. It handles:
- Regular PDF text extraction
- OCR for scanned/image-based PDFs
- DRM-protected file detection
- Text chunking with overlap
- Vector embedding generation
- SQLite storage with full-text search
- Semantic similarity search
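As an illustration of the "text chunking with overlap" step, here is a minimal sketch using character-based windows. The function name `chunk_text` and the default `chunk_size` and `overlap` values are assumptions for this example, not settings taken from the skill itself, which may split on tokens or sentences instead.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters.

    Hypothetical helper: the name and defaults are illustrative assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.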
Quick Start
# Install dependencies
pip install PyMuPDF pytesseract Pillow sentence-transformers numpy tqdm
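The storage and semantic search steps could look roughly like the sketch below. It uses only the standard library so that it runs on its own: the table schema, the function names (`create_store`, `add_chunk`, `search`), and the hand-written vectors are all assumptions for illustration. In a real pipeline the vectors would come from an embedding model such as one loaded through `sentence-transformers`, and the full-text search index is not shown here.

```python
import math
import sqlite3
from array import array


def to_blob(vector):
    """Pack a list of floats into bytes (float32) for a BLOB column."""
    return array("f", vector).tobytes()


def from_blob(blob):
    """Unpack a BLOB written by to_blob back into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return list(values)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def create_store(path=":memory:"):
    """Open a SQLite database and create the (hypothetical) chunks table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, source TEXT NOT NULL, "
        "text TEXT NOT NULL, embedding BLOB NOT NULL)"
    )
    return conn


def add_chunk(conn, source, text, embedding):
    """Store one chunk together with its embedding vector."""
    conn.execute(
        "INSERT INTO chunks (source, text, embedding) VALUES (?, ?, ?)",
        (source, text, to_blob(embedding)),
    )
    conn.commit()


def search(conn, query_embedding, top_k=3):
    """Brute-force search: score every stored chunk, return the top_k best."""
    rows = conn.execute("SELECT source, text, embedding FROM chunks").fetchall()
    scored = [
        (cosine(query_embedding, from_blob(blob)), source, text)
        for source, text, blob in rows
    ]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    conn = create_store()
    # Toy 2-dimensional vectors standing in for real model embeddings.
    add_chunk(conn, "a.pdf", "pump maintenance schedule", [1.0, 0.0])
    add_chunk(conn, "b.pdf", "valve inspection checklist", [0.0, 1.0])
    for score, source, text in search(conn, [0.9, 0.1], top_k=1):
        print(f"{score:.3f} {source}: {text}")
```

Scanning every row is fine for a small document folder; a larger collection would likely need an approximate nearest-neighbor index instead of this linear pass.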