pdf-harvester
PDF Harvester Skill
Extract PDF documents and ingest them into a RAG pipeline, with reliable text extraction, table handling, and metadata capture.
Overview
PDF is a common format for research papers, reports, manuals, and ebooks. This skill covers:
- Text extraction with layout preservation
- Table extraction and conversion to markdown
- Academic paper patterns (abstract, sections, citations)
- OCR for scanned documents
- Multi-page chunking strategies
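Two of the items above, table conversion to markdown and multi-page chunking, can be sketched in plain Python. This is a minimal illustration, not the skill's actual implementation: the function names `table_to_markdown` and `chunk_pages`, the `max_chars` parameter, and the paragraph-based splitting rule are all assumptions made for the example. Tables are assumed to arrive as lists of rows with the header first, and pages as a list of already-extracted strings.

```python
from typing import Dict, List, Optional, Sequence


def table_to_markdown(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Render a table (first row treated as the header) as markdown."""
    if not rows:
        return ""

    def clean(cell: Optional[str]) -> str:
        # Empty cells may be None; newlines and pipes would break a row.
        text = "" if cell is None else str(cell)
        return text.replace("\n", " ").replace("|", "\\|").strip()

    width = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of columns.
    padded = [[clean(c) for c in row] + [""] * (width - len(row)) for row in rows]
    header, body = padded[0], padded[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def chunk_pages(pages: List[str], max_chars: int = 1000) -> List[Dict]:
    """Pack paragraphs into chunks of roughly max_chars, tracking page ranges.

    Hypothetical strategy: paragraphs are split on blank lines, and a chunk
    may span several pages. A single oversized paragraph is kept whole.
    """
    chunks: List[Dict] = []
    buffer: List[str] = []
    size = 0
    start_page = last_page = 0

    def flush() -> None:
        chunks.append({"text": "\n\n".join(buffer), "pages": (start_page, last_page)})

    for page_number, text in enumerate(pages, start=1):
        for para in (p.strip() for p in text.split("\n\n")):
            if not para:
                continue
            if buffer and size + len(para) > max_chars:
                flush()
                buffer, size = [], 0
            if not buffer:
                start_page = page_number
            buffer.append(para)
            size += len(para)
            last_page = page_number
    if buffer:
        flush()
    return chunks
```

Keeping the page range on each chunk lets a retrieval result point back to the pages it came from, which is useful when the same chunk straddles a page break.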
Prerequisites
# Core extraction
pip install pdfplumber pymupdf
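With pdfplumber installed, per-page extraction might look like the sketch below. It relies on `pdfplumber.open`, `pdf.pages`, `pdf.metadata`, `page.extract_text()`, and `page.extract_tables()` from the library; the function names `extract_pages` and `build_record` and the record layout are hypothetical, chosen only for this example.

```python
from typing import Dict, List, Optional


def build_record(source: str, page_number: int, text: Optional[str],
                 tables: Optional[List], title: Optional[str] = None) -> Dict:
    """Assemble one page record (hypothetical layout for ingestion)."""
    tables = tables or []
    return {
        "source": source,
        "title": title,
        "page": page_number,
        # Pages with no text layer (e.g. scans) may yield None or "".
        "text": text or "",
        "tables": tables,
        "table_count": len(tables),
    }


def extract_pages(pdf_path: str) -> List[Dict]:
    """Return one record per page with its text, raw tables, and metadata."""
    # Imported here so the helper above can be used without pdfplumber.
    import pdfplumber

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        title = (pdf.metadata or {}).get("Title")
        for page_number, page in enumerate(pdf.pages, start=1):
            records.append(
                build_record(
                    pdf_path,
                    page_number,
                    page.extract_text(),
                    page.extract_tables(),
                    title,
                )
            )
    return records
```

A page whose `text` is empty but which clearly has content is a likely candidate for the OCR path rather than direct extraction.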