pdf-harvester


PDF Harvester Skill

Extract text from PDF documents and ingest it into a RAG pipeline, with table handling and metadata.

Overview

PDFs are common for research papers, reports, manuals, and ebooks. This skill covers:

  • Text extraction with layout preservation
  • Table extraction and conversion to markdown
  • Academic paper patterns (abstract, sections, citations)
  • OCR for scanned documents
  • Multi-page chunking strategies
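Two of the items above, table conversion to markdown and multi-page chunking, can be sketched in plain Python. This is a minimal illustration, not the skill's own implementation. The function names `table_to_markdown` and `chunk_pages`, the `max_chars` parameter, and the chunk dictionary layout are all assumptions made for this example. The table input is assumed to be a list of rows whose cells are strings or `None`, which is the shape pdfplumber's `extract_tables()` returns for each table.

```python
from typing import Dict, List, Optional, Sequence


def table_to_markdown(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Render rows (first row treated as the header) as a markdown table.

    Hypothetical helper: None cells become empty strings, internal
    whitespace and newlines collapse to single spaces, pipes are escaped.
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)

    def clean(cell: Optional[str]) -> str:
        text = "" if cell is None else str(cell)
        return " ".join(text.split()).replace("|", "\\|")

    # Pad short rows so every row has the same number of columns.
    norm = [[clean(c) for c in row] + [""] * (width - len(row)) for row in rows]
    lines = [
        "| " + " | ".join(norm[0]) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in norm[1:]]
    return "\n".join(lines)


def chunk_pages(pages: Sequence[str], max_chars: int = 1000) -> List[Dict]:
    """Group consecutive page texts into chunks of roughly max_chars.

    Hypothetical helper: pages are never split, so a single page longer
    than max_chars becomes its own oversized chunk. Each chunk records
    the 1-based (first, last) page numbers it covers.
    """
    chunks: List[Dict] = []
    buf: List[str] = []
    size = 0
    start = last = None
    for page_no, text in enumerate(pages, start=1):
        text = text.strip()
        if not text:
            continue  # skip blank pages
        if buf and size + len(text) > max_chars:
            chunks.append({"text": "\n\n".join(buf), "pages": (start, last)})
            buf, size = [], 0
        if not buf:
            start = page_no
        buf.append(text)
        size += len(text)
        last = page_no
    if buf:
        chunks.append({"text": "\n\n".join(buf), "pages": (start, last)})
    return chunks
```

Keeping page numbers on each chunk is one way to carry source metadata into the index, so a retrieved chunk can be traced back to its pages. Page-level grouping is the simplest strategy; splitting on section headings would be an alternative for academic papers.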

Prerequisites

# Core extraction
pip install pdfplumber pymupdf
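
With pdfplumber installed, a first extraction pass might look like the sketch below. The function name `extract_pages` and the returned dictionary layout are assumptions for illustration. The pdfplumber calls used are `pdfplumber.open`, `pdf.pages`, `page.extract_text()`, and `page.extract_tables()`.

```python
from typing import Dict, List


def extract_pages(pdf_path: str) -> List[Dict]:
    """Return per-page text and tables for a PDF (hypothetical helper)."""
    # Imported lazily so this module loads even where pdfplumber is absent.
    import pdfplumber

    results: List[Dict] = []
    with pdfplumber.open(pdf_path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            results.append(
                {
                    "page": number,
                    # extract_text() can yield nothing for image-only pages.
                    "text": page.extract_text() or "",
                    # Each table is a list of rows; cells are str or None.
                    "tables": page.extract_tables(),
                }
            )
    return results
```

A page that comes back with empty text and no tables is a likely candidate for the OCR path, since scanned pages carry no text layer.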
First Seen
Jan 24, 2026