pdf-processing


PDF Processing

Overview

Generate, manipulate, and extract data from PDF documents. This skill covers the Python PDF ecosystem: pypdf for merging/splitting/metadata, pdfplumber for text and table extraction, reportlab for generation, pytesseract for OCR, and strategies for form filling, watermarking, and complex document assembly.

Apply this skill whenever PDFs need to be created, parsed, transformed, or combined through code.
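Each task above is handled by a separate third-party package, so it can help to confirm which ones are importable before planning a pipeline. The following is a minimal sketch that uses only the standard library. The helper name `check_pdf_toolchain` and the task labels are illustrative. The code assumes the packages are installed under their usual module names (`pypdf`, `pdfplumber`, `reportlab`, `pytesseract`). It also checks for the `tesseract` executable, because pytesseract is a wrapper around that program rather than an OCR engine itself.

```python
import importlib.util
import shutil

# Task -> import name, following the library roles listed in the overview.
PDF_LIBRARIES = {
    "merge/split/metadata": "pypdf",
    "text/table extraction": "pdfplumber",
    "generation": "reportlab",
    "ocr": "pytesseract",
}


def check_pdf_toolchain() -> dict[str, bool]:
    """Report which PDF libraries are importable (hypothetical helper)."""
    status = {
        task: importlib.util.find_spec(module) is not None
        for task, module in PDF_LIBRARIES.items()
    }
    # pytesseract only wraps Tesseract: OCR also needs the binary on PATH.
    status["ocr"] = status["ocr"] and shutil.which("tesseract") is not None
    return status


if __name__ == "__main__":
    for task, ok in check_pdf_toolchain().items():
        print(f"{task:<22} {'ok' if ok else 'missing'}")
```

Running this check first turns a mid-pipeline `ModuleNotFoundError` into an up-front list of missing dependencies.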

Multi-Phase Process

Phase 1: Requirements

  1. Determine operation type (generate, extract, manipulate)
  2. Identify input PDF characteristics (scanned, digital, forms)
  3. Define output requirements (format, quality, size)
  4. Plan data pipeline (source data to PDF or PDF to data)
  5. Assess volume and performance requirements

STOP — Do NOT select a library until the operation type and input characteristics are clear.
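Steps 1 and 2, together with the STOP rule, can be encoded as a guard that refuses to name a library until those answers exist. This is a minimal sketch under stated assumptions. `PdfRequirements` and `select_library` are hypothetical names. Generation is assumed to need no input assessment, since it starts from source data rather than an existing PDF. Routing form-based inputs to pypdf, which can read and fill AcroForm fields, is also an assumption that goes beyond the library roles listed in the overview.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed answers for Phase 1, steps 1 and 2.
OPERATIONS = {"generate", "extract", "manipulate"}
INPUT_KINDS = {"scanned", "digital", "forms"}


@dataclass
class PdfRequirements:
    """Hypothetical record of the Phase 1 answers used for library choice."""
    operation: Optional[str] = None   # step 1: generate, extract, or manipulate
    input_kind: Optional[str] = None  # step 2: scanned, digital, or forms


def select_library(req: PdfRequirements) -> str:
    """Return a library name, or raise if Phase 1 is not finished (the STOP rule)."""
    if req.operation not in OPERATIONS:
        raise ValueError("operation type is not clear yet")
    if req.operation == "generate":
        # Assumption: generation has no input PDF to assess.
        return "reportlab"
    if req.input_kind not in INPUT_KINDS:
        raise ValueError("input PDF characteristics are not clear yet")
    if req.input_kind == "forms":
        # Assumption: pypdf handles AcroForm fields (reading and filling).
        return "pypdf"
    if req.operation == "extract":
        # Scanned pages have no text layer. pytesseract OCRs page images,
        # so the pages must be rasterized first.
        return "pytesseract" if req.input_kind == "scanned" else "pdfplumber"
    # Remaining case: manipulation (merge, split, metadata).
    return "pypdf"


if __name__ == "__main__":
    print(select_library(PdfRequirements("extract", "digital")))  # pdfplumber
```

Raising instead of returning a default keeps an unclear request from silently landing on the wrong library.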

First Seen
Apr 2, 2026