pdf-extraction

Summary

Extract text, tables, and metadata from PDF documents with character-level precision.

  • Supports text extraction with layout preservation, word-level positioning, and character-level access including font and size metadata
  • Includes configurable table detection with selectable strategies (lines, text, explicit) and tolerance tuning for complex layouts
  • Provides visual debugging via image rendering with overlays for characters, words, lines, and detected table boundaries
  • Handles cropping, region filtering, and font-based text selection for targeted data extraction from specific PDF areas
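
The table-detection bullet corresponds to the `table_settings` dictionary that pdfplumber's `page.extract_tables()` accepts. The sketch below assumes pdfplumber is installed. `build_table_settings`, `extract_page_tables`, and `save_table_debug_image` are hypothetical helper names, not part of the library. The 3.0 tolerance is meant to match pdfplumber's documented default for these keys, so treat it as a starting point for tuning. The "explicit" strategy is left out because it also needs `explicit_vertical_lines` / `explicit_horizontal_lines`. Rendering the debug image depends on pdfplumber's image-rendering support being available.

```python
from typing import Any, List, Optional

# Strategies that need no extra inputs. "explicit" also requires
# explicit_vertical_lines / explicit_horizontal_lines, so it is left out here.
SIMPLE_STRATEGIES = {"lines", "lines_strict", "text"}


def build_table_settings(
    vertical: str = "lines",
    horizontal: str = "lines",
    tolerance: float = 3.0,
) -> dict:
    """Return a pdfplumber table_settings dict (hypothetical helper)."""
    for strategy in (vertical, horizontal):
        if strategy not in SIMPLE_STRATEGIES:
            raise ValueError(f"unsupported strategy: {strategy!r}")
    return {
        "vertical_strategy": vertical,
        "horizontal_strategy": horizontal,
        # Snap nearly aligned edges together and join short gaps between segments.
        "snap_tolerance": tolerance,
        "join_tolerance": tolerance,
        # How far apart edges may be and still count as intersecting.
        "intersection_tolerance": tolerance,
    }


def extract_page_tables(
    path: str, page_index: int = 0, **settings: Any
) -> List[List[List[Optional[str]]]]:
    """Extract every detected table on one page as rows of cell strings."""
    import pdfplumber  # imported lazily so build_table_settings works without it

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_index]
        return page.extract_tables(table_settings=build_table_settings(**settings))


def save_table_debug_image(
    path: str, out_png: str, page_index: int = 0, **settings: Any
) -> None:
    """Render a page with the table finder's edges, intersections and cells drawn on top."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_index]
        image = page.to_image(resolution=150)
        image.debug_tablefinder(build_table_settings(**settings))
        image.save(out_png)
```

For example, `extract_page_tables("report.pdf", 2, vertical="text", horizontal="text")` switches to whitespace-based detection for tables drawn without ruling lines; `report.pdf` is a placeholder path.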
SKILL.md

PDF Extraction Skill

Overview

This skill enables precise extraction of text, tables, and metadata from PDF documents using pdfplumber, a Python library built on pdfminer.six. Unlike basic PDF readers, pdfplumber exposes the position and font of every character, configurable table detection, and visual debugging. It works best on machine-generated PDFs rather than scanned images.
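
Character-level positioning means each entry in `page.chars` is a dict carrying keys such as `text`, `fontname`, `size`, `x0`, `x1`, `top`, and `bottom`. A minimal sketch, assuming that dict shape: `font_histogram`, `chars_in_bbox`, and `large_text` are hypothetical helpers, and the 14 pt heading threshold is an arbitrary illustration. In real code, `page.crop(bbox)` or `page.within_bbox(bbox)` is the library's own way to restrict a page to a region; `chars_in_bbox` only spells out the `(x0, top, x1, bottom)` coordinate convention on plain dicts.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

# pdfplumber represents each character as a dict; these are the keys used below.
Char = Dict[str, Any]
BBox = Tuple[float, float, float, float]  # (x0, top, x1, bottom), pdfplumber's convention


def font_histogram(chars: Iterable[Char]) -> Counter:
    """Count characters per (fontname, size rounded to 0.1 pt)."""
    return Counter((c["fontname"], round(c["size"], 1)) for c in chars)


def chars_in_bbox(chars: Iterable[Char], bbox: BBox) -> List[Char]:
    """Keep characters that lie entirely inside bbox."""
    x0, top, x1, bottom = bbox
    return [
        c
        for c in chars
        if c["x0"] >= x0 and c["x1"] <= x1 and c["top"] >= top and c["bottom"] <= bottom
    ]


def large_text(path: str, min_size: float = 14.0, page_index: int = 0) -> str:
    """Return only the text set at or above min_size points, e.g. headings."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_index]
        # Page.filter keeps only the objects for which the predicate is True.
        big = page.filter(
            lambda obj: obj.get("object_type") == "char" and obj["size"] >= min_size
        )
        return big.extract_text()
```

A font histogram is a quick way to discover which font or size marks headings, totals, or footnotes before choosing a filter threshold.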

How to Use

  1. Provide the PDF file you want to extract from
  2. Specify what you need: text, tables, images, or metadata
  3. I'll generate pdfplumber code and execute it
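
For step 3, a request such as "Get text from pages 5-10" usually needs one small conversion: people count pages from 1 and include both ends, while `pdf.pages` is a 0-indexed Python list. A sketch under that assumption; `page_slice`, `extract_text_range`, and `read_metadata` are hypothetical names, and the `layout=True` option of `extract_text` requires a reasonably recent pdfplumber release.

```python
from typing import List


def page_slice(first: int, last: int) -> slice:
    """Convert a 1-based, inclusive page range (as users say it) to a 0-based slice."""
    if first < 1 or last < first:
        raise ValueError(f"invalid page range: {first}-{last}")
    return slice(first - 1, last)


def extract_text_range(path: str, first: int, last: int, layout: bool = False) -> List[str]:
    """Return the text of pages first..last (1-based, inclusive), one string per page."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        pages = pdf.pages[page_slice(first, last)]
        # Image-only pages yield no text; keep "" so list positions still match pages.
        return [page.extract_text(layout=layout) or "" for page in pages]


def read_metadata(path: str) -> dict:
    """Return the document information dictionary (Title, Author, ...) if present."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        return dict(pdf.metadata)
```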

Example prompts:

  • "Extract all tables from this financial report"
  • "Get text from pages 5-10 of this document"
  • "Find and extract the invoice total from this PDF"
  • "Convert this PDF table to CSV/Excel"
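
For the "Convert this PDF table to CSV" prompt, pdfplumber's `extract_table()` / `extract_tables()` return each table as a list of rows, with `None` for empty cells. A minimal sketch using only the standard library's `csv` module; `table_to_csv` and `save_tables_as_csv` (including its `<prefix>_<n>.csv` naming) are hypothetical. The resulting files open directly in Excel; writing native .xlsx would need an extra library and is not shown.

```python
import csv
import io
from typing import List, Optional, Sequence

Row = Sequence[Optional[str]]


def table_to_csv(table: Sequence[Row]) -> str:
    """Serialize one table (rows of cells, None for empty cells) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default dialect: comma-separated, "\r\n" line endings
    for row in table:
        # pdfplumber uses None for empty cells; write them as empty strings.
        # Wrapped cell text may contain newlines; the csv module quotes those fields.
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def save_tables_as_csv(tables: Sequence[Sequence[Row]], prefix: str) -> List[str]:
    """Write each table to <prefix>_<n>.csv and return the file names."""
    names = []
    for n, table in enumerate(tables, start=1):
        name = f"{prefix}_{n}.csv"
        # newline="" stops Python from translating the csv module's "\r\n" endings.
        with open(name, "w", newline="", encoding="utf-8") as f:
            f.write(table_to_csv(table))
        names.append(name)
    return names
```

A typical call passes `save_tables_as_csv` the list returned by `page.extract_tables()` for one page.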

Domain Knowledge

Installs: 2.5K
GitHub Stars: 111
First Seen: Mar 9, 2026