pdf-extraction
Extract text, tables, and metadata from PDF documents with character-level precision.
- Supports text extraction with layout preservation, word-level positioning, and character-level access including font and size metadata
- Includes advanced table detection with customizable strategies (lines, text, explicit) and tolerance tuning for complex layouts
- Provides visual debugging via image rendering with overlays for characters, words, lines, and detected table boundaries
- Handles cropping, region filtering, and font-based text selection for targeted data extraction from specific PDF areas
PDF Extraction Skill
Overview
This skill enables precise extraction of text, tables, and metadata from PDF documents using pdfplumber, a Python library for PDF data extraction. Unlike basic PDF readers, pdfplumber exposes the position, font, and size of every character, detects tables with configurable strategies, and renders pages as images for visual debugging.
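As a rough illustration of the character-level access described above, here is a minimal sketch. The function names `group_text_by_font` and `summarize_page` are hypothetical helpers, not part of pdfplumber or of this skill. The pdfplumber calls used (`pdfplumber.open`, `page.extract_text`, `page.extract_words`, `page.chars`) are the library's own, and `layout=True` assumes a reasonably recent pdfplumber release.

```python
from collections import defaultdict


def group_text_by_font(chars):
    """Concatenate character text per (fontname, size) pair, in the order given.

    `chars` is a list of dicts shaped like pdfplumber's `page.chars` entries;
    only the "text", "fontname", and "size" keys are used here.
    """
    groups = defaultdict(str)
    for ch in chars:
        key = (ch["fontname"], round(ch["size"], 1))
        groups[key] += ch["text"]
    return dict(groups)


def summarize_page(pdf_path, page_index=0):
    """Return layout-preserved text, positioned words, and per-font text for one page."""
    # Imported lazily so the pure helper above can be used without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        return {
            # layout=True asks pdfplumber to approximate the page's visual layout.
            "text": page.extract_text(layout=True),
            # Each word dict carries its bounding box; keep text and top-left corner.
            "words": [(w["text"], w["x0"], w["top"]) for w in page.extract_words()],
            "by_font": group_text_by_font(page.chars),
        }
```

Grouping by font is one simple way to separate headings from body text; real documents often need a tolerance on `size` or a mapping of font-name variants, which this sketch leaves out.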
How to Use
- Provide the PDF file you want to extract from
- Specify what you need: text, tables, images, or metadata
- I'll generate pdfplumber code and execute it
Example prompts:
- "Extract all tables from this financial report"
- "Get text from pages 5-10 of this document"
- "Find and extract the invoice total from this PDF"
- "Convert this PDF table to CSV/Excel"
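For prompts like the page-range and table-to-CSV examples above, the generated code might look roughly like the following sketch. The helper names (`page_indices`, `table_to_csv`, `extract_tables_as_csv`) and the `"5-10"` range format are assumptions made for illustration. The `table_settings` keys shown (`vertical_strategy`, `horizontal_strategy`, `snap_tolerance`) are pdfplumber's own.

```python
import csv
import io


def page_indices(page_range, page_count):
    """Turn a 1-based inclusive range such as "5-10" (or "3") into 0-based indices."""
    start_text, _, end_text = page_range.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else start
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {page_range!r}")
    # Clamp the upper bound to the number of pages actually in the document.
    return list(range(start - 1, min(end, page_count)))


def table_to_csv(rows):
    """Serialize one extracted table (a list of rows) to CSV text.

    pdfplumber returns None for empty cells; those are written as empty strings.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def extract_tables_as_csv(pdf_path, page_range, strategy="lines"):
    """Return (page_number, csv_text) pairs for every table found in the range."""
    # Imported lazily so the pure helpers above work without pdfplumber installed.
    import pdfplumber

    # "lines" relies on ruled lines; "text" infers cell edges from word alignment.
    # The "explicit" strategy also needs explicit line positions, not handled here.
    settings = {
        "vertical_strategy": strategy,
        "horizontal_strategy": strategy,
        "snap_tolerance": 3,
    }
    results = []
    with pdfplumber.open(pdf_path) as pdf:
        for index in page_indices(page_range, len(pdf.pages)):
            for table in pdf.pages[index].extract_tables(table_settings=settings):
                results.append((index + 1, table_to_csv(table)))
    return results
```

When a table has no ruled lines, switching `strategy` to `"text"` and adjusting the tolerance values is a common first step; the right settings depend on the document.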
Domain Knowledge