pdf-text-extractor


PDF Text Extractor Skill

Overview

This skill extracts text from PDF files using PyMuPDF (fitz), with intelligent chunking, page tracking, and metadata preservation. It handles large PDF collections through batch processing and error recovery.
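Page-tracked chunking can be sketched roughly as follows. This is a minimal sketch, assuming PyMuPDF is installed (imported as `fitz`; recent releases also accept `import pymupdf`). The names `Chunk`, `extract_pages`, and `chunk_pages`, the paragraph-based splitting, and the 2000-character default are illustrative assumptions, not this skill's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    pages: list[int]                               # 1-based source page numbers
    metadata: dict = field(default_factory=dict)   # copy of the PDF's metadata


def extract_pages(pdf_path: str) -> tuple[list[tuple[int, str]], dict]:
    """Return [(page_number, text), ...] plus document metadata (hypothetical helper)."""
    import fitz  # PyMuPDF; imported lazily so chunk_pages works without it

    with fitz.open(pdf_path) as doc:
        pages = [(page.number + 1, page.get_text()) for page in doc]
        metadata = dict(doc.metadata or {})
    return pages, metadata


def chunk_pages(pages, max_chars: int = 2000, metadata: dict | None = None) -> list[Chunk]:
    """Greedily pack paragraphs into chunks of about max_chars, recording source pages.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    chunks: list[Chunk] = []
    buf: list[str] = []
    buf_pages: list[int] = []
    size = 0  # length of "\n\n".join(buf)
    for page_no, text in pages:
        for para in (p.strip() for p in text.split("\n\n")):
            if not para:
                continue  # skip blank paragraphs
            added = len(para) + (2 if buf else 0)  # +2 for the joining "\n\n"
            if buf and size + added > max_chars:
                chunks.append(Chunk("\n\n".join(buf), buf_pages, dict(metadata or {})))
                buf, buf_pages, size = [], [], 0
                added = len(para)
            buf.append(para)
            size += added
            if page_no not in buf_pages:
                buf_pages.append(page_no)
    if buf:  # flush the final partial chunk
        chunks.append(Chunk("\n\n".join(buf), buf_pages, dict(metadata or {})))
    return chunks
```

Usage would look like `pages, meta = extract_pages("document.pdf")` followed by `chunk_pages(pages, metadata=meta)`; each resulting chunk carries the page numbers it was drawn from, so a chunk spanning a page break lists both pages.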

RECOMMENDED WORKFLOW: For all PDF documents, first convert to Markdown using OpenAI Codex (see the pdf skill), then process the structured Markdown. This skill is best used for:

  • Batch processing where Codex conversion is impractical
  • Legacy workflows requiring direct PDF extraction
  • Cases where raw text is sufficient
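For the batch-processing case, error recovery can be as simple as recording each failure and moving on instead of aborting the whole run. The following is an illustrative assumption, not the skill's code: `batch_extract` and `extract_text` are hypothetical names, PyMuPDF is assumed for the default extractor, and the extractor is injectable so the loop can be exercised without real PDFs.

```python
from __future__ import annotations

import logging
from pathlib import Path
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


def extract_text(pdf_path: str) -> str:
    """Extract all text from one PDF with PyMuPDF, joining pages with form feeds."""
    import fitz  # PyMuPDF; lazy import so batch_extract also works with other extractors

    with fitz.open(pdf_path) as doc:
        return "\f".join(page.get_text() for page in doc)


def batch_extract(
    paths: Iterable[str | Path],
    extractor: Callable[[str], str] = extract_text,
) -> tuple[dict[str, str], dict[str, str]]:
    """Run extractor over every path; return (results, failures) instead of raising."""
    results: dict[str, str] = {}
    failures: dict[str, str] = {}
    for path in paths:
        key = str(path)
        try:
            results[key] = extractor(key)
        except Exception as exc:  # e.g. a corrupt or encrypted file should not stop the batch
            failures[key] = f"{type(exc).__name__}: {exc}"
            logger.warning("Skipping %s: %s", key, exc)
    return results, failures
```

Returning failures alongside results lets a caller retry or report the skipped files after the batch finishes.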

Quick Start

Recommended Approach (with Codex conversion):

# 1. Convert PDF to markdown first (see pdf skill)
from pdf_skill import pdf_to_markdown_codex

md_path = pdf_to_markdown_codex("document.pdf")
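For the second step of the recommended workflow, processing the structured Markdown, one hypothetical approach is to split the converted file into heading-delimited sections. This sketch assumes `md_path` is a path to a UTF-8 Markdown file, which the snippet above suggests but does not confirm; `split_markdown_sections` and `load_sections` are illustrative names, not part of the pdf skill.

```python
from __future__ import annotations

import re
from pathlib import Path

# ATX headings such as "# Title" or "## Results ##" (1-6 leading hashes).
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
FENCE = "`" * 3  # code-fence marker; "#" lines inside fences are not headings


def split_markdown_sections(markdown: str) -> list[dict]:
    """Split Markdown into sections, each keyed by its nearest preceding heading."""
    sections = [{"level": 0, "title": "", "body": []}]  # text before the first heading
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            sections.append({"level": len(match.group(1)), "title": match.group(2), "body": []})
        else:
            sections[-1]["body"].append(line)
    # Join bodies and drop sections that have neither a title nor any text.
    return [
        {**s, "body": "\n".join(s["body"]).strip()}
        for s in sections
        if s["title"] or "\n".join(s["body"]).strip()
    ]


def load_sections(md_path: str | Path) -> list[dict]:
    """Read a Markdown file (assumed UTF-8) and split it into sections."""
    return split_markdown_sections(Path(md_path).read_text(encoding="utf-8"))
```

Splitting on headings keeps the document structure that the Codex conversion is meant to recover, which raw PDF text extraction does not preserve.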
First Seen
Jan 24, 2026