Local OCR Pipeline Skill

A robust Optical Character Recognition (OCR) pipeline driven by ocrmypdf and tesseract. It handles scanned PDFs, rotated image inputs, and raw text extraction entirely on the local machine, with no calls to external APIs.

Why not GPU via PyTorch/EasyOCR? ocrmypdf is a widely used open-source tool for producing searchable PDFs. It uses tesseract to recognize text and lays an invisible text layer over the matching regions of the scanned page. A CPU-only pipeline is leaner (it avoids a roughly 1.5GB PyTorch payload) and embeds the text where it appears in the scanned image, so selection and search line up with what the reader sees.

Capabilities

Searchable PDF Generation: Converts rasterized/scanned PDFs or raw images (.jpg, .png, etc.) into PDFs with a selectable, searchable text layer.
Auto-Rotation & Deskew: Automatically detects incorrectly rotated text and straightens crooked scans.
Idempotent In-Place Processing: Safely processes files in place using --skip-text, which leaves pages that already contain text untouched, so re-running the pipeline on an already-processed PDF does not OCR it a second time.
Structured JSON Output: All commands output structured JSON, making failure states (like missing dependencies) parseable by agents.
Raw Text Extraction: Fallback that returns the recognized text as a plain string, for when agents need the text in memory rather than a PDF file.
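
The capabilities above can be sketched as a thin Python wrapper around the two command-line tools. This is a minimal illustration, not the skill's actual implementation: the function names (build_ocr_command, build_text_command, run_ocr), the injectable `which` parameter, and the JSON field names ("ok", "error", "missing", "exit_code", "output", "stderr") are all hypothetical. The flags themselves (--skip-text, --rotate-pages, --deskew, -l) are standard ocrmypdf options, and `tesseract IMAGE stdout` is the usual way to print recognized text instead of writing a file.

```python
import json
import shutil
import subprocess


def build_ocr_command(src, dst=None, language="eng"):
    """Return an ocrmypdf argument list; dst=None means process in place."""
    return [
        "ocrmypdf",
        "--skip-text",     # leave pages that already contain text untouched
        "--rotate-pages",  # correct pages detected as rotated
        "--deskew",        # straighten slightly crooked scans
        "-l", language,
        src,
        dst or src,        # same path for input and output = in-place
    ]


def build_text_command(image, language="eng"):
    """Return a tesseract argument list that prints recognized text to stdout."""
    return ["tesseract", image, "stdout", "-l", language]


def run_ocr(src, dst=None, language="eng", which=shutil.which):
    """Run ocrmypdf and return a JSON string describing the outcome.

    The JSON layout is an assumption made for this sketch.
    """
    # Report missing tools as structured JSON instead of raising.
    missing = [tool for tool in ("ocrmypdf", "tesseract") if which(tool) is None]
    if missing:
        return json.dumps(
            {"ok": False, "error": "missing_dependency", "missing": missing}
        )

    proc = subprocess.run(
        build_ocr_command(src, dst, language),
        capture_output=True,
        text=True,
    )
    return json.dumps(
        {
            "ok": proc.returncode == 0,
            "exit_code": proc.returncode,
            "output": dst or src,
            "stderr": proc.stderr.strip(),
        }
    )
```

Calling run_ocr("scan.pdf") would rewrite scan.pdf in place, and calling it again should leave pages that already have text unchanged because of --skip-text. Passing `which` as a parameter is only a convenience for exercising the missing-dependency branch without uninstalling anything.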


Setup