local-ocr

Local OCR Pipeline Skill

A local Optical Character Recognition (OCR) pipeline built on ocrmypdf and tesseract. It handles scanned PDFs, rotated image inputs, and raw text extraction entirely on the local machine, with no calls to external APIs.

Why not GPU via PyTorch/EasyOCR? The ocrmypdf tool is a widely used choice for producing searchable PDFs. It uses tesseract to recognize text and places that text in the PDF at the position where it appears in the scanned image. A pure-CPU pipeline is also leaner, since it avoids a 1.5GB PyTorch payload.

Capabilities

  1. Searchable PDF Generation: Converts rasterized/scanned PDFs or raw images (.jpg, .png, etc.) into PDFs with a selectable, searchable text layer.
  2. Auto-Rotation & Deskew: Automatically detects incorrectly rotated text and straightens crooked scans.
  3. Idempotent In-Place Processing: Safely processes files in-place using --skip-text, preventing double-processing of a PDF that already has embedded text.
  4. Structured JSON Output: All commands output structured JSON, making failure states (like missing dependencies) parseable by agents.
  5. Raw Text Extraction: Falls back to plain-string extraction when an agent needs the text directly in memory rather than as a PDF file.
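
As a rough illustration of capabilities 2 to 4, the sketch below builds an ocrmypdf command line with the documented `--rotate-pages`, `--deskew`, and `--skip-text` flags and wraps the outcome in JSON. The helper names (`build_ocrmypdf_args`, `run_ocr`) and the JSON keys are hypothetical and chosen for this example; the skill's actual scripts may be structured differently.

```python
import json
import subprocess


def build_ocrmypdf_args(src, dst, skip_text=True):
    """Build an ocrmypdf command line (hypothetical helper, not the skill's own code)."""
    args = ["ocrmypdf", "--rotate-pages", "--deskew"]
    if skip_text:
        # --skip-text leaves pages that already have a text layer untouched,
        # so running the command twice on the same file does not OCR it twice.
        args.append("--skip-text")
    args += [str(src), str(dst)]
    return args


def run_ocr(src, dst, runner=subprocess.run):
    """Run ocrmypdf and return a JSON string describing the outcome.

    The JSON keys used here are assumptions made for illustration.
    """
    args = build_ocrmypdf_args(src, dst)
    try:
        proc = runner(args, capture_output=True, text=True)
    except FileNotFoundError:
        # Raised by subprocess.run when the ocrmypdf executable is not on PATH.
        return json.dumps(
            {"status": "error", "error": "missing_dependency", "tool": "ocrmypdf"}
        )
    if proc.returncode != 0:
        return json.dumps(
            {
                "status": "error",
                "error": "ocr_failed",
                "exit_code": proc.returncode,
                "stderr": proc.stderr.strip(),
            }
        )
    return json.dumps({"status": "ok", "output": str(dst)})
```

Passing the same path as `src` and `dst` corresponds to in-place processing. The `runner` parameter exists only so the wrapper can be exercised without ocrmypdf installed.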

Setup

# Installs system dependencies (tesseract, ocrmypdf, ghostscript) and sets up isolated venv
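
Once setup has run, a quick check along these lines can confirm that the three system dependencies are reachable and report the result as JSON. This is a sketch only: the function name, the JSON keys, and the assumption that Ghostscript is installed under the executable name `gs` (typical on Linux and macOS) are not taken from the skill itself.

```python
import json
import shutil

# Mapping of dependency name to executable name. The executable names are
# assumptions; Ghostscript is commonly installed as "gs" on Linux and macOS.
REQUIRED_TOOLS = {
    "tesseract": "tesseract",
    "ocrmypdf": "ocrmypdf",
    "ghostscript": "gs",
}


def check_dependencies(which=shutil.which):
    """Return a JSON string listing any required tools not found on PATH."""
    missing = sorted(
        name for name, exe in REQUIRED_TOOLS.items() if which(exe) is None
    )
    status = "ok" if not missing else "error"
    return json.dumps({"status": status, "missing": missing})


if __name__ == "__main__":
    print(check_dependencies())
```

The `which` parameter defaults to `shutil.which` and can be swapped for a stub when testing.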
First Seen
Feb 28, 2026