image-ocr

Summary

Text extraction from images using six OCR engines with preprocessing, cloud APIs, and structured output.

  • Supports six OCR tools with a decision tree for choosing among them: Tesseract and EasyOCR for local processing, PaddleOCR for CJK text and tables, Google Vision and AWS Textract for cloud accuracy, and Claude Vision for semantic understanding
  • Includes full preprocessing pipeline (grayscale, deskew, denoise, binarization, morphological cleanup) to maximize accuracy on real-world images
  • Provides Python and Node.js implementations with confidence filtering, bounding box extraction, and form/table parsing for invoices and structured documents
  • Covers PDF text extraction with OCR fallback for scanned pages, post-processing regex patterns, and batch processing templates
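
As an illustration of the confidence filtering mentioned in the list above, here is a minimal sketch in plain Python. It assumes each OCR engine's output has already been converted into a list of word records with `text`, `conf` (on a 0 to 100 scale), and `box` fields. That record shape, the `Word` type, the helper names, and the default threshold of 60 are assumptions made for this example, not part of any engine's API.

```python
from typing import TypedDict


class Word(TypedDict):
    """Hypothetical normalized word record (not any engine's native format)."""
    text: str
    conf: float                      # assumed 0-100 confidence scale
    box: tuple[int, int, int, int]   # assumed (x, y, width, height) in pixels


def filter_words(words: list[Word], min_conf: float = 60.0) -> list[Word]:
    """Keep words that are non-empty and at or above the confidence threshold."""
    return [w for w in words if w["text"].strip() and w["conf"] >= min_conf]


def join_text(words: list[Word]) -> str:
    """Join the surviving words into one string, in their original order."""
    return " ".join(w["text"] for w in words)


sample: list[Word] = [
    {"text": "Invoice", "conf": 96.0, "box": (10, 10, 80, 20)},
    {"text": "~", "conf": 12.0, "box": (95, 10, 5, 20)},    # low-confidence noise
    {"text": "", "conf": -1.0, "box": (0, 0, 200, 40)},      # empty layout box
    {"text": "1042", "conf": 88.5, "box": (105, 10, 40, 20)},
]

clean = filter_words(sample)
print(join_text(clean))
```

The bounding boxes are carried through untouched, so a later step can still locate each kept word on the page.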
SKILL.md

Image OCR Expert

Expert in extracting, processing, and structuring text from images using OCR tools and techniques.

Description

This skill provides specialized knowledge for extracting text from images, including:

  • Tool and library selection by use case (Tesseract, EasyOCR, PaddleOCR, cloud APIs)
  • Image preprocessing to maximize OCR accuracy
  • Post-processing and structuring of extracted text
  • Handling handwriting, receipts, invoices, documents, screenshots
  • Multilingual OCR and special character support
  • Integration into Python/Node.js/cloud pipelines
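
To make the post-processing and structuring step more concrete, the sketch below pulls two fields out of raw OCR text using Python's standard `re` module. The patterns, the ISO date format, the field names, and the `extract_fields` helper are all assumptions chosen for illustration; real receipts and invoices will usually need patterns tuned to their layout.

```python
import re

# Hypothetical patterns for illustration; adjust to the documents at hand.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
TOTAL_RE = re.compile(r"\btotal\b[^\d]*(\d+(?:[.,]\d{2})?)", re.IGNORECASE)


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces and tabs that OCR often introduces."""
    return re.sub(r"[ \t]+", " ", text).strip()


def extract_fields(text: str) -> dict:
    """Return the first date and total found, or None for a missing field."""
    text = normalize_whitespace(text)
    date = DATE_RE.search(text)
    total = TOTAL_RE.search(text)
    return {
        "date": date.group(0) if date else None,
        # Accept a decimal comma as well as a decimal point.
        "total": float(total.group(1).replace(",", ".")) if total else None,
    }


raw = "Date:   2026-03-04\nTOTAL    $ 41,50"
print(extract_fields(raw))
```

Returning `None` for missing fields, rather than raising, keeps batch runs going when a single page fails to match.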

Triggers: ocr, extract text from image, image to text, read text image, optical character recognition, tesseract, easyocr, paddleocr, textract, vision api, document extraction, screenshot text, invoice ocr, receipt ocr, handwriting recognition, image text extraction


Tool Selection Guide
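
The summary pairs each tool with a use case: Tesseract and EasyOCR for local processing, PaddleOCR for CJK and tables, Google Vision and AWS Textract for cloud accuracy, and Claude Vision for semantic understanding. One possible way to encode that pairing is sketched below. The `OcrTask` fields, the `choose_ocr_tool` name, and the order in which the checks run are assumptions for this example; the source does not spell out the actual decision tree.

```python
from dataclasses import dataclass


@dataclass
class OcrTask:
    """Hypothetical description of a job; all fields are illustrative."""
    needs_semantics: bool = False   # e.g. "what is this document about?"
    cloud_allowed: bool = False     # may the image leave the local machine?
    has_cjk: bool = False           # Chinese, Japanese, or Korean text
    has_tables: bool = False        # tabular layout to reconstruct


def choose_ocr_tool(task: OcrTask) -> str:
    """Map a task to the tool(s) the summary associates with it.

    The priority order here is an assumption, not the skill's own tree.
    """
    if task.cloud_allowed and task.needs_semantics:
        return "Claude Vision"
    if task.has_cjk or task.has_tables:
        return "PaddleOCR"
    if task.cloud_allowed:
        return "Google Vision or AWS Textract"
    return "Tesseract or EasyOCR"


print(choose_ocr_tool(OcrTask(has_tables=True)))
print(choose_ocr_tool(OcrTask(cloud_allowed=True)))
```

Keeping the rules in one small function makes the priority order easy to change once the real constraints (cost, privacy, languages) are known.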

First Seen
Mar 4, 2026