image-to-text
Image to Text
Extract all readable text from an image using OCR (Tesseract). Returns the full text content along with word-level bounding boxes and confidence scores.
When to Use
- Reading text content from a screenshot or design mockup
- Extracting UI copy (labels, buttons, headings) so you don't have to retype it
- Getting text positions and bounding boxes from a design image
How It Works
- The image is passed to Tesseract.js for optical character recognition
- Tesseract segments the image into lines and words
- The skill returns the full text plus word-level details (bounding-box position and confidence score)
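The last step above can be sketched as a small post-processing function. This is a minimal illustration, not the skill's actual implementation: the `OcrWord` shape (`text`, `confidence`, `bbox` with `x0`/`y0`/`x1`/`y1`) is assumed to resemble the word objects an OCR engine such as Tesseract.js reports, and `summarizeWords`, `WordDetail`, `OcrSummary`, and the `minConfidence` parameter are hypothetical names introduced here.

```typescript
// Assumed input shape, modeled on typical OCR word output (hypothetical names).
interface BBox { x0: number; y0: number; x1: number; y1: number; }
interface OcrWord { text: string; confidence: number; bbox: BBox; }

// Hypothetical output shape: full text plus per-word position and confidence.
interface WordDetail {
  text: string;
  confidence: number;
  x: number;
  y: number;
  width: number;
  height: number;
}
interface OcrSummary { text: string; words: WordDetail[]; }

// Drop empty or low-confidence words, then convert corner coordinates
// into an x/y/width/height box for each remaining word.
function summarizeWords(words: OcrWord[], minConfidence = 0): OcrSummary {
  const kept = words.filter(
    (w) => w.confidence >= minConfidence && w.text.trim() !== ""
  );
  const details: WordDetail[] = kept.map((w) => ({
    text: w.text,
    confidence: w.confidence,
    x: w.bbox.x0,
    y: w.bbox.y0,
    width: w.bbox.x1 - w.bbox.x0,
    height: w.bbox.y1 - w.bbox.y0,
  }));
  // Join the surviving words with single spaces to form the full text.
  return { text: details.map((d) => d.text).join(" "), words: details };
}

// Made-up sample data standing in for real OCR output.
const sampleWords: OcrWord[] = [
  { text: "Sign", confidence: 96, bbox: { x0: 10, y0: 20, x1: 50, y1: 40 } },
  { text: "in", confidence: 91, bbox: { x0: 56, y0: 20, x1: 70, y1: 40 } },
  { text: "~", confidence: 12, bbox: { x0: 80, y0: 22, x1: 84, y1: 26 } },
];

const summary = summarizeWords(sampleWords, 60);
console.log(summary.text);
console.log(summary.words);
```

Filtering by confidence is a design choice of this sketch, not something the skill is documented to do. It is included because low-confidence words in screenshots are often stray marks, not UI copy.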
Usage