data-extractor

Data Extractor

Overview

Extract structured data from documents in a range of formats, including PDF, DOCX, HTML, plain text, and images. The skill converts unstructured or semi-structured content into clean JSON, CSV, or other structured output, and handles invoices, forms, reports, and free-text documents.
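
As a rough illustration of what "free text in, structured output out" can look like, here is a minimal sketch using only the Python standard library. The sample text, the field names, and the helpers `extract_fields`, `to_json`, and `to_csv` are all hypothetical and are not part of the skill itself; real documents will usually need format-specific parsing first.

```python
import csv
import io
import json
import re

# Hypothetical free-text input; the labels and values are illustrative only.
SAMPLE_TEXT = """Invoice Number: INV-1001
Date: 2024-01-15
Total: 249.50
"""

# Assumed field patterns: one regular expression per field we want to pull out.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice Number:\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*([\d.]+)"),
}


def extract_fields(text):
    """Return a dict of field name to matched string, or None if not found."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[name] = match.group(1) if match else None
    return record


def to_json(record):
    """Serialize one record as indented JSON."""
    return json.dumps(record, indent=2)


def to_csv(record):
    """Serialize one record as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record), lineterminator="\n")
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


record = extract_fields(SAMPLE_TEXT)
print(to_json(record))
print(to_csv(record))
```

Values are kept as strings here; converting `total` to a number or validating the date would be a separate step.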

Instructions

When a user asks you to extract data from a document, follow this process:

Step 1: Identify the document format and install dependencies

# Determine file type
file document.pdf

# Install dependencies based on format
pip install pdfplumber python-docx beautifulsoup4 lxml openpyxl
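
Rather than installing every package up front, one option is to install only what the detected format needs. The sketch below is an assumption about how the packages in the command above map to file types; the table `PACKAGES_BY_SUFFIX` and the helpers `packages_for` and `pip_command` are hypothetical names, and the code goes by file extension instead of the output of `file`.

```python
from pathlib import Path

# Assumed mapping from file extension to the pip packages listed above.
# Plain text needs no third-party package.
PACKAGES_BY_SUFFIX = {
    ".pdf": ["pdfplumber"],
    ".docx": ["python-docx"],
    ".html": ["beautifulsoup4", "lxml"],
    ".htm": ["beautifulsoup4", "lxml"],
    ".xlsx": ["openpyxl"],
    ".txt": [],
}


def packages_for(path):
    """Return the pip packages assumed necessary for the given file."""
    suffix = Path(path).suffix.lower()
    if suffix not in PACKAGES_BY_SUFFIX:
        raise ValueError(f"No package mapping for file type: {suffix or '(none)'}")
    return list(PACKAGES_BY_SUFFIX[suffix])


def pip_command(path):
    """Build the pip install command as an argument list, or [] if nothing is needed."""
    packages = packages_for(path)
    return ["pip", "install", *packages] if packages else []


print(pip_command("document.pdf"))
```

The returned list can be passed to `subprocess.run` as-is. Image inputs are left out of the mapping because the install command above does not name an OCR package.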
First seen: Mar 13, 2026