data-extractor
Data Extractor
Overview
Extract structured data from documents in a range of formats, including PDF, DOCX, HTML, TXT, and images. Converts unstructured or semi-structured content into clean JSON, CSV, or other structured formats. Handles invoices, forms, reports, and free-text documents.
Instructions
When a user asks you to extract data from a document, follow this process:
Step 1: Identify the document format and install dependencies
# Determine file type
file document.pdf
# Install the parsers for the formats you need:
# pdfplumber (PDF), python-docx (DOCX), beautifulsoup4 + lxml (HTML), openpyxl (XLSX)
pip install pdfplumber python-docx beautifulsoup4 lxml openpyxl
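The format check above can also be done in Python before any parsing starts. The following is a minimal sketch, not part of the skill itself: the mapping `PACKAGES_BY_EXTENSION` and the helper `required_packages` are hypothetical names introduced here for illustration, and the lookup relies on the file extension only, whereas `file` inspects the content.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the pip packages listed in Step 1.
PACKAGES_BY_EXTENSION = {
    ".pdf": ["pdfplumber"],
    ".docx": ["python-docx"],
    ".html": ["beautifulsoup4", "lxml"],
    ".htm": ["beautifulsoup4", "lxml"],
    ".xlsx": ["openpyxl"],
    ".txt": [],  # plain text needs only the standard library
}


def required_packages(path: str) -> list[str]:
    """Return the pip packages needed to parse the file at `path`.

    Assumption: the extension reflects the real format. Raises ValueError
    for extensions that are not in the mapping.
    """
    suffix = Path(path).suffix.lower()
    if suffix not in PACKAGES_BY_EXTENSION:
        raise ValueError(f"Unsupported or unknown format: {suffix or '(no extension)'}")
    # Return a copy so callers cannot mutate the shared mapping.
    return list(PACKAGES_BY_EXTENSION[suffix])


if __name__ == "__main__":
    print(required_packages("document.pdf"))  # ['pdfplumber']
```

Failing loudly on unknown extensions keeps a misnamed or unsupported file from reaching a parser that cannot handle it. If extensions are unreliable in your inputs, fall back to the `file` command shown above.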