pdf-to-docx
SKILL.md
PDF to Word Converter
Convert PDF pages to editable Word documents while preserving layout structure.
Workflow
- Extract PDF page as image - Use pdfplumber to render the page at high resolution
- Run OCR - Use tesseract to extract text from the rendered image
- Create Word document - Use python-docx to build a document with a matching layout
- Verify result - Compare the generated document with the original PDF
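The four steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's actual implementation: it assumes the `pytesseract` wrapper is used to call tesseract, and the names `convert_page`, `split_paragraphs`, and `text_similarity`, the 300 DPI default, and the `page_N.docx` output name are all hypothetical. It writes plain paragraphs only, so reproducing the layout would need further work.

```python
import difflib
from pathlib import Path


def split_paragraphs(ocr_text: str) -> list[str]:
    """Group OCR output into paragraphs; blank lines separate paragraphs."""
    paragraphs, current = [], []
    for line in ocr_text.splitlines():
        stripped = line.strip()
        if stripped:
            current.append(stripped)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def text_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio after collapsing whitespace (verification step)."""
    norm_a, norm_b = " ".join(a.split()), " ".join(b.split())
    return difflib.SequenceMatcher(None, norm_a, norm_b).ratio()


def convert_page(pdf_path: str, page_number: int, out_dir: str,
                 resolution: int = 300) -> Path:
    """Render one page, OCR it, and write the text to a .docx file."""
    # Third-party imports stay local so the helpers above work without them.
    import pdfplumber
    import pytesseract  # assumption: tesseract is driven through pytesseract
    from docx import Document

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number - 1]  # assumption: page numbers are 1-based
        # Step 1: render the page; .original is the underlying PIL image.
        image = page.to_image(resolution=resolution).original
        # Step 2: OCR the rendered image.
        text = pytesseract.image_to_string(image)

    # Step 3: one Word paragraph per OCR paragraph.
    doc = Document()
    for paragraph in split_paragraphs(text):
        doc.add_paragraph(paragraph)
    docx_path = out / f"page_{page_number}.docx"  # hypothetical naming scheme
    doc.save(str(docx_path))
    return docx_path
```

For the verification step, one option is to pass the OCR text and the text read back from the generated document to `text_similarity` and flag pages whose ratio falls below a threshold you choose.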
Quick Start
Extract a single page:
python scripts/extract_pdf_page.py /path/to/document.pdf 1 -o /output/dir
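For readers who want to see how a command of this shape might be parsed, here is a hypothetical `argparse` sketch. The contents of `scripts/extract_pdf_page.py` are not shown in this document, so the argument names `pdf` and `page`, the `--output` long form, and the default output directory are assumptions; only the positional order and the `-o` flag come from the command above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the Quick Start command line."""
    parser = argparse.ArgumentParser(
        description="Extract a single PDF page (illustrative sketch)."
    )
    parser.add_argument("pdf", help="path to the source PDF")
    parser.add_argument("page", type=int, help="page number to extract")
    # The long option name and the default are assumptions, not taken from the script.
    parser.add_argument("-o", "--output", default=".", help="output directory")
    return parser


# Parse the same arguments used in the Quick Start example.
args = build_parser().parse_args(
    ["/path/to/document.pdf", "1", "-o", "/output/dir"]
)
```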