docx-toolkit
SKILL.md
DOCX Toolkit
A complete toolkit for processing Microsoft Word documents (.docx and legacy .doc formats).
Capabilities
1. Text + Table Extraction (.docx)
python3 {baseDir}/scripts/extract_text.py input.docx output.txt
Extracts all paragraphs and tables with structure preserved. Tables are formatted as pipe-delimited rows for easy parsing.
2. Text Extraction (Legacy .doc)
python3 {baseDir}/scripts/extract_doc_text.py input.doc output.txt
Handles legacy OLE2 .doc format using olefile. Extracts Unicode text from the WordDocument stream.