docx-toolkit

SKILL.md

DOCX Toolkit

A complete toolkit for processing Microsoft Word documents (.docx and legacy .doc formats).

Capabilities

1. Text + Table Extraction (.docx)

python3 {baseDir}/scripts/extract_text.py input.docx output.txt

Extracts all paragraphs and tables with structure preserved. Tables are formatted as pipe-delimited rows for easy parsing.

2. Text Extraction (Legacy .doc)

python3 {baseDir}/scripts/extract_doc_text.py input.doc output.txt

Handles legacy OLE2 .doc format using olefile. Extracts Unicode text from the WordDocument stream.

3. Image Extraction (.docx)

Installs
60
First Seen
Mar 13, 2026
Security Audits