docx-to-md

Installation
SKILL.md

DOCX → Markdown

DOCX is structured XML, so text/tables can be extracted losslessly without OCR. But embedded images (architecture diagrams, flowcharts, screenshots) carry information that text-only extractors silently drop. This skill extracts them to disk and references them via standard Markdown image syntax at their original position — you describe them inline using your built-in Vision capability.

Workflow (agent mode — default, zero config)

Step 1 — Run the extractor

python "${CLAUDE_SKILL_DIR}/scripts/docx_to_md.py" \
  --input <docx_or_dir> \
  --output <output_dir>

Output:

  • <output_dir>/<stem>.md — headings, paragraphs, tables in document order, plus ![](<stem>/imgs/img_NNN.png) placeholders at each large image's original position
  • <output_dir>/<stem>/imgs/ — extracted image files

Step 2 — Fill in image descriptions

Related skills
Installs
1
First Seen
9 days ago