docx-to-md
DOCX → Markdown
DOCX is structured XML, so text and tables can be extracted losslessly without OCR. Embedded images (architecture diagrams, flowcharts, screenshots), however, carry information that text-only extractors silently drop. This skill extracts them to disk and references them via standard Markdown image syntax at their original position; you (the agent) then describe them inline using your built-in Vision capability.
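To make the "structured XML plus embedded images" point concrete, here is a minimal sketch of how images can be pulled out of a DOCX using only the Python standard library. It is not the skill's actual script. It assumes the common layout in which Word stores embedded media under `word/media/` inside the ZIP package, and the function name `extract_docx_images` is hypothetical.

```python
import zipfile
from pathlib import Path

# Assumption: embedded media sits under word/media/, as Word conventionally writes it.
MEDIA_PREFIX = "word/media/"


def extract_docx_images(docx_path, out_dir):
    """Copy every file under word/media/ in the DOCX package into out_dir.

    Returns the list of written paths, in package order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Skip directory entries and anything outside the media folder.
            if name.startswith(MEDIA_PREFIX) and not name.endswith("/"):
                target = out_dir / Path(name).name
                target.write_bytes(package.read(name))
                saved.append(target)
    return saved
```

A call such as `extract_docx_images("report.docx", "out/report/imgs")` would write the image files to disk. Placing each reference at its original position additionally requires walking `word/document.xml`, which this sketch leaves out.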
Workflow (agent mode — default, zero config)
Step 1 — Run the extractor
python "${CLAUDE_SKILL_DIR}/scripts/docx_to_md.py" \
--input <docx_or_dir> \
--output <output_dir>
Output:
<output_dir>/<stem>.md: headings, paragraphs, tables in document order, plus placeholders at each large image's original position
<output_dir>/<stem>/imgs/: extracted image files
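After the extractor runs, it can help to list which image references in the generated Markdown still lack a description. The sketch below scans for standard Markdown image syntax and reports references with empty alt text. The exact placeholder format the script emits is not shown here, so treating "empty alt text" as "not yet described" is an assumption, and the helper names are hypothetical.

```python
import re

# Standard Markdown image syntax: ![alt text](target "optional title")
IMAGE_REF = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<target>[^)\s]+)[^)]*\)")


def find_image_refs(markdown_text):
    """Return (line_number, alt_text, target) for each image reference."""
    refs = []
    for line_number, line in enumerate(markdown_text.splitlines(), start=1):
        for match in IMAGE_REF.finditer(line):
            refs.append((line_number, match.group("alt"), match.group("target")))
    return refs


def undescribed_images(markdown_text):
    """Image references whose alt text is empty (assumed to mean 'not yet described')."""
    return [ref for ref in find_image_refs(markdown_text) if not ref[1].strip()]
```

Running `undescribed_images(Path("out/report.md").read_text(encoding="utf-8"))` (with `Path` imported from `pathlib`) would give a worklist of line numbers and image paths to open and describe.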
Step 2 — Fill in image descriptions
More from ocozyo/docs-to-wiki
pdf-to-md
Convert PDF files to structured Markdown. Auto-detects native-text PDFs (extracted instantly with pymupdf, zero API cost) versus scanned/image PDFs (routed through PaddleOCR). Embedded large images are extracted to disk and referenced via standard Markdown image syntax — you (the agent) then describe them using your built-in Vision capability via the Read tool. No separate API key required. Use this skill whenever the user wants to convert a PDF to Markdown, extract text from PDFs, OCR scanned documents, or turn PDF reports into notes — even if they say "PDF → md", "extract this PDF", or "OCR these scans".
docs-to-wiki
pptx-to-md
Convert PPTX/PPSX presentations to structured Markdown by rendering each slide as a PNG. Preserves flowcharts, architecture diagrams, side-by-side comparisons, and visual layouts that shape-text extraction (markitdown, pandoc) silently drops. Slides are rendered to disk and referenced via standard Markdown image syntax — you (the agent) describe them using your built-in Vision capability via the Read tool. No separate API key required. Use this skill whenever the user wants to convert slides to Markdown, extract content from a presentation, or process decks into notes — even if they say "PPT → md", "extract these slides", or "turn this deck into a doc".