pdf-to-md


PDF → Markdown

PDFs are routed by content type. The script performs deterministic extraction; you (the agent) describe any embedded images using your built-in Vision capability.

Routing

| Input | Path | Notes |
|---|---|---|
| Native-text PDF (avg >50 chars/page) | `pdf_to_md.py` (pymupdf) | Takes seconds, zero API cost |
| Scanned / image PDF | `ocr_extract.py` (PaddleOCR) | Auto-skipped by the fast path; the OCR vendor handles it |
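The routing rule in the table can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function names (`average_chars_per_page`, `choose_route`, `read_page_texts`) are hypothetical, and stripping surrounding whitespace before counting characters is an assumption. Only the 50 chars/page threshold and the two script names come from the table.

```python
from typing import Sequence

# Threshold from the routing table: an average of more than 50 characters
# per page counts as a native-text PDF.
NATIVE_TEXT_MIN_AVG_CHARS = 50


def average_chars_per_page(page_texts: Sequence[str]) -> float:
    """Mean character count per page, ignoring leading/trailing whitespace.

    Returns 0.0 for a document with no pages. Trimming whitespace is an
    assumption made for this sketch.
    """
    if not page_texts:
        return 0.0
    return sum(len(text.strip()) for text in page_texts) / len(page_texts)


def choose_route(page_texts: Sequence[str]) -> str:
    """Return the script name the routing table assigns to this document."""
    if average_chars_per_page(page_texts) > NATIVE_TEXT_MIN_AVG_CHARS:
        return "pdf_to_md.py"    # fast path: native text layer present
    return "ocr_extract.py"      # scanned / image PDF: needs OCR


def read_page_texts(pdf_path: str) -> list[str]:
    """Extract the plain text of every page with PyMuPDF (imported lazily)."""
    import fitz  # PyMuPDF; recent releases also support `import pymupdf`

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]
```

With PyMuPDF installed, `choose_route(read_page_texts("report.pdf"))` would pick a script for a hypothetical `report.pdf`. Keeping the decision in a pure function means the threshold logic can be checked without a real PDF on disk.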

Workflow (agent mode — default, zero config)

Step 1 — Run the extractor


Repository

ocozyo/docs-to-wiki

First Seen

9 days ago

Security Audits

Gen Agent Trust Hub: Pass