mineru-extract
Installation
SKILL.md
MinerU Extract (official API)
Use MinerU as an upstream “content normalizer”: submit a URL to MinerU, poll for completion, download the result zip, and extract the main Markdown.
Quick start (MCP-aligned)
We align to the MinerU MCP mental model, but we do not run an MCP server.
- Primary script (MCP-style):
scripts/mineru_parse_documents.py- Input:
--file-sources(comma/newline-separated) - Output: JSON contract on stdout:
{ ok, items, errors }
- Input:
- Low-level script (single URL):
scripts/mineru_extract.py
Auth:
- Set
MINERU_TOKEN(Bearer token from mineru.net)
Default model heuristic:
- URLs ending with
.pdf/.doc/.ppt/.png/.jpg→pipeline - Otherwise →
MinerU-HTML(best for HTML pages like WeChat articles)