literature-pdf-ocr-library
Installation
SKILL.md
Literature PDF OCR Library
Overview
Use this skill to build a real, traceable literature corpus instead of fabricating references or scraping arbitrary publisher pages. The default workflow is: narrow the topic, search official or stable APIs, download only legally accessible PDFs, run OCR or layout parsing, then emit a clean Markdown library with machine-readable metadata.
Canonical Directory Layout
In Oh My Paper projects, the corpus always lives under .pipeline/literature/<corpus-name>/.
In standalone projects, use research/literature/<corpus-name>/.
Never dump papers into the root or a flat directory without a corpus name.
.pipeline/
literature/
<corpus-name>/ ← one folder per topic/session, e.g. "humanoid-locomotion"
search_results.json ← raw search/ID-lookup results
Related skills