literature-pdf-ocr-library

Installation
SKILL.md

Literature PDF OCR Library

Overview

Use this skill to build a real, traceable literature corpus instead of fabricating references or scraping arbitrary publisher pages. The default workflow is: narrow the topic, search official or stable APIs, download only legally accessible PDFs, run OCR or layout parsing, then emit a clean Markdown library with machine-readable metadata.


Canonical Directory Layout

In Oh My Paper projects, the corpus always lives under .pipeline/literature/<corpus-name>/.
In standalone projects, use research/literature/<corpus-name>/.
Never dump papers into the root or a flat directory without a corpus name.

.pipeline/
  literature/
    <corpus-name>/              ← one folder per topic/session, e.g. "humanoid-locomotion"
      search_results.json       ← raw search/ID-lookup results
Related skills
Installs
1
GitHub Stars
455
First Seen
Apr 19, 2026