Literature PDF OCR Library

Overview

Use this skill to build a real, traceable literature corpus instead of fabricating references or scraping arbitrary publisher pages. The default workflow is: narrow the topic, search official or stable APIs, download only legally accessible PDFs, run OCR or layout parsing, then emit a clean Markdown library with machine-readable metadata.
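As a rough illustration of the "download only legally accessible PDFs" and "machine-readable metadata" steps above, here is a minimal sketch. The `Record` class, its field names, and the `open_access` flag are hypothetical; they assume the search API reports whether a paper is openly accessible, which the skill text does not specify.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical metadata record for one search hit."""
    title: str
    pdf_url: Optional[str] = None
    open_access: bool = False  # assumed to come from the search API


def keep_legally_accessible(records: List[Record]) -> List[Record]:
    """Keep only records that are flagged open access and have a PDF link."""
    return [r for r in records if r.open_access and r.pdf_url]


def to_metadata_json(records: List[Record]) -> str:
    """Serialize records into machine-readable JSON metadata."""
    return json.dumps([asdict(r) for r in records], indent=2)


hits = [
    Record("Paper A", "https://example.org/a.pdf", open_access=True),
    Record("Paper B", "https://example.org/b.pdf", open_access=False),
    Record("Paper C", None, open_access=True),
]
downloadable = keep_legally_accessible(hits)
metadata = to_metadata_json(downloadable)
```

Filtering before any download keeps paywalled or link-less hits out of the corpus, so every PDF in the library can be traced back to a record that justified fetching it.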

Canonical Directory Layout

In Oh My Paper projects, the corpus always lives under .pipeline/literature/<corpus-name>/.
In standalone projects, use research/literature/<corpus-name>/.
Never place papers in the project root or in a flat directory that lacks a corpus name.

.pipeline/
  literature/
    <corpus-name>/              ← one folder per topic/session, e.g. "humanoid-locomotion"
      search_results.json       ← raw search/ID-lookup results
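The layout rules above can be sketched as a small path helper. This is an assumption-laden sketch, not part of the skill itself: `corpus_dir`, `slugify`, and the `oh_my_paper` flag are hypothetical names, and the slug rule (lowercase, hyphen-separated) is only inferred from the "humanoid-locomotion" example.

```python
import re
from pathlib import Path


def slugify(name: str) -> str:
    """Lowercase the name and join alphanumeric runs with hyphens (assumed rule)."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    if not slug:
        # Refuse empty names so papers never land in a flat, unnamed directory.
        raise ValueError("corpus name must contain at least one letter or digit")
    return slug


def corpus_dir(project_root: Path, corpus_name: str, oh_my_paper: bool) -> Path:
    """Return the corpus folder for an Oh My Paper or a standalone project."""
    base = ".pipeline" if oh_my_paper else "research"
    return project_root / base / "literature" / slugify(corpus_name)


root = Path("my-project")
omp = corpus_dir(root, "Humanoid Locomotion", oh_my_paper=True)
standalone = corpus_dir(root, "Humanoid Locomotion", oh_my_paper=False)
results_file = omp / "search_results.json"  # raw search/ID-lookup results
```

Routing every path through one helper makes it hard to write a file outside a named corpus folder, since an empty or punctuation-only name raises instead of falling back to the parent directory.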

Repository

ligphidonk/oh-my--paper

GitHub Stars

455

First Seen

Apr 19, 2026
