paper-fetch

Fetch the PDF for a paper given a DOI (or title). Tries multiple sources in priority order and stops at the first hit.

Resolution order

  1. Unpaywall: https://api.unpaywall.org/v2/{doi}?email=$UNPAYWALL_EMAIL, read best_oa_location.url_for_pdf (skipped if UNPAYWALL_EMAIL is not set)
  2. Semantic Scholar: https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}?fields=openAccessPdf,externalIds
  3. arXiv: if externalIds.ArXiv is present, https://arxiv.org/pdf/{arxiv_id}.pdf
  4. PubMed Central OA: if a PMCID is present, https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/
  5. bioRxiv / medRxiv: if the DOI prefix is 10.1101, query https://api.biorxiv.org/details/{server}/{doi} for the latest version's PDF URL
  6. Publisher direct (institutional mode only — PAPER_FETCH_INSTITUTIONAL=1) — DOI-prefix → publisher PDF template (Nature / Science / Wiley / Springer / ACS / PNAS / NEJM / Sage / T&F / Elsevier). The caller's own subscription IP / cookies / EZproxy are what authorize the fetch; unauthorized responses fail the %PDF check and fall through to step 7.
  7. Sci-Hub mirrors (on by default; disable with PAPER_FETCH_NO_SCIHUB=1) — last-resort fallback. Tries the mirror list in PAPER_FETCH_SCIHUB_MIRRORS (or built-in defaults sci-hub.ru, sci-hub.st, sci-hub.su, sci-hub.box, sci-hub.red, sci-hub.al, sci-hub.mk, sci-hub.ee) in order; on full miss, scrapes https://www.sci-hub.pub/ once per process for fresh mirrors. CAPTCHA / missing-paper pages have no PDF iframe and fall through silently.
  8. Otherwise → report failure with title/authors so the user can request the paper via interlibrary loan (ILL)
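
The open-access part of the chain (steps 1 to 4) and the %PDF check can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function names lookup_urls, direct_pdf_urls and first_pdf are hypothetical, and the fetch callable is injected by the caller so that the fall-through logic stays independent of any HTTP library.

```python
from typing import Callable, Iterable, List, Optional, Tuple
from urllib.parse import quote

Candidate = Tuple[str, str]  # (source name, URL)


def lookup_urls(doi: str, unpaywall_email: Optional[str]) -> List[Candidate]:
    """Metadata endpoints for steps 1 and 2 (these return JSON, not PDFs)."""
    urls: List[Candidate] = []
    if unpaywall_email:  # step 1 is skipped when UNPAYWALL_EMAIL is not set
        email = quote(unpaywall_email, safe="@")
        urls.append(("unpaywall",
                     f"https://api.unpaywall.org/v2/{doi}?email={email}"))
    urls.append(("semantic_scholar",
                 "https://api.semanticscholar.org/graph/v1/paper/"
                 f"DOI:{doi}?fields=openAccessPdf,externalIds"))
    return urls


def direct_pdf_urls(arxiv_id: Optional[str],
                    pmcid: Optional[str]) -> List[Candidate]:
    """PDF URLs for steps 3 and 4, built from ids found in the metadata."""
    urls: List[Candidate] = []
    if arxiv_id:
        urls.append(("arxiv", f"https://arxiv.org/pdf/{arxiv_id}.pdf"))
    if pmcid:
        urls.append(("pmc",
                     f"https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/"))
    return urls


def first_pdf(candidates: Iterable[Candidate],
              fetch: Callable[[str], bytes]) -> Optional[Tuple[str, bytes]]:
    """Try candidates in priority order and stop at the first real PDF."""
    for source, url in candidates:
        try:
            body = fetch(url)
        except Exception:
            continue  # network or HTTP error: fall through to the next source
        if body.startswith(b"%PDF"):  # login and CAPTCHA pages fail this check
            return source, body
    return None  # full miss: the caller reports failure
```

In a real run, fetch would wrap whatever HTTP client the skill uses. Checking the leading bytes rather than the HTTP status or Content-Type is what lets an HTML error page served with a 200 response fall through to the next source.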

If only a title is given, pass it directly via --title "<title>". The title is resolved to a DOI in this order:

  1. Crossref query.title — primary; covers all major journal/conference DOIs
  2. Semantic Scholar /paper/search/match — fallback when Crossref's top match is low-confidence (match_score < 40) or the gap to the runner-up is < 3. Critically, S2 covers arXiv-only preprints (no Crossref DOI). When S2 surfaces a paper that has only an arXiv id, the canonical 10.48550/arXiv.<id> is synthesized so the download chain stays uniform.
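
The fallback rule and the DOI synthesis above can be expressed in a few lines. This is a sketch under assumptions: the thresholds 40 and 3 come from the text, but the function names and the idea of passing the Crossref scores in as a best-first list are illustrative, not the skill's actual interface.

```python
from typing import Sequence

MIN_SCORE = 40.0  # below this, Crossref's top match is treated as low-confidence
MIN_GAP = 3.0     # minimum lead the top match needs over the runner-up


def crossref_is_confident(scores: Sequence[float]) -> bool:
    """scores: match scores of the Crossref candidates, best first."""
    if not scores or scores[0] < MIN_SCORE:
        return False
    if len(scores) > 1 and scores[0] - scores[1] < MIN_GAP:
        return False  # top two are too close to tell apart
    return True


def synthesize_arxiv_doi(arxiv_id: str) -> str:
    """Canonical DOI for a paper that has only an arXiv id."""
    return f"10.48550/arXiv.{arxiv_id}"
```

When crossref_is_confident returns False, the title would go to the Semantic Scholar match endpoint instead. A synthesized DOI such as 10.48550/arXiv.1706.03762 can then enter the same download chain as any journal DOI.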