arxiv-latex-source
arXiv LaTeX Source Access Guide
Overview
arXiv stores the original LaTeX source files for the vast majority of its 2.4 million+ preprints. Accessing LaTeX source provides major advantages over PDF parsing: exact mathematical notation as written by the author, structured sections and labels, machine-readable bibliography entries, and intact figure captions, table data, and cross-references.
For formula extraction, citation graph construction, section-level text analysis, or curating training data for scientific language models, LaTeX source is the gold standard. PDF parsing, whether through OCR or text-layer extraction, garbles equations, loses structural hierarchy, and mangles complex tables.
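As a rough illustration of what "structured sections and labels" buys you, the sketch below pulls a section outline and the \label keys out of a .tex string with regular expressions. The function names outline and labels are hypothetical, and the patterns assume titles without nested braces; a production pipeline would usually need a real LaTeX parser to handle macros and nesting.

```python
import re

# Matches \section{...}, \subsection{...}, \subsubsection{...} and starred forms.
# Assumption: titles contain no nested braces.
SECTION_RE = re.compile(r"\\(section|subsection|subsubsection)\*?\{([^{}]*)\}")
LABEL_RE = re.compile(r"\\label\{([^{}]*)\}")


def outline(tex: str) -> list[tuple[str, str]]:
    """Return (level, title) pairs in document order."""
    return [(m.group(1), m.group(2).strip()) for m in SECTION_RE.finditer(tex)]


def labels(tex: str) -> list[str]:
    """Return every \\label key in document order."""
    return LABEL_RE.findall(tex)


sample = r"""
\section{Introduction}\label{sec:intro}
\subsection*{Related Work}
\begin{equation}E = mc^2 \label{eq:energy}\end{equation}
"""
print(outline(sample))  # → [('section', 'Introduction'), ('subsection', 'Related Work')]
print(labels(sample))   # → ['sec:intro', 'eq:energy']
```

The same approach extends to \cite keys or equation environments, which is exactly the information a PDF text layer tends to lose.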
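The e-print endpoint serves multi-file submissions as gzip-compressed tarballs (.tar.gz) containing .tex files, figures, .bib/.bbl bibliography files, style files, and supplementary materials. Single-file submissions are typically returned as one gzip-compressed .tex file rather than a tarball, and submissions uploaded only as PDF have no LaTeX source to return, so check the payload format before unpacking.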
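The sketch below unpacks an e-print payload in memory (Python 3.9+). It assumes the payload is a gzip-compressed tarball, a single gzip-compressed file, or a bare PDF, and it inspects magic bytes rather than trusting a file extension. The function name unpack_eprint and the placeholder name main.tex for single-file submissions are hypothetical.

```python
import gzip
import io
import tarfile


def unpack_eprint(data: bytes) -> dict[str, bytes]:
    """Return {filename: contents} for an e-print payload.

    Assumption: the payload is a gzip-compressed tarball, a single
    gzip-compressed file, or a bare PDF.
    """
    if data.startswith(b"%PDF"):
        return {}  # PDF-only submission: no LaTeX source available
    # gzip streams start with the magic bytes 0x1f 0x8b
    raw = gzip.decompress(data) if data[:2] == b"\x1f\x8b" else data
    buf = io.BytesIO(raw)
    if tarfile.is_tarfile(buf):
        buf.seek(0)
        files = {}
        with tarfile.open(fileobj=buf, mode="r:") as tar:
            for member in tar.getmembers():
                if member.isfile():  # skip directories, links, etc.
                    f = tar.extractfile(member)
                    if f is not None:
                        files[member.name] = f.read()
        return files
    # Not a tarball: treat it as one source file (hypothetical name).
    return {"main.tex": raw}
```

Reading members into memory instead of calling extractall() avoids writing archive-controlled paths to disk; if you do extract to disk, validate member names first.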
Authentication
No authentication or API key is required. The e-print endpoint is publicly accessible. However, arXiv asks that automated tools set a descriptive User-Agent header and comply with rate limits.
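A minimal sketch of a polite client using only the standard library: every request carries a descriptive User-Agent, and calls are spaced at least min_interval seconds apart. The class name PoliteClient, the USER_AGENT string, and the three-second default are assumptions (the delay mirrors the spacing arXiv recommends for its query API); check arXiv's current policy before running bulk jobs.

```python
import time
import urllib.request

# Hypothetical contact string; replace with your own tool name and e-mail.
USER_AGENT = "my-latex-harvester/0.1 (mailto:you@example.org)"
MIN_INTERVAL = 3.0  # seconds between requests; an assumed conservative delay


class PoliteClient:
    """GET requests with a descriptive User-Agent, spaced min_interval apart."""

    def __init__(self, user_agent: str = USER_AGENT, min_interval: float = MIN_INTERVAL):
        self.user_agent = user_agent
        self.min_interval = min_interval
        self._last = float("-inf")  # no request sent yet

    def build_request(self, url: str) -> urllib.request.Request:
        return urllib.request.Request(url, headers={"User-Agent": self.user_agent})

    def wait_turn(self) -> None:
        # Sleep just long enough to keep consecutive requests min_interval apart.
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

    def get(self, url: str, timeout: float = 60.0) -> bytes:
        self.wait_turn()
        with urllib.request.urlopen(self.build_request(url), timeout=timeout) as resp:
            return resp.read()
```

Using time.monotonic() keeps the spacing correct even if the system clock is adjusted mid-run.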
Core Endpoints
Download LaTeX Source
- URL: GET https://arxiv.org/e-print/{arxiv_id}
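A minimal download sketch for this endpoint, using only the standard library. USER_AGENT, eprint_url, guess_suffix, and download_source are hypothetical names. The file suffix is chosen from the payload's magic bytes, on the assumption that the response may be a gzip stream or a PDF rather than always a tarball. Add a delay between calls when fetching many IDs.

```python
import pathlib
import urllib.parse
import urllib.request

EPRINT_BASE = "https://arxiv.org/e-print/"
# Hypothetical User-Agent; replace with your own tool name and contact address.
USER_AGENT = "my-latex-harvester/0.1 (mailto:you@example.org)"


def eprint_url(arxiv_id: str) -> str:
    """Build the e-print URL; '/' is kept so old-style IDs like hep-th/9901001 work."""
    return EPRINT_BASE + urllib.parse.quote(arxiv_id.strip(), safe="/")


def guess_suffix(data: bytes) -> str:
    """Pick a file extension from the payload's leading magic bytes."""
    if data[:2] == b"\x1f\x8b":
        return ".gz"   # gzip: tarball or single compressed .tex
    if data.startswith(b"%PDF"):
        return ".pdf"  # PDF-only submission
    return ".bin"      # unknown format; inspect manually


def download_source(arxiv_id: str, dest_dir: str = ".") -> pathlib.Path:
    """Fetch the e-print for arxiv_id and write it under dest_dir."""
    req = urllib.request.Request(eprint_url(arxiv_id), headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = resp.read()
    out_dir = pathlib.Path(dest_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Old-style IDs contain '/', which is not valid inside a single file name.
    out = out_dir / (arxiv_id.strip().replace("/", "_") + guess_suffix(data))
    out.write_bytes(data)
    return out
```

For example, download_source("2301.01234", "sources") would write sources/2301.01234 plus a suffix such as .gz, depending on what the server returns.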