arXiv Analyze

Fetch an arXiv paper in the most token-efficient format available, then analyze it.

Fallback chain

The fetcher script tries formats in order of quality and token efficiency:

1. arxiv2md (markdown): cleanest output; LaTeX math preserved; references, TOC, and citations stripped by default; rate-limited client-side to 28 req/min
2. arxiv.org/html: official HTML, can be auto-converted to markdown
3. ar5iv: arXiv Labs HTML renderer, broader coverage for older papers
4. arxiv.org/pdf: last resort; only the URL is returned, so load it with your PDF reader of choice
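The chain above amounts to a loop that returns the first tier that succeeds. The following is a minimal sketch of that idea, not the fetcher script's actual code: `try_tiers`, the stub fetchers, and the tier labels are hypothetical names chosen for illustration, and the stubs simulate failures instead of touching the network.

```python
from typing import Callable, Optional

# Hypothetical fetcher type: takes an arXiv ID, returns content or None.
Fetcher = Callable[[str], Optional[str]]


def try_tiers(arxiv_id: str, tiers: list[tuple[str, Fetcher]]) -> tuple[str, str]:
    """Return (tier_name, content) from the first tier that yields content."""
    errors = []
    for name, fetch in tiers:
        try:
            content = fetch(arxiv_id)
        except Exception as exc:  # one failing tier must not abort the chain
            errors.append(f"{name}: {exc}")
            continue
        if content:
            return name, content
        errors.append(f"{name}: empty result")
    raise RuntimeError(f"all tiers failed for {arxiv_id}: " + "; ".join(errors))


# Stub fetchers standing in for the real tiers (illustrative only).
def fetch_arxiv2md(arxiv_id: str) -> Optional[str]:
    raise ConnectionError("service unavailable")  # simulated outage


def fetch_arxiv_html(arxiv_id: str) -> Optional[str]:
    return None  # simulated: no HTML rendering for this paper


def fetch_ar5iv(arxiv_id: str) -> Optional[str]:
    return f"<html>paper {arxiv_id}</html>"


def pdf_url(arxiv_id: str) -> Optional[str]:
    return f"https://arxiv.org/pdf/{arxiv_id}"  # last resort: URL only


TIERS = [
    ("arxiv2md", fetch_arxiv2md),
    ("arxiv-html", fetch_arxiv_html),
    ("ar5iv", fetch_ar5iv),
    ("pdf", pdf_url),
]

# "2401.00001" is a placeholder ID used only for the demonstration.
tier, content = try_tiers("2401.00001", TIERS)
print(tier)
```

With these stubs the first two tiers fail and the third succeeds, so the PDF tier is never consulted. Collecting the per-tier errors keeps the final failure message useful when every tier is down.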

Opt-in tier 5: TeX source (--tier tex). Downloads arxiv.org/src/<id> (the original submitted tarball), extracts it to the cache, auto-detects the main entrypoint (main.tex, a file containing \documentclass, or the largest .tex file), and recursively flattens all \input, \include, \import, and \subimport directives. This is the canonical ground truth: it is available even for papers without HTML renderings (provided the authors submitted TeX source), equations are pristine, and it depends on no third-party renderer. Use it when the paper is math-heavy, when the rendered tiers fail, or when you need bibliography entries. Token cost is higher than markdown, so prefer the auto tier for most reads.
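Entrypoint detection and recursive flattening can be sketched as below. This is a simplified illustration under stated assumptions, not the skill's implementation: `find_main` and `flatten` are hypothetical names, the three detection heuristics are tried in the order listed, \input, \include, and \import paths are resolved against the project root, \subimport paths are resolved against the including file's directory, and directives inside TeX comments are not treated specially.

```python
import re
from pathlib import Path

# \input{file} and \include{file} take one argument;
# \import{dir}{file} and \subimport{dir}{file} take a directory and a file.
DIRECTIVE = re.compile(
    r"\\(input|include)\{([^}]+)\}|\\(import|subimport)\{([^}]*)\}\{([^}]+)\}"
)


def find_main(src_dir: Path) -> Path:
    """Pick main.tex, else a file with a documentclass line, else the largest .tex."""
    candidates = sorted(src_dir.rglob("*.tex"))
    if not candidates:
        raise FileNotFoundError(f"no .tex files under {src_dir}")
    for path in candidates:
        if path.name == "main.tex":
            return path
    for path in candidates:
        if "\\documentclass" in path.read_text(encoding="utf-8", errors="replace"):
            return path
    return max(candidates, key=lambda p: p.stat().st_size)


def flatten(tex_file: Path, root: Path, seen: frozenset = frozenset()) -> str:
    """Inline included files recursively; `seen` guards against include cycles."""
    tex_file = tex_file.resolve()
    if tex_file in seen:
        return f"% skipped cyclic include: {tex_file.name}\n"
    seen = seen | {tex_file}

    def expand(match: re.Match) -> str:
        if match.group(1):  # \input or \include
            target = root / match.group(2)
        else:  # \import or \subimport
            base = tex_file.parent if match.group(3) == "subimport" else root
            target = base / match.group(4) / match.group(5)
        if not target.is_file():  # the .tex extension is usually omitted
            target = target.with_name(target.name + ".tex")
        if not target.is_file():
            return match.group(0)  # leave unresolved directives untouched
        return flatten(target, root, seen)

    text = tex_file.read_text(encoding="utf-8", errors="replace")
    return DIRECTIVE.sub(expand, text)
```

A typical call would be `flatten(find_main(src_dir), src_dir)` on the extracted tarball directory. Leaving unresolved directives in place, instead of raising, keeps the output usable when a paper references files missing from the tarball.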

Disk cache

Fetched content is cached at $XDG_CACHE_HOME/ai-skill-arxiv/<arxiv_id>/ (defaults to ~/.cache/ai-skill-arxiv/<arxiv_id>/). Subsequent fetches of the same paper and tier are served from disk, so they are instant, work offline, and use no network traffic or rate-limit budget.
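The cache location and the read-through behavior can be sketched as follows. The directory layout comes from the description above; everything else is an assumption made for illustration, including the function names `cache_dir` and `cached_fetch` and the one-file-per-tier naming (`<tier>.txt`).

```python
import os
from pathlib import Path
from typing import Callable


def cache_dir(arxiv_id: str) -> Path:
    """Resolve the per-paper cache directory, honoring $XDG_CACHE_HOME."""
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "ai-skill-arxiv" / arxiv_id


def cached_fetch(arxiv_id: str, tier: str, fetch: Callable[[str], str]) -> str:
    """Return cached content for (paper, tier); call `fetch` only on a miss."""
    path = cache_dir(arxiv_id) / f"{tier}.txt"  # file naming is an assumption
    if path.is_file():
        return path.read_text(encoding="utf-8")  # hit: no network, no rate limit
    content = fetch(arxiv_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return content
```

Keying the cache on both the paper ID and the tier means a markdown fetch and a TeX fetch of the same paper do not overwrite each other.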
