PDF Text Extraction
Extract high-quality text from PDFs using two OCR engines:
asta pdf-extraction olmocr: cloud-based extraction via olmOCR (best for large batches, S3, and complex layouts)
asta pdf-extraction remote: quick single-file extraction via the Asta remote OCR API
Installation
This skill requires the asta CLI:
# Install/reinstall at the correct version
PLUGIN_VERSION=0.16.0
if [ "$(asta --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+')" != "$PLUGIN_VERSION" ]; then
    uv tool install --force git+https://github.com/allenai/asta-plugins.git@v$PLUGIN_VERSION
fi