pdf-brain

PDF Brain — Research → Practical System Moves

Use this skill when the user wants evidence-backed synthesis from the docs library (600+ books, PDFs, long-form references), not generic web summarization.

Pipeline v2 (ADR-0234)

The docs pipeline uses a staged artifact chain:

  • Extraction: opendataloader-pdf → structured markdown with headings, tables, reading order
  • Chunking: markdown-native heading detection, no overlap, hierarchical section + snippet chunks
  • Embeddings: nomic-embed-text via Ollama on GPU (768-dim, retrieval-tuned, pre-computed at ingest), stored in the docs_chunks_v2 collection
  • Artifacts: durable on NAS at /Volumes/three-body/docs-artifacts/{docId}/.md, .meta.json, .chunks.jsonl
  • Summaries: LLM-generated per-document summaries in .meta.json
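
To make the chunking stage concrete, here is a minimal Python sketch of heading-based chunking with no overlap. It is an illustration only, not the pipeline's actual code: the names `Chunk` and `chunk_markdown` are hypothetical, it handles only `#`-style headings, and it produces section-level chunks without the finer snippet chunks the pipeline also emits. It also assumes headings never appear inside fenced code blocks.

```python
import re
from dataclasses import dataclass

# Matches ATX-style markdown headings such as "## Setup".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    """Hypothetical chunk record: the heading trail plus the section body."""
    path: list[str]
    text: str


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split markdown into non-overlapping chunks, one per heading section."""
    chunks: list[Chunk] = []
    trail: list[tuple[int, str]] = []  # (level, title) of the open headings
    body: list[str] = []               # lines collected for the current section

    def flush() -> None:
        # Emit the collected body under the current heading trail, if non-empty.
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk([title for _, title in trail], text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Close headings at the same or a deeper level before opening this one.
            while trail and trail[-1][0] >= level:
                trail.pop()
            trail.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "# Guide\nIntro text.\n## Setup\nInstall it.\n# Appendix\nNotes."
    for chunk in chunk_markdown(sample):
        print(" > ".join(chunk.path), "|", chunk.text)
```

Because every body line is assigned to exactly one section, no text is duplicated across chunks, and the heading trail preserves the hierarchy that a retrieval step could use for context.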

When to Use

First Seen
Mar 1, 2026
pdf-brain — joelhooks/joelclaw