PDF Brain — Research → Practical System Moves

Use this skill when the user wants evidence-backed synthesis from the docs library (600+ books, PDFs, long-form references), not generic web summarization.

Pipeline v2 (ADR-0234)

The docs pipeline uses a staged artifact chain:

Extraction: opendataloader-pdf → structured markdown with headings, tables, reading order
Chunking: markdown-native heading detection, no overlap, hierarchical section + snippet chunks
Embeddings: nomic-embed-text served by ollama on GPU (768-dim, retrieval-tuned), pre-computed at ingest and stored in the docs_chunks_v2 collection
Artifacts: durable on NAS at /Volumes/three-body/docs-artifacts/{docId}/ — .md, .meta.json, .chunks.jsonl
Summaries: LLM-generated per-document summaries in .meta.json
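
As a rough illustration of the chunking stage, here is a minimal Python sketch of heading-based, non-overlapping chunking. It is not the pipeline's actual code: the Chunk type, the chunk_markdown function, and the choice to treat each blank-line-separated paragraph as a "snippet" are assumptions made for this example. It also does not skip fenced code blocks, so a "# comment" line inside a fence would be misread as a heading.

```python
import re
from dataclasses import dataclass

# ATX-style markdown heading: one to six '#' characters, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    kind: str    # "section" or "snippet" (hypothetical labels)
    path: tuple  # heading trail, e.g. ("Guide", "Setup")
    text: str


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split markdown into section chunks plus paragraph-level snippet chunks.

    Sections never overlap each other, and snippets never overlap each other;
    each snippet is nested inside exactly one section.
    """
    chunks: list[Chunk] = []
    trail: list[tuple[int, str]] = []  # stack of (heading level, title)
    body: list[str] = []               # lines collected under the current heading

    def flush() -> None:
        text = "\n".join(body).strip()
        body.clear()
        if not text:
            return
        path = tuple(title for _, title in trail)
        chunks.append(Chunk("section", path, text))
        # Assumption: a snippet is one blank-line-separated paragraph.
        for para in re.split(r"\n\s*\n", text):
            if para.strip():
                chunks.append(Chunk("snippet", path, para.strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # emit the body gathered under the previous heading
            level = len(match.group(1))
            # Pop headings at the same or deeper level before descending.
            while trail and trail[-1][0] >= level:
                trail.pop()
            trail.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks
```

Calling chunk_markdown on the extracted .md text returns section chunks interleaved with their snippets, each tagged with its heading trail. A real ingest step would presumably serialize these to .chunks.jsonl and embed them, but those steps are outside this sketch.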

When to Use