pdf-brain
PDF Brain — Research → Practical System Moves
Use this skill when the user wants evidence-backed synthesis from the docs library (600+ books, PDFs, long-form references), not generic web summarization.
Pipeline v2 (ADR-0234)
The docs pipeline uses a staged artifact chain:
- Extraction: opendataloader-pdf → structured markdown with headings, tables, reading order
- Chunking: markdown-native heading detection, no overlap, hierarchical section + snippet chunks
- Embeddings: nomic-embed-text via ollama GPU (768-dim, retrieval-tuned, pre-computed at ingest) in the `docs_chunks_v2` collection
- Artifacts: durable on NAS at `/Volumes/three-body/docs-artifacts/{docId}/`: `.md`, `.meta.json`, `.chunks.jsonl`
- Summaries: LLM-generated per-document summaries in `.meta.json`
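
To make the chunking stage concrete, here is a minimal sketch of heading-based chunking that emits one section chunk per heading plus non-overlapping snippet chunks for its paragraphs. It is an illustration under assumptions, not the pipeline's actual code: the `Chunk` type, the `chunk_markdown` function, and the choice of paragraphs as snippet boundaries are all hypothetical.

```python
import re
from dataclasses import dataclass

# ATX-style markdown headings: one to six '#' characters, then the title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    kind: str                      # "section" or "snippet" (hypothetical labels)
    heading_path: tuple[str, ...]  # titles from the top-level heading down
    text: str


def chunk_markdown(markdown: str) -> list[Chunk]:
    """Split markdown into section chunks and paragraph-level snippet chunks.

    Simplification: '#' lines inside fenced code blocks are treated as
    headings; a real implementation would need to skip fences.
    """
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # stack of (level, title)
    body: list[str] = []              # lines collected under the current heading

    def flush() -> None:
        text = "\n".join(body).strip()
        body.clear()
        if not text:
            return
        titles = tuple(title for _, title in path)
        # One chunk for the whole section body...
        chunks.append(Chunk("section", titles, text))
        # ...plus one snippet per paragraph; snippets do not overlap each other.
        for para in re.split(r"\n\s*\n", text):
            para = para.strip()
            if para:
                chunks.append(Chunk("snippet", titles, para))

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Pop headings at the same or deeper level to keep the path hierarchical.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return chunks
```

Each chunk carries its full heading path, so a retrieved snippet can be traced back to its enclosing section without re-parsing the document.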