doc-intelligence-promotion
Document Intelligence Promotion
Single-pass extraction + multi-stage post-processing pipeline.
Note: This pipeline uses pdfplumber to extract one document at a time; it is not a batch tool. For batch text extraction across the corpus, call pdftotext via subprocess instead; see pdf/pdftotext-popplersub-skill.
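The batch route mentioned in the note can be sketched as follows. This is a minimal illustration, not the contents of the referenced skill: the helper names `build_pdftotext_cmd` and `batch_extract`, the flat directory layout, and the choice of the `-layout` flag are all assumptions.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_cmd(pdf_path: Path, txt_path: Path, layout: bool = True) -> list[str]:
    """Build the argument list for one pdftotext call (hypothetical helper)."""
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # ask pdftotext to preserve the physical page layout
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def batch_extract(pdf_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every *.pdf directly inside pdf_dir to a .txt file in out_dir."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler-utils) was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        txt_path = out_dir / (pdf_path.stem + ".txt")
        # check=True raises CalledProcessError if pdftotext exits non-zero
        subprocess.run(build_pdftotext_cmd(pdf_path, txt_path), check=True)
        written.append(txt_path)
    return written
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces. A call such as `batch_extract(Path("corpus"), Path("corpus_txt"))` would write one text file per PDF, assuming those directories exist in your setup.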
Architecture
PDF/DOCX → parser (single read) → manifest.yaml
↓
deep_extract.py (post-processors):
├── table_exporter.py → CSV files
├── worked_example_parser.py → pytest files
└── chart_extractor.py → images + metadata YAML
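
The fan-out in the diagram above (one parsed manifest, several independent post-processors) can be sketched as below. This is only an illustration of the pattern: the manifest keys `tables`, `worked_examples`, and `charts`, the output file names, and the stand-in functions are hypothetical and are not taken from the real deep_extract.py.

```python
from typing import Callable

# Hypothetical in-memory form of manifest.yaml after the single parsing pass.
Manifest = dict


def export_tables(manifest: Manifest) -> list[str]:
    # Stand-in for table_exporter.py: one CSV name per table entry.
    return [f"table_{i}.csv" for i, _ in enumerate(manifest.get("tables", []), start=1)]


def parse_worked_examples(manifest: Manifest) -> list[str]:
    # Stand-in for worked_example_parser.py: one pytest file per worked example.
    return [f"test_example_{i}.py" for i, _ in enumerate(manifest.get("worked_examples", []), start=1)]


def extract_charts(manifest: Manifest) -> list[str]:
    # Stand-in for chart_extractor.py: one image name per chart entry.
    return [f"chart_{i}.png" for i, _ in enumerate(manifest.get("charts", []), start=1)]


POST_PROCESSORS: dict[str, Callable[[Manifest], list[str]]] = {
    "table_exporter": export_tables,
    "worked_example_parser": parse_worked_examples,
    "chart_extractor": extract_charts,
}


def deep_extract(manifest: Manifest) -> dict[str, list[str]]:
    """Run every post-processor on the same manifest; the source file is never re-read."""
    return {name: run(manifest) for name, run in POST_PROCESSORS.items()}
```

Because each post-processor only reads the manifest, the stages stay independent of one another, and a new one can be added by registering another entry in `POST_PROCESSORS`.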