addon-pdf-preprocess-page-artifacts

Add-on: PDF Preprocessing (Page Artifacts)

Use this skill to implement the preprocessing stage that turns an uploaded PDF into:

  • page-level text/markdown artifacts (raw)
  • page-level cleaned artifacts (produced by header/footer cleanup, which is a separate stage)
  • stable provenance metadata for audit + reprocessing

Inputs

Collect:

  • PDF_PARSER: docling if available; otherwise fall back to pypdf or pdfplumber.
  • PAGE_MARKER_STYLE: structured-pages (preferred) or markdown-markers.
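
The two inputs above could be resolved as in the following minimal sketch. It is not the skill's actual implementation. The names PreprocessConfig, choose_parser, and build_config are hypothetical, and the order of the two fallback parsers (pypdf before pdfplumber) is an assumption. Availability is checked only by looking up whether the package is importable.

```python
import importlib.util
from dataclasses import dataclass

# Preference order: docling first, then the fallbacks.
# The relative order of pypdf and pdfplumber is an assumption.
PARSER_PREFERENCE = ("docling", "pypdf", "pdfplumber")
PAGE_MARKER_STYLES = ("structured-pages", "markdown-markers")


@dataclass(frozen=True)
class PreprocessConfig:
    """Hypothetical container for the two collected inputs."""
    pdf_parser: str
    page_marker_style: str = "structured-pages"  # preferred style


def _is_installed(name: str) -> bool:
    # find_spec returns None when a top-level package cannot be found.
    return importlib.util.find_spec(name) is not None


def choose_parser(available=_is_installed) -> str:
    """Return the first parser in preference order that is available."""
    for name in PARSER_PREFERENCE:
        if available(name):
            return name
    raise RuntimeError(
        "No PDF parser found; install one of: " + ", ".join(PARSER_PREFERENCE)
    )


def build_config(page_marker_style: str = "structured-pages",
                 available=_is_installed) -> PreprocessConfig:
    """Validate PAGE_MARKER_STYLE and pick PDF_PARSER."""
    if page_marker_style not in PAGE_MARKER_STYLES:
        raise ValueError(f"Unknown PAGE_MARKER_STYLE: {page_marker_style!r}")
    return PreprocessConfig(choose_parser(available), page_marker_style)
```

The `available` argument exists only so the selection logic can be exercised without installing any parser, for example `build_config(available=lambda name: name == "pdfplumber")`.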

Output Contracts

For each page (1-based), persist:

  • raw_page_markdown (or raw extracted text)
  • metadata_jsonb, including the parser name and version and the extraction parameters
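
As an illustration of the contract above, a per-page record might look like the sketch below. The field names raw_page_markdown and metadata_jsonb come from the contract. Everything else (the PageArtifact class, the helper names, the page_number column, and the exact metadata keys) is a hypothetical choice made for this example, and the storage layer is left out.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PageArtifact:
    """Hypothetical in-memory form of one persisted page record."""
    page_number: int          # 1-based, per the contract
    raw_page_markdown: str    # or raw extracted text
    metadata_jsonb: dict      # parser name/version + extraction params

    def __post_init__(self):
        if self.page_number < 1:
            raise ValueError("page_number is 1-based and must be >= 1")


def build_page_artifacts(pages, parser_name, parser_version, extraction_params):
    """Turn a list of per-page strings into PageArtifact records."""
    artifacts = []
    for page_number, text in enumerate(pages, start=1):
        metadata = {
            # Key names below are assumptions, not a fixed schema.
            "parser": {"name": parser_name, "version": parser_version},
            "extraction_params": dict(extraction_params),
        }
        artifacts.append(PageArtifact(page_number, text, metadata))
    return artifacts


def to_row(artifact: PageArtifact) -> dict:
    """Serialize one record; sorted keys keep the JSON stable across runs."""
    return {
        "page_number": artifact.page_number,
        "raw_page_markdown": artifact.raw_page_markdown,
        "metadata_jsonb": json.dumps(artifact.metadata_jsonb, sort_keys=True),
    }
```

Serializing the metadata with sorted keys is one way to keep provenance comparable between runs, which helps when deciding whether a page needs reprocessing.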
First Seen
Mar 12, 2026