addon-pdf-preprocess-page-artifacts
Add-on: PDF Preprocessing (Page Artifacts)
Use this skill to implement the preprocessing stage that turns an uploaded PDF into:
- page-level text/markdown artifacts (raw)
- page-level cleaned artifacts (after header/footer cleanup; separate stage)
- stable provenance metadata for audit + reprocessing
Inputs
Collect:
- PDF_PARSER: docling (if available), or pypdf/pdfplumber as a fallback.
- PAGE_MARKER_STYLE: structured-pages (preferred) or markdown-markers.
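As a rough illustration of how these two inputs might be resolved, the sketch below picks the first installed parser and validates the marker style. The function names (`choose_pdf_parser`, `validate_page_marker_style`) and the ordering of pypdf before pdfplumber are assumptions made for this example; the skill itself only states that docling is preferred and that structured-pages is the preferred marker style.

```python
from importlib.util import find_spec

# Preference order: docling first, then the fallbacks.
# Assumption: pypdf is tried before pdfplumber; the source does not rank them.
PARSER_PREFERENCE = ("docling", "pypdf", "pdfplumber")
PAGE_MARKER_STYLES = ("structured-pages", "markdown-markers")


def _is_installed(module_name):
    """Return True if the top-level module can be imported."""
    return find_spec(module_name) is not None


def choose_pdf_parser(is_installed=_is_installed):
    """Return the name of the first available parser (hypothetical helper)."""
    for name in PARSER_PREFERENCE:
        if is_installed(name):
            return name
    raise RuntimeError(
        "No PDF parser available; install one of: " + ", ".join(PARSER_PREFERENCE)
    )


def validate_page_marker_style(style="structured-pages"):
    """Return the style unchanged if it is one of the two supported values."""
    if style not in PAGE_MARKER_STYLES:
        raise ValueError(f"Unsupported PAGE_MARKER_STYLE: {style!r}")
    return style
```

Passing `is_installed` as a parameter keeps the selection logic testable without installing any of the parsers.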
Output Contracts
For each page (1-based), persist:
- raw_page_markdown (or raw extracted text)
- metadata_jsonb, including parser name/version and extraction params
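One possible shape for the per-page record is sketched below. Only `raw_page_markdown`, `metadata_jsonb`, the 1-based page number, and the parser name/version and extraction params come from the contract above; the `PageArtifact` class, the `build_page_artifacts` helper, and the `raw_sha256` field are hypothetical additions for illustration, and actual persistence (for example to a JSONB column) is left out.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PageArtifact:
    """Hypothetical in-memory form of one persisted page record."""
    page_number: int          # 1-based, per the output contract
    raw_page_markdown: str    # or raw extracted text
    metadata_jsonb: dict      # must stay JSON-serializable


def build_page_artifacts(page_texts, parser_name, parser_version, extraction_params):
    """Build one record per page from already-extracted page texts."""
    artifacts = []
    for page_number, text in enumerate(page_texts, start=1):
        metadata = {
            "parser": {"name": parser_name, "version": parser_version},
            "extraction_params": dict(extraction_params),
            # Assumption: a content hash is one way to support audit and
            # reprocessing; the contract does not require this field.
            "raw_sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        }
        artifacts.append(PageArtifact(page_number, text, metadata))
    return artifacts
```

Storing the parser identity and parameters on every page, rather than once per document, means a single page can be re-extracted and compared against its earlier record.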