pdf-processing-openai
PDF reading, creation, and review guidance
Reading PDFs
- Use `pdftoppm -png $OUTDIR/$BASENAME.pdf $OUTDIR/$BASENAME` to convert PDFs to PNGs.
- Then open the PNGs and read the images.
- `pdfplumber` is also installed and can be used to read PDFs. Use it as a complement to `pdftoppm`, not as a replacement for it.
- Print extracted text from Python only as a last resort, because text extraction misses important details (e.g. figures, tables, diagrams).
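The reading steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed workflow: the function names (`pdftoppm_command`, `render_pages`, `extract_tables`) and the 150 DPI default are assumptions made for the example, and it presumes `pdftoppm` is on the `PATH`.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf_path, out_dir, dpi=150):
    """Build the pdftoppm argument list; the PNG prefix reuses the PDF's base name."""
    pdf_path = Path(pdf_path)
    prefix = Path(out_dir) / pdf_path.stem
    # -r sets the render resolution in DPI (150 is an assumed default).
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(prefix)]


def render_pages(pdf_path, out_dir, dpi=150):
    """Run pdftoppm and return the PNG paths it wrote, sorted by file name."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftoppm_command(pdf_path, out_dir, dpi), check=True)
    stem = Path(pdf_path).stem
    return sorted(Path(out_dir).glob(f"{stem}-*.png"))


def extract_tables(pdf_path):
    """Complementary pass: pull table cells with pdfplumber, one list per page."""
    import pdfplumber  # imported lazily; only this pass needs it

    with pdfplumber.open(pdf_path) as pdf:
        return [page.extract_tables() for page in pdf.pages]
```

A typical call would be `render_pages("report.pdf", "out")`, followed by opening each returned PNG. `extract_tables` is there to cross-check values seen in the images, not to stand in for looking at them.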
Primary tooling for creating PDFs
- Generate PDFs programmatically with `reportlab` as the primary tool. In most cases, you should use `reportlab` to create PDFs.
- If there are other packages you think are necessary for the task (e.g. `pypdf`, `pyMuPDF`), you can use them, but you may need to `pip install` them first.
- After each meaningful update (content additions, layout adjustments, or style changes), render the PDF to images to check layout fidelity:
pdftoppm -png $INPUT_PDF $OUTPUT_PREFIX
- Inspect every exported PNG before continuing work. If anything looks off, fix the source and re-run the render → inspect loop until the pages are clean.
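One pass of the create, render, inspect loop might look like the sketch below. The helper names (`paginate`, `build_pdf`, `render_for_review`), the 40-lines-per-page limit, and the margins are assumptions chosen for illustration. The low-level `reportlab` canvas does not flow text onto new pages by itself, so the example splits the lines into pages explicitly.

```python
import subprocess
from pathlib import Path


def paginate(lines, per_page):
    """Split lines into pages of at most per_page lines each."""
    return [lines[i:i + per_page] for i in range(0, len(lines), per_page)]


def build_pdf(path, lines, per_page=40):
    """Write lines to a letter-size PDF, starting a new page when one fills up."""
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    _, height = letter
    c = canvas.Canvas(str(path), pagesize=letter)
    for page in paginate(lines, per_page):
        c.setFont("Helvetica", 11)  # set per page: showPage() resets canvas state
        y = height - 72             # start one inch below the top edge
        for line in page:
            c.drawString(72, y, line)
            y -= 16                 # fixed line spacing in points
        c.showPage()
    c.save()


def render_for_review(pdf_path, out_prefix):
    """Render the PDF with pdftoppm and return the PNGs to inspect."""
    subprocess.run(["pdftoppm", "-png", str(pdf_path), str(out_prefix)], check=True)
    prefix = Path(out_prefix)
    return sorted(prefix.parent.glob(prefix.name + "-*.png"))
```

After each change to the source, call `build_pdf` again, then `render_for_review`, and open every returned PNG before moving on.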