docx-processing-openai
SKILL.md
DOCX reading, creation, and review guidance
Reading DOCXs
- Use `soffice -env:UserInstallation=file:///tmp/lo_profile_$$ --headless --convert-to pdf --outdir $OUTDIR $INPUT_DOCX` to convert DOCXs to PDFs.
  - The `-env:UserInstallation=file:///tmp/lo_profile_$$` flag is important. Without it, the conversion will time out.
- Then convert the PDF to page images so you can visually inspect the result:
pdftoppm -png $OUTDIR/$BASENAME.pdf $OUTDIR/$BASENAME
- Then open the PNGs and read the images.
- Extract and print the text with Python only as a last resort, because text extraction misses important details (e.g. figures, tables, diagrams).
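The reading steps above can be sketched as a small Python wrapper around the same two commands. This is a minimal sketch, not a definitive implementation: the function names (`build_convert_cmd`, `build_rasterize_cmd`, `render_docx_to_pngs`) are hypothetical, `os.getpid()` is an assumed stand-in for the shell's `$$`, and it assumes `soffice` and `pdftoppm` are on the `PATH`.

```python
import glob
import os
import subprocess
from pathlib import Path
from typing import List


def build_convert_cmd(input_docx: str, outdir: str) -> List[str]:
    """Build the soffice command that converts a DOCX to PDF."""
    # Assumption: the current process ID plays the role of the shell's $$,
    # giving each run its own LibreOffice profile directory.
    profile = f"file:///tmp/lo_profile_{os.getpid()}"
    return [
        "soffice",
        f"-env:UserInstallation={profile}",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", outdir,
        input_docx,
    ]


def build_rasterize_cmd(input_docx: str, outdir: str) -> List[str]:
    """Build the pdftoppm command that turns the PDF into page PNGs."""
    prefix = str(Path(outdir) / Path(input_docx).stem)
    return ["pdftoppm", "-png", f"{prefix}.pdf", prefix]


def render_docx_to_pngs(input_docx: str, outdir: str) -> List[Path]:
    """Run both commands and return the page images in page order."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_cmd(input_docx, outdir), check=True)
    subprocess.run(build_rasterize_cmd(input_docx, outdir), check=True)
    # pdftoppm writes one file per page, named <prefix>-<page>.png.
    stem = glob.escape(Path(input_docx).stem)
    return sorted(Path(outdir).glob(f"{stem}-*.png"))
```

Calling `render_docx_to_pngs("report.docx", "out")` would leave the PDF and the page PNGs in `out`, ready to be opened and read one by one.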
Primary tooling for creating DOCXs
- Create and edit DOCX files with `python-docx`. Use it to control structure, styles, tables, and lists. Install it with `pip install python-docx` if it's not already installed.
- After every meaningful batch of edits (new sections, layout tweaks, styling changes), render the DOCX to PDF:
soffice -env:UserInstallation=file:///tmp/lo_profile_$$ --headless --convert-to pdf --outdir $OUTDIR $INPUT_DOCX
- Convert the PDF to page images so you can visually inspect the result:
pdftoppm -png $OUTDIR/$BASENAME.pdf $OUTDIR/$BASENAME
- Inspect every PNG before moving on. If you see any defect, fix the DOCX and repeat the render → inspect loop until all pages look perfect.
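As an illustration of the creation step, here is a minimal `python-docx` sketch that builds a document with a heading, a bulleted list, and a table. The function name `build_sample_docx` and all of the text content are hypothetical placeholders. `List Bullet` and `Table Grid` are style names from python-docx's default template, so a custom template may not define them.

```python
from typing import Iterable, Tuple


def build_sample_docx(path: str, rows: Iterable[Tuple[str, object]]) -> None:
    """Write a small DOCX with a heading, a bulleted list, and a table."""
    # Imported here so the module still loads before python-docx is installed.
    from docx import Document

    doc = Document()
    doc.add_heading("Sample heading", level=1)  # placeholder text

    # Lists are paragraphs that use a list style.
    for point in ("First point", "Second point"):
        doc.add_paragraph(point, style="List Bullet")

    # Two-column table: a header row plus one row per (name, value) pair.
    table = doc.add_table(rows=1, cols=2)
    table.style = "Table Grid"
    header = table.rows[0].cells
    header[0].text = "Name"
    header[1].text = "Value"
    for name, value in rows:
        cells = table.add_row().cells
        cells[0].text = str(name)
        cells[1].text = str(value)

    doc.save(path)
```

After a call such as `build_sample_docx("sample.docx", [("Pages", 3)])`, the file still needs to go through the render and inspect loop described above before it is treated as finished.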