document-processing

Summary

Process, extract, and manipulate PDF, Excel, Word, and PowerPoint documents programmatically.

  • Supports four major office formats (PDF, XLSX, DOCX, PPTX) with format-specific tools: pypdf and pdfplumber for PDFs, openpyxl and pandas for Excel, python-docx for Word, python-pptx for PowerPoint
  • Core operations include text and table extraction, document merging and splitting, format conversion, and OCR for scanned PDFs
  • Excel-specific guidance emphasizes writing formulas rather than static values for dynamic calculations, plus financial modeling conventions (color-coded text and fills)
  • Word documents support tracked changes via XML editing for professional redlining; PowerPoint covers slide structure, speaker notes, and design principles for consistent layouts
SKILL.md

Document Processing Guide

Work with office documents: PDF, Excel, Word, and PowerPoint.


Format Overview

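| Format     | Extension | Structure   | Best For                   |
|------------|-----------|-------------|----------------------------|
| PDF        | .pdf      | Binary/text | Reports, forms, archives   |
| Excel      | .xlsx     | XML in ZIP  | Data, calculations, models |
| Word       | .docx     | XML in ZIP  | Text documents, contracts  |
| PowerPoint | .pptx     | XML in ZIP  | Presentations, slides      |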

Key concept: XLSX, DOCX, and PPTX are all ZIP archives containing XML files. You can unzip them to access raw content.
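
As a minimal sketch of that idea, the snippet below uses only Python's standard `zipfile` module to list the parts inside an Office file and read one of them as text. The helper names `list_parts` and `read_part` are hypothetical, chosen for illustration, and the sketch assumes the part being read is UTF-8 encoded XML, which is typical but not guaranteed.

```python
import zipfile


def list_parts(path):
    """Return the names of all files stored inside the ZIP package at `path`."""
    with zipfile.ZipFile(path) as package:
        return package.namelist()


def read_part(path, part_name):
    """Return one file from the package as text (assumes UTF-8 content)."""
    with zipfile.ZipFile(path) as package:
        return package.read(part_name).decode("utf-8")


# Example usage (hypothetical file name):
#   list_parts("report.docx")
#   read_part("report.docx", "word/document.xml")
```

In a Word file the main body usually lives in `word/document.xml`; Excel and PowerPoint files keep their top-level parts in `xl/workbook.xml` and `ppt/presentation.xml` respectively. Reading the package this way is read-only, so it is a safe way to inspect a document before deciding which library to use.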


Installs: 892
Repository: eyadsibai/ltk
GitHub Stars: 4
First Seen: Jan 28, 2026