PDF Processing Guide
Overview
This guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see REFERENCE.md. If you need to fill out a PDF form, read FORMS.md and follow its instructions.
CRITICAL: Smart PDF Reading — Avoid Context Overflow
Claude's Read tool converts each PDF page into an image. The API has a hard limit of 100 images per conversation. A 90+ page PDF will fail outright, and even smaller PDFs can consume enormous context budget (each page-image costs far more tokens than equivalent plain text).
This is the #1 cause of failures when processing PDFs. Always think before you read.
Step 0: Probe First, Read Later
For any PDF the user uploads or asks you to read, run the probe script first to understand what you're dealing with:
More from touricks/fanshi_personal_skills
study-notes-generator
Transform documents into interactive HTML study notes with key concepts, problem-solution pairs, examples, and self-quiz questions. Use when asked to create study notes, summarize lecture slides, or generate learning materials from documents in slides/ folder. Output goes to notes/ folder.
9scientific-slides
Build slide decks and presentations for research talks. Use this for making PowerPoint slides, conference presentations, seminar talks, research presentations, thesis defense slides, or any scientific talk. Provides slide structure, design templates, timing guidance, and visual validation. Works with PowerPoint and LaTeX Beamer.
7langgraph
Expert in LangGraph - the production-grade framework for building stateful, multi-actor AI applications. Covers graph construction, state management, cycles and branches, persistence with checkpointers, human-in-the-loop patterns, and the ReAct agent pattern. Used in production at LinkedIn, Uber, and 400+ companies. This is LangChain's recommended approach for building agents. Use when: langgraph, langchain agent, stateful agent, agent graph, react agent.
6humanizer-zh
去除文本中的 AI 生成痕迹。适用于编辑或审阅文本,使其听起来更自然、更像人类书写。基于维基百科的\"AI 写作特征\"综合指南。检测并修复以下模式:夸大的象征意义、宣传性语言、以 -ing 结尾的肤浅分析、模糊的归因、破折号过度使用、三段式法则、AI 词汇、否定式排比、过多的连接性短语。
6docx
Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When Claude needs to work with professional documents (.docx files) for: (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks
6ml-paper-writing
Write publication-ready ML/AI papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Use when drafting papers from research repos, structuring arguments, verifying citations, or preparing camera-ready submissions. Includes LaTeX templates, reviewer guidelines, and citation verification workflows.
6