pdf-2

Purpose

This skill enables advanced PDF processing: OCR for extracting text from scanned images, form data extraction, table parsing into structured formats, verifying and adding digital signatures, merging and splitting PDFs, and annotation handling. It is designed for automating document workflows in OpenClaw.

When to Use

Use this skill for tasks involving scanned or complex PDFs, such as extracting data from invoices with tables and forms, verifying signatures on legal documents, or preparing reports by merging files. Apply it to non-searchable PDFs, which need OCR before their text can be read, or when a document pipeline needs integration with other tools.

Key Capabilities

OCR: Uses the Tesseract engine to extract text; select the language with the --lang flag (e.g., --lang eng for English).
Form Extraction: Parses PDF forms into JSON; extracts fields like text boxes or checkboxes using --fields flag.
Table Parsing: Detects and converts tables to CSV; specify layout with --layout auto or --layout grid.
Digital Signatures: Verifies signatures with --verify flag; adds new ones using --sign with a certificate path.
Merge/Split: Merges multiple PDFs passed via the --inputs flag; splits by page range with --pages (e.g., --pages 1-5).
Annotation: Extracts or adds annotations (e.g., highlights) using --extract-annotations or --add-annotation type=highlight text="Note".
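
The flags listed above can be assembled programmatically before being passed to the skill. The following is a minimal sketch, not the skill's own API: the helper names are hypothetical, and the token layout (for example, the certificate path following --sign as a separate argument) is an assumption read from the descriptions above. The helpers return only the flag portion, because the executable name and the way input and output paths are passed are not specified here.

```python
from typing import List, Optional


def ocr_flags(lang: str = "eng") -> List[str]:
    """Flags selecting the OCR language, e.g. --lang eng."""
    return ["--lang", lang]


def table_flags(layout: str = "auto") -> List[str]:
    """Flags selecting the table layout; only 'auto' and 'grid' are documented."""
    if layout not in ("auto", "grid"):
        raise ValueError(f"unsupported layout: {layout!r}")
    return ["--layout", layout]


def split_flags(first_page: int, last_page: int) -> List[str]:
    """Flags for splitting by an inclusive page range, e.g. --pages 1-5."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("page range must satisfy 1 <= first_page <= last_page")
    return ["--pages", f"{first_page}-{last_page}"]


def signature_flags(cert_path: Optional[str] = None) -> List[str]:
    """--verify when no certificate is given, otherwise --sign <cert_path>.

    Assumption: the certificate path is passed as the argument after --sign.
    """
    if cert_path is None:
        return ["--verify"]
    return ["--sign", cert_path]


def highlight_flags(text: str) -> List[str]:
    """Flags for adding a highlight annotation with the given note text.

    Assumption: type=... and text=... are separate arguments, as in the
    documented example (shell quotes are not part of the argument itself).
    """
    return ["--add-annotation", "type=highlight", f"text={text}"]
```

For instance, split_flags(1, 5) yields ["--pages", "1-5"], which can be appended to whatever command line invokes the skill. Validating values such as the layout name in one place keeps typos from reaching the tool.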

Usage Patterns

Invoke the skill from the CLI for one-off tasks or through the API for scripted workflows. Chain commands with pipes, e.g., send OCR output to a text processor. For batch processing, loop over the files in a script. Always specify input and output paths explicitly. For large files, set --timeout 300 to allow extended operations. Configure defaults in a JSON config file, e.g., {"default_lang": "eng"}.
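
As a rough illustration of the batch pattern, the sketch below reads defaults from JSON config text and plans one job per input file, with explicit input and output paths and the extended timeout. The function names, the job dictionary layout, and the choice of a .txt output next to each input are assumptions made for this example; only the default_lang key and the --lang and --timeout flags come from the text above. How each job is finally handed to the CLI or API is left out, since that interface is not specified here.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def load_defaults(config_text: str) -> Dict[str, str]:
    """Parse JSON config text such as {"default_lang": "eng"}."""
    return json.loads(config_text)


def plan_batch(
    pdf_paths: Iterable[str],
    defaults: Dict[str, str],
    timeout: int = 300,
) -> List[dict]:
    """Build one job description per PDF without running anything.

    Each job names its input and output explicitly and carries the flags
    to use. The output naming (same stem, .txt suffix) is hypothetical.
    """
    # Fall back to English if the config does not set a default language.
    lang = defaults.get("default_lang", "eng")
    jobs = []
    for path in pdf_paths:
        source = Path(path)
        jobs.append(
            {
                "input": str(source),
                "output": str(source.with_suffix(".txt")),
                "flags": ["--lang", lang, "--timeout", str(timeout)],
            }
        )
    return jobs


if __name__ == "__main__":
    defaults = load_defaults('{"default_lang": "eng"}')
    for job in plan_batch(["invoice.pdf", "contract.pdf"], defaults):
        print(job)
```

Separating planning from execution makes the batch easy to inspect or log before any long-running operation starts.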