---
name: pdf-processing
description: Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.
---

PDF Processing Skill

Quick Start

Use pdfplumber to extract text from PDFs:

```python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
    print(text)
```

Capabilities

1. Text Extraction

Extract all text from a PDF document:

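```python
import pdfplumber

def extract_all_text(pdf_path):
    """Return the text of every page, with pages separated by newlines."""
    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may return None (depending on the pdfplumber version)
        # for pages without a text layer, so fall back to an empty string.
        pages = [page.extract_text() or "" for page in pdf.pages]
    return "\n".join(pages)
```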

2. Table Extraction

Extract tables from PDF pages. For detailed table extraction, see [[TABLES.md]].

Basic example:

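```python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    # Each table is a list of rows; each row is a list of cell values (str or None).
    tables = pdf.pages[0].extract_tables()
    for table in tables:
        print(table)
```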
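Each table from `extract_tables()` is a list of rows, and empty cells come back as `None`. A common next step is to turn that into one dictionary per data row. The helper below is a minimal sketch that assumes the first row of the table is the header row. `table_to_records` is a hypothetical name and is not part of pdfplumber.

```python
from typing import Dict, List, Optional

def table_to_records(table: List[List[Optional[str]]]) -> List[Dict[str, str]]:
    """Convert a pdfplumber-style table (header row first) into a list of dicts."""
    if not table:
        return []
    # Header cells: collapse internal whitespace/newlines; give empty headers
    # a positional placeholder name.
    header = [
        " ".join(cell.split()) if cell else f"column_{i}"
        for i, cell in enumerate(table[0])
    ]
    records = []
    for row in table[1:]:
        # Skip rows where every cell is empty.
        if all(cell in (None, "") for cell in row):
            continue
        # Pad short rows so every header gets a value; cells beyond the
        # header width are dropped by zip().
        padded = list(row) + [None] * (len(header) - len(row))
        records.append({
            name: (value or "").strip()
            for name, value in zip(header, padded)
        })
    return records
```

For example, `table_to_records([["Item", "Qty"], ["Widget", "2"]])` gives `[{"Item": "Widget", "Qty": "2"}]`. Tables whose header spans several rows, or that have no header at all, need different handling.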

3. Form Filling

Fill PDF forms programmatically. For a comprehensive form-filling guide, see [[FORMS.md]].

Best Practices

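Performance: For large PDFs, process one page at a time instead of collecting all extracted text in memory at once.
OCR: Scanned PDFs have no text layer, so text extraction returns little or nothing; recommend running an OCR tool on them first.
Encoding: Extracted text often contains non-ASCII characters, so read and write it with an explicit UTF-8 encoding.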
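The three practices above can be combined in one small routine, shown here as a minimal sketch. `write_pages_utf8` is a hypothetical helper. It consumes page texts one at a time, writes them with an explicit UTF-8 encoding, and reports pages that yielded no text as likely scans that need OCR. Two parts of the sketch are choices made for illustration, not requirements of this skill: NFKC normalization (which folds ligatures such as "ﬁ" into "fi") and the form-feed page separator.

```python
import unicodedata
from typing import Iterable, List, Optional

def write_pages_utf8(page_texts: Iterable[Optional[str]], out_path: str) -> List[int]:
    """Stream page texts to a UTF-8 file one page at a time.

    Returns the 1-based numbers of pages that produced no text; these are
    likely scanned pages without a text layer, i.e. candidates for OCR.
    """
    empty_pages = []
    # Explicit encoding instead of relying on the platform default.
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for number, text in enumerate(page_texts, start=1):
            text = text or ""  # extract_text() may yield None for empty pages
            if not text.strip():
                empty_pages.append(number)
            # NFKC folds compatibility characters, e.g. the "ﬁ" ligature -> "fi".
            out.write(unicodedata.normalize("NFKC", text))
            out.write("\f")  # form feed marks the page boundary
    return empty_pages
```

Pass a generator such as `(page.extract_text() for page in pdf.pages)` inside a `with pdfplumber.open(...)` block, so that only one page's text is held at a time. pdfplumber also caches parsed objects on each `Page`. If memory still grows on very large files, check your installed version's documentation for how to release that cache. NFKC also rewrites characters such as superscripts and full-width forms, so skip it if those must be preserved.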

Common Use Cases

Invoice text extraction
Table data scraping from reports
PDF form automation
Document merging and splitting
