name	pdf-extractor
description	Extract text, tables, and metadata from PDF files. Use when working with PDFs, document extraction, or parsing PDF content.
version	1.0.0
author	agentflow
triggers	pdf, extract text, parse document, read pdf
requirements	pypdf, pdfplumber
tags	document, extraction, pdf

PDF Extraction Instructions

Name: pdf-extractor
Author: liushuang393

Overview

This skill extracts text, tables, and metadata from PDF files using the Python libraries pdfplumber and pypdf.

Usage

Basic Text Extraction

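import pdfplumber

# Open the PDF and print the text of each page in order.
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        # Fall back to an empty string if a page yields no text (e.g. a scanned image).
        text = page.extract_text() or ""
        print(text)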

Extract Tables

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        # Each table is a list of rows; each row is a list of cell values.
        tables = page.extract_tables()
        for table in tables:
            print(table)
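
Printing a table shows a nested list, which is awkward to reuse. As a minimal sketch, the helper below serializes one table to CSV text using only the standard library. The name table_to_csv and the sample data are hypothetical, and the sketch assumes each table is a list of rows whose cells are strings or None (for empty cells).

```python
import csv
import io


def table_to_csv(table):
    """Serialize one extracted table (a list of rows) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table:
        # Empty cells are assumed to arrive as None; write them as empty strings.
        writer.writerow("" if cell is None else cell for cell in row)
    return buffer.getvalue()


# Hand-written sample in the same shape as an extracted table (not real data).
sample_table = [["Item", "Qty"], ["Widget", "2"], ["Gadget", None]]
print(table_to_csv(sample_table))
```

In practice, each table from page.extract_tables() could be passed to this helper in place of the print call.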

Get Metadata

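from pypdf import PdfReader

reader = PdfReader("document.pdf")
# metadata is None when the PDF has no document information dictionary.
metadata = reader.metadata
if metadata is not None:
    # title and author are None when the corresponding field is absent.
    print(f"Title: {metadata.title}")
    print(f"Author: {metadata.author}")
print(f"Pages: {len(reader.pages)}")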

Requirements

Install the required packages:

pip install pypdf pdfplumber

Notes

For scanned PDFs that have no text layer, consider using an OCR library such as pytesseract.
Process large PDFs page by page to keep memory use manageable.
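
One way to follow the page-by-page advice is a generator that hands back one page's text at a time, so the caller never holds the whole document's text in memory. This is a rough sketch, not part of the skill itself: the function names iter_page_text and iter_pdf_text are hypothetical, and it only limits how much extracted text is held at once, not any caching the library does internally.

```python
from typing import Iterable, Iterator, Tuple


def iter_page_text(pages: Iterable) -> Iterator[Tuple[int, str]]:
    """Yield (page_number, text) pairs one page at a time."""
    for number, page in enumerate(pages, start=1):
        # Pages without extractable text yield an empty string.
        yield number, page.extract_text() or ""


def iter_pdf_text(path: str) -> Iterator[Tuple[int, str]]:
    """Open the PDF at `path` and stream its text page by page."""
    import pdfplumber  # imported lazily so iter_page_text works without it

    with pdfplumber.open(path) as pdf:
        yield from iter_page_text(pdf.pages)
```

A caller would loop with "for number, text in iter_pdf_text("document.pdf")" and write or process each page's text before moving on to the next.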
