| name | parse-bank-statement-pdf |
| description | Parse bank statement PDF text into structured transaction data with account information and transactions in consistent JSON format. Works with any bank format. Use when you need to extract or parse transactions from PDF bank statements. |
Bank Statement PDF Parsing Skill
Purpose
Extract structured transaction data from bank statement PDFs. Parse any bank's statement format and return transactions in a consistent JSON structure.
Input Format
You will receive:
pdf_text: The full text extracted from a PDF bank statementfile_name: The PDF filename (may contain hints about the bank)
Expected Output
Return a JSON object with:
{
"account_info": {
"bank_name": "Chase",
"account_type": "credit_card",
"account_name": "Chase Credit Card",
"last_four": "1234",
"statement_date": "2024-12-15",
"period_start": "2024-11-16",
"period_end": "2024-12-15"
},
"transactions": [
{
"transaction_date": "2024-12-01",
"post_date": "2024-12-02",
"amount": -45.23,
"merchant_original": "WHOLE FOODS MARKET #12345",
"description": "GROCERY PURCHASE",
"transaction_type": "expense"
}
]
}
Field Definitions
account_info
- bank_name: Name of the bank (e.g., "Chase", "Bank of America", "Wells Fargo")
- account_type: One of: "credit_card", "checking", "savings"
- account_name: Friendly name like "Chase Credit Card (...1234)"
- last_four: Last 4 digits of account number
- statement_date: Statement closing/end date (YYYY-MM-DD)
- period_start: Statement period start date (YYYY-MM-DD)
- period_end: Statement period end date (YYYY-MM-DD)
transactions
Each transaction must have:
- transaction_date: Date of transaction (YYYY-MM-DD format, required)
- post_date: Posting date if different from transaction date (YYYY-MM-DD, optional)
- amount: Transaction amount as a float (required)
- Negative for expenses: -45.23 (money going out)
- Positive for income/credits: +100.00 (money coming in)
- Always use negative for purchases/fees, positive for payments/refunds
- merchant_original: Merchant name exactly as it appears on statement (required)
- description: Additional description if available (optional)
- transaction_type: One of (required):
"expense"- Regular purchases, bills"income"- Deposits, salary, refunds"payment"- Credit card payments"fee"- Bank fees, late fees"interest"- Interest charges"transfer"- Transfers between accounts
Amount Sign Convention (CRITICAL)
Use consistent negative/positive regardless of how the bank displays it:
- Expenses/Purchases: Always negative (e.g., grocery purchase = -45.23)
- Income/Deposits: Always positive (e.g., paycheck = +1000.00)
- Credit Card Payments: Always positive (e.g., payment made = +500.00)
- Fees: Always negative (e.g., late fee = -35.00)
- Interest Charges: Always negative (e.g., interest = -12.50)
- Refunds/Credits: Always positive (e.g., refund = +25.00)
Parsing Guidelines
1. Identify Bank and Account
Look for:
- Bank name in header/footer
- Account type keywords (Credit Card, Checking, Savings)
- Account numbers (use last 4 digits only)
- Statement dates
2. Find Transaction Section
Banks typically have sections like:
- "Transactions", "Activity", "Account Activity"
- "Purchases", "Payments and Credits"
- Transaction tables with columns
3. Extract Each Transaction
For each transaction line:
- Date (various formats: MM/DD, MM/DD/YYYY, DD-MMM-YY)
- Merchant/description
- Amount (handle various formats: $1,234.56, 1234.56, (1234.56) for negatives)
- Apply consistent sign convention
4. Determine Transaction Type
Use these heuristics:
- PAYMENT, THANK YOU, AUTOPAY: payment
- DEPOSIT, DIRECT DEPOSIT, ACH CREDIT: income
- FEE, CHARGE, PENALTY: fee
- INTEREST CHARGED: interest
- TRANSFER, XFER: transfer
- ATM WITHDRAWAL: expense
- Everything else: expense (if negative) or income (if positive)
5. Handle Edge Cases
- Missing dates: Use statement end date as fallback
- Subtotals/totals: Skip lines like "Total Purchases", "Balance Forward"
- Multiple pages: Parse all transactions across all pages
- Foreign transactions: Convert to base currency if exchange rate shown
- Pending transactions: Exclude if explicitly marked as pending
Quality Checks
Before returning:
- All required fields present for each transaction
- Dates are valid and in YYYY-MM-DD format
- Amounts are numbers (not strings)
- Signs are consistent (expenses negative, income positive)
- At least some transactions found (warn if zero)
- Transaction dates within statement period (or close to it)
Example Input/Output
Input PDF Text (excerpt):
CHASE CREDIT CARD STATEMENT
Account ending in 1234
Statement Period: 11/16/2024 - 12/15/2024
PURCHASES
12/01 WHOLE FOODS MKT #456 45.23
12/03 SHELL OIL 789012 38.50
12/05 AMAZON.COM*MARKETPLACE 124.99
PAYMENTS AND CREDITS
12/10 ONLINE PAYMENT - THANK YOU 500.00CR
Expected Output:
{
"account_info": {
"bank_name": "Chase",
"account_type": "credit_card",
"account_name": "Chase Credit Card (...1234)",
"last_four": "1234",
"statement_date": "2024-12-15",
"period_start": "2024-11-16",
"period_end": "2024-12-15"
},
"transactions": [
{
"transaction_date": "2024-12-01",
"amount": -45.23,
"merchant_original": "WHOLE FOODS MKT #456",
"transaction_type": "expense"
},
{
"transaction_date": "2024-12-03",
"amount": -38.50,
"merchant_original": "SHELL OIL 789012",
"transaction_type": "expense"
},
{
"transaction_date": "2024-12-05",
"amount": -124.99,
"merchant_original": "AMAZON.COM*MARKETPLACE",
"transaction_type": "expense"
},
{
"transaction_date": "2024-12-10",
"amount": 500.00,
"merchant_original": "ONLINE PAYMENT - THANK YOU",
"transaction_type": "payment"
}
]
}
Important Notes
- Be thorough - Extract ALL transactions, not just a sample
- Preserve original merchant names - Don't clean them yet
- Use consistent date format - Always YYYY-MM-DD
- Handle year rollovers - If statement is Dec 2024 but transaction shows Jan, it's Jan 2025
- Return valid JSON - No markdown, just pure JSON
- If uncertain about amount sign - Follow the convention: expenses negative, income positive
- If parsing fails - Return partial results with what you could parse, include error message