Claude Code Plugins

Community-maintained marketplace

Feedback

Detects Personally Identifiable Information (PII) in code, logs, databases, and files for GDPR/CCPA compliance. Use when user asks to "detect PII", "find sensitive data", "scan for personal information", "check GDPR compliance", or "find SSN/credit cards".

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name pii-detector
description Detects Personally Identifiable Information (PII) in code, logs, databases, and files for GDPR/CCPA compliance. Use when user asks to "detect PII", "find sensitive data", "scan for personal information", "check GDPR compliance", or "find SSN/credit cards".
allowed-tools Read, Write, Bash, Glob, Grep

PII Detector

Scans code, logs, databases, and configuration files for Personally Identifiable Information (PII) to ensure GDPR, CCPA, and privacy compliance.

When to Use

  • "Scan for PII in my codebase"
  • "Find sensitive data"
  • "Check for exposed personal information"
  • "Detect SSN, credit cards, emails"
  • "GDPR compliance check"
  • "Find PII in logs"

Instructions

1. Detect Project Type

# Check project structure
ls -la

# Detect language
[ -f "package.json" ] && echo "JavaScript/TypeScript"
[ -f "requirements.txt" ] && echo "Python"
[ -f "pom.xml" ] && echo "Java"
[ -f "Gemfile" ] && echo "Ruby"

# Check for logs
find . -name "*.log" -type f | head -5

2. Define PII Patterns

Common PII Types:

  1. Social Security Numbers (SSN)

    • Pattern: \b\d{3}-\d{2}-\d{4}\b
    • Example: 123-45-6789
  2. Credit Card Numbers

    • Visa: \b4\d{3}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b
    • MasterCard: \b5[1-5]\d{2}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b
    • Amex: \b3[47]\d{2}[\s-]?\d{6}[\s-]?\d{5}\b
    • Discover: \b6011[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b
  3. Email Addresses

    • Pattern: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b
  4. Phone Numbers

    • US: \b\d{3}[-.]?\d{3}[-.]?\d{4}\b
    • International: \+\d{1,3}[\s-]?\d{1,14}
  5. IP Addresses

    • IPv4: \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
    • IPv6: ([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}
  6. Dates of Birth

    • Pattern: \b\d{2}/\d{2}/\d{4}\b or \b\d{4}-\d{2}-\d{2}\b
  7. Passport Numbers

    • US: \b[A-Z]{1,2}\d{6,9}\b
  8. Driver's License

    • Varies by state/country
  9. Bank Account Numbers

    • Pattern: \b\d{8,17}\b
  10. API Keys / Tokens

    • AWS: AKIA[0-9A-Z]{16}
    • Slack: xox[baprs]-[0-9a-zA-Z-]{10,}
    • GitHub: ghp_[0-9a-zA-Z]{36}

3. Scan Codebase

Using grep:

# Scan for SSN
grep -rn '\b\d{3}-\d{2}-\d{4}\b' . --include="*.js" --include="*.py" --include="*.java"

# Scan for credit cards
grep -rn '\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b' . --exclude-dir=node_modules

# Scan for emails
grep -rn '\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b' . --include="*.log"

# Scan for phone numbers
grep -rn '\b\d{3}[-.]?\d{3}[-.]?\d{4}\b' .

# Scan for API keys
grep -rn 'AKIA[0-9A-Z]{16}' . --include="*.env*" --include="*.config*"

Exclude common false positives:

# Exclude test files, build directories
grep -rn <pattern> . \
  --exclude-dir=node_modules \
  --exclude-dir=.git \
  --exclude-dir=dist \
  --exclude-dir=build \
  --exclude-dir=vendor \
  --exclude-dir=__pycache__ \
  --exclude="*.test.js" \
  --exclude="*.spec.ts" \
  --exclude="*.min.js"

4. Create PII Detection Script

Python Script:

#!/usr/bin/env python3
import re
import os
import sys
from pathlib import Path
from typing import List, Dict, Tuple

class PIIDetector:
    def __init__(self):
        self.patterns = {
            'SSN': r'\b\d{3}-\d{2}-\d{4}\b',
            'Credit Card': r'\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b',
            'Email': r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b',
            'Phone (US)': r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b',
            'IPv4': r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b',
            'AWS Key': r'AKIA[0-9A-Z]{16}',
            'GitHub Token': r'ghp_[0-9a-zA-Z]{36}',
            'Slack Token': r'xox[baprs]-[0-9a-zA-Z-]{10,}',
            'Date of Birth': r'\b(?:0[1-9]|1[0-2])/(?:0[1-9]|[12][0-9]|3[01])/(?:19|20)\d{2}\b',
        }

        self.exclude_dirs = {
            'node_modules', '.git', 'dist', 'build', 'vendor',
            '__pycache__', '.next', 'out', 'coverage', '.venv'
        }

        self.exclude_extensions = {
            '.min.js', '.map', '.lock', '.jpg', '.png', '.gif',
            '.pdf', '.zip', '.tar', '.gz'
        }

    def should_scan_file(self, filepath: Path) -> bool:
        """Check if file should be scanned."""
        # Check excluded directories
        if any(excluded in filepath.parts for excluded in self.exclude_dirs):
            return False

        # Check excluded extensions
        if filepath.suffix in self.exclude_extensions:
            return False

        # Check file size (skip files > 10MB)
        try:
            if filepath.stat().st_size > 10 * 1024 * 1024:
                return False
        except OSError:
            return False

        return True

    def scan_file(self, filepath: Path) -> List[Dict]:
        """Scan a single file for PII."""
        findings = []

        try:
            with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
                for line_num, line in enumerate(f, 1):
                    for pii_type, pattern in self.patterns.items():
                        matches = re.finditer(pattern, line)
                        for match in matches:
                            # Check for common false positives
                            if self.is_false_positive(pii_type, match.group(), line):
                                continue

                            findings.append({
                                'file': str(filepath),
                                'line': line_num,
                                'type': pii_type,
                                'value': self.mask_pii(match.group()),
                                'context': line.strip()[:100]
                            })
        except Exception as e:
            print(f"Error scanning {filepath}: {e}", file=sys.stderr)

        return findings

    def is_false_positive(self, pii_type: str, value: str, context: str) -> bool:
        """Check for common false positives."""
        # Common test data
        test_patterns = [
            '000-00-0000',
            '111-11-1111',
            '123-45-6789',
            '4111111111111111',  # Test credit card
            'test@example.com',
            'user@localhost',
            '127.0.0.1',
            '0.0.0.0',
            '192.168.',
        ]

        for pattern in test_patterns:
            if pattern in value:
                return True

        # Check if in comment
        if any(comment in context for comment in ['//', '#', '/*', '*', '<!--']):
            if 'example' in context.lower() or 'test' in context.lower():
                return True

        return False

    def mask_pii(self, value: str) -> str:
        """Mask PII value for reporting."""
        if len(value) <= 4:
            return '*' * len(value)
        return value[:2] + '*' * (len(value) - 4) + value[-2:]

    def scan_directory(self, directory: Path) -> List[Dict]:
        """Recursively scan directory for PII."""
        all_findings = []

        for filepath in directory.rglob('*'):
            if filepath.is_file() and self.should_scan_file(filepath):
                findings = self.scan_file(filepath)
                all_findings.extend(findings)

        return all_findings

    def generate_report(self, findings: List[Dict]) -> str:
        """Generate human-readable report."""
        if not findings:
            return "✅ No PII detected!"

        report = f"⚠️  Found {len(findings)} potential PII instances:\n\n"

        # Group by type
        by_type = {}
        for finding in findings:
            pii_type = finding['type']
            if pii_type not in by_type:
                by_type[pii_type] = []
            by_type[pii_type].append(finding)

        for pii_type, items in sorted(by_type.items()):
            report += f"## {pii_type} ({len(items)} found)\n\n"
            for item in items[:10]:  # Limit to 10 per type
                report += f"- {item['file']}:{item['line']}\n"
                report += f"  Value: {item['value']}\n"
                report += f"  Context: {item['context']}\n\n"

            if len(items) > 10:
                report += f"  ... and {len(items) - 10} more\n\n"

        return report

def main():
    import argparse

    parser = argparse.ArgumentParser(description='Scan for PII in codebase')
    parser.add_argument('path', nargs='?', default='.', help='Path to scan')
    parser.add_argument('--json', action='store_true', help='Output JSON')
    parser.add_argument('--exclude', nargs='+', help='Additional directories to exclude')

    args = parser.parse_args()

    detector = PIIDetector()

    if args.exclude:
        detector.exclude_dirs.update(args.exclude)

    scan_path = Path(args.path)
    findings = detector.scan_directory(scan_path)

    if args.json:
        import json
        print(json.dumps(findings, indent=2))
    else:
        print(detector.generate_report(findings))

    # Exit with error code if PII found
    sys.exit(1 if findings else 0)

if __name__ == '__main__':
    main()

Save as pii_detector.py and run:

python pii_detector.py .

JavaScript/TypeScript Version:

// pii-detector.js
const fs = require('fs');
const path = require('path');
const readline = require('readline');

const PII_PATTERNS = {
  'SSN': /\b\d{3}-\d{2}-\d{4}\b/g,
  'Credit Card': /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
  'Email': /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/gi,
  'Phone (US)': /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g,
  'IPv4': /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g,
  'AWS Key': /AKIA[0-9A-Z]{16}/g,
  'GitHub Token': /ghp_[0-9a-zA-Z]{36}/g,
};

const EXCLUDE_DIRS = new Set([
  'node_modules', '.git', 'dist', 'build', 'coverage',
  '.next', 'out', 'vendor', '__pycache__'
]);

const EXCLUDE_EXTS = new Set([
  '.min.js', '.map', '.lock', '.jpg', '.png', '.gif', '.pdf'
]);

function shouldScanFile(filePath) {
  const parts = filePath.split(path.sep);
  if (parts.some(part => EXCLUDE_DIRS.has(part))) {
    return false;
  }

  const ext = path.extname(filePath);
  if (EXCLUDE_EXTS.has(ext)) {
    return false;
  }

  return true;
}

function maskPII(value) {
  if (value.length <= 4) return '*'.repeat(value.length);
  return value.slice(0, 2) + '*'.repeat(value.length - 4) + value.slice(-2);
}

async function scanFile(filePath) {
  const findings = [];

  const fileStream = fs.createReadStream(filePath);
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });

  let lineNum = 0;
  for await (const line of rl) {
    lineNum++;

    for (const [type, pattern] of Object.entries(PII_PATTERNS)) {
      const matches = line.matchAll(pattern);

      for (const match of matches) {
        findings.push({
          file: filePath,
          line: lineNum,
          type,
          value: maskPII(match[0]),
          context: line.trim().slice(0, 100)
        });
      }
    }
  }

  return findings;
}

async function scanDirectory(dir) {
  const findings = [];

  async function walk(directory) {
    const files = await fs.promises.readdir(directory);

    for (const file of files) {
      const filePath = path.join(directory, file);
      const stat = await fs.promises.stat(filePath);

      if (stat.isDirectory()) {
        if (!EXCLUDE_DIRS.has(file)) {
          await walk(filePath);
        }
      } else if (shouldScanFile(filePath)) {
        const fileFindings = await scanFile(filePath);
        findings.push(...fileFindings);
      }
    }
  }

  await walk(dir);
  return findings;
}

function generateReport(findings) {
  if (findings.length === 0) {
    return '✅ No PII detected!';
  }

  let report = `⚠️  Found ${findings.length} potential PII instances:\n\n`;

  const byType = {};
  for (const finding of findings) {
    if (!byType[finding.type]) {
      byType[finding.type] = [];
    }
    byType[finding.type].push(finding);
  }

  for (const [type, items] of Object.entries(byType)) {
    report += `## ${type} (${items.length} found)\n\n`;

    for (const item of items.slice(0, 10)) {
      report += `- ${item.file}:${item.line}\n`;
      report += `  Value: ${item.value}\n`;
      report += `  Context: ${item.context}\n\n`;
    }

    if (items.length > 10) {
      report += `  ... and ${items.length - 10} more\n\n`;
    }
  }

  return report;
}

async function main() {
  const scanPath = process.argv[2] || '.';
  const findings = await scanDirectory(scanPath);

  console.log(generateReport(findings));

  process.exit(findings.length > 0 ? 1 : 0);
}

main().catch(console.error);

5. Database Scanning

SQL Query to Find PII:

-- PostgreSQL example
-- Scan for potential email columns
SELECT
    table_name,
    column_name,
    data_type
FROM information_schema.columns
WHERE column_name ILIKE '%email%'
   OR column_name ILIKE '%ssn%'
   OR column_name ILIKE '%phone%'
   OR column_name ILIKE '%address%'
ORDER BY table_name, column_name;

-- Sample data to check patterns
SELECT
    table_name,
    column_name,
    COUNT(*) as sample_count
FROM information_schema.columns
WHERE data_type IN ('character varying', 'text', 'char')
GROUP BY table_name, column_name;

6. Log File Scanning

# Scan application logs
find . -name "*.log" -type f -exec grep -l '\b\d{3}-\d{2}-\d{4}\b' {} \;

# Scan with context
grep -C 3 'SSN\|social security\|credit card' *.log

# Check for leaked credentials
grep -r 'password.*=\|api_key.*=\|secret.*=' . --include="*.log"

7. CI/CD Integration

GitHub Actions:

name: PII Detection

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  pii-scan:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'

      - name: Run PII Detector
        run: |
          python pii_detector.py . --json > pii-report.json

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: pii-report
          path: pii-report.json

      - name: Fail if PII found
        run: |
          if [ $(cat pii-report.json | jq 'length') -gt 0 ]; then
            echo "❌ PII detected! See report for details."
            exit 1
          fi

8. Pre-commit Hook

#!/bin/bash
# .git/hooks/pre-commit

echo "Scanning for PII..."

python pii_detector.py $(git diff --cached --name-only)

if [ $? -ne 0 ]; then
    echo "❌ PII detected in staged files!"
    echo "Please remove sensitive data before committing."
    exit 1
fi

echo "✅ No PII detected"
exit 0

9. Data Anonymization

Once PII is found, suggest anonymization:

# anonymize.py
import hashlib
import re

def anonymize_email(email):
    """Replace email with hashed version."""
    local, domain = email.split('@')
    hashed = hashlib.sha256(local.encode()).hexdigest()[:8]
    return f"{hashed}@{domain}"

def anonymize_ssn(ssn):
    """Mask SSN keeping only last 4 digits."""
    return f"***-**-{ssn[-4:]}"

def anonymize_phone(phone):
    """Mask phone keeping only last 4 digits."""
    digits = re.sub(r'\D', '', phone)
    return f"***-***-{digits[-4:]}"

def anonymize_credit_card(cc):
    """Mask credit card keeping only last 4 digits."""
    return f"****-****-****-{cc[-4:]}"

# Example usage
text = "Contact John at john@email.com or call 555-123-4567"
text = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b',
              lambda m: anonymize_email(m.group()), text)
print(text)
# Output: "Contact John at a1b2c3d4@email.com or call 555-123-4567"

10. Compliance Report

Generate compliance report:

# PII Detection Report

**Date**: 2025-11-01
**Scope**: Entire codebase
**Files Scanned**: 1,247
**Total Findings**: 23

## Summary by Type

| PII Type | Count | Risk Level |
|----------|-------|------------|
| Email    | 12    | Medium     |
| Phone    | 8     | Medium     |
| SSN      | 2     | High       |
| API Keys | 1     | Critical   |

## Critical Findings

### 1. AWS API Key in config file
- **File**: config/production.env
- **Line**: 15
- **Recommendation**: Move to environment variables or secret manager

### 2. SSN in test data
- **File**: tests/fixtures/users.json
- **Line**: 42
- **Recommendation**: Use fake data generator

## Remediation Steps

1. ✅ Remove hardcoded credentials from config files
2. ✅ Replace real PII in test data with fake data
3. ✅ Add pre-commit hooks to prevent future leaks
4. ✅ Rotate exposed API keys
5. ✅ Update .gitignore to exclude sensitive files

## Compliance Status

- [ ] GDPR Article 32 (Security of processing)
- [ ] CCPA Section 1798.150 (Data protection)
- [ ] HIPAA Security Rule (if applicable)

Best Practices

DO:

  • Scan regularly (CI/CD, pre-commit)
  • Use environment variables for secrets
  • Anonymize data in non-production
  • Implement data retention policies
  • Train team on PII handling
  • Use tools like git-secrets, truffleHog

DON'T:

  • Store PII in version control
  • Log sensitive data
  • Hardcode credentials
  • Use real PII in tests
  • Keep PII longer than needed
  • Ignore scan results

Checklist

  • PII patterns defined
  • Scanner script created
  • Codebase scanned
  • Database schema reviewed
  • Logs checked
  • CI/CD integration added
  • Pre-commit hook installed
  • Findings documented
  • Remediation plan created
  • Team trained on PII handling