| name | managing-imports |
| description | Organize incoming files and track import status through the file staging workflow |
Managing Imports
Overview
Manage import file workflow from drop to archive: organize files, check duplicates, track status, and clear processed files.
Use when: Processing new import files, checking import status, or archiving successfully imported data.
Announce at start: "I'm using the managing-imports skill to [operation]."
What This Skill Does
Wraps the file staging system (scripts/utilities/file_staging/) with user-friendly operations:
- Stage files - Classify and organize files from inbox
- Check duplicates - Query import_manifest for SHA256 matches
- Track status - Show files in each workflow stage
- Clear processed - Move imported files to archive
Operations
Stage Files
When: New files dropped in inbox, need to classify and organize for import.
Process:
Verify inbox has files
cd /Users/anthonybyrnes/PycharmProjects/Python419
find imports/inbox -type f | wc -l

If 0 files, report: "Inbox is empty. No files to stage."
Run file staging orchestrator
cd /Users/anthonybyrnes/PycharmProjects/Python419
PYTHONPATH=. python3 -m scripts.utilities.file_staging \
  --drop-folder imports/inbox \
  --use-db \
  --generate-plan

What this does:
- Scans inbox for files
- Classifies file type (LBHRA, PS_LB, LBSR08E, 419F, PROGRAM_LIST, COTA_PERSISTENCE)
- Detects academic term from file content
- Checks SHA256 against import_manifest for duplicates
- Moves files to imports/staged/{type}/
- Generates staging manifest JSON
- Creates import plan script (if --generate-plan)
Parse orchestrator output
Look for:
- Files processed count
- Files staged count
- Files skipped (duplicates)
- Staging manifest path
- Import plan path (if generated)
Report results
Format:
Staging Complete:
- Processed: 15 files
- Staged: 12 files (ready for import)
- Skipped: 3 files (duplicates)

Staged files by type:
- LBHRA: 5 files → imports/staged/lbhra/
- PS_LB: 4 files → imports/staged/ps-lb/
- 419F: 3 files → imports/staged/419f/

Manifest: imports/staged/staging-manifest-YYYY-MM-DD-HHMMSS.json
Import plan: imports/staged/import-plan-YYYY-MM-DD-HHMMSS.sh
Error Handling:
- If orchestrator fails, show error and recommend manual classification
- If --use-db fails, warn that duplicates weren't checked
- If staging folder doesn't exist, create it first
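For illustration, a minimal Python sketch of this step, assuming the orchestrator accepts the module path and flags shown above; the run_staging helper name is hypothetical, not part of the staging system.

# Sketch: invoke the staging orchestrator and surface its exit code.
# Assumes the module path and flags shown above; run_staging is a hypothetical helper.
import os
import subprocess
import sys

PROJECT_ROOT = "/Users/anthonybyrnes/PycharmProjects/Python419"

def run_staging(drop_folder: str = "imports/inbox") -> int:
    cmd = [
        sys.executable, "-m", "scripts.utilities.file_staging",
        "--drop-folder", drop_folder,
        "--use-db",
        "--generate-plan",
    ]
    result = subprocess.run(
        cmd,
        cwd=PROJECT_ROOT,
        env={**os.environ, "PYTHONPATH": "."},
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        # Orchestrator failed: show the error and fall back to manual classification.
        print(result.stderr, file=sys.stderr)
    return result.returncode

Checking the return code here is what drives the error handling above: a non-zero exit means the run should be reported as failed, not partially staged.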
Check Duplicates
When: Want to verify if files have already been imported before staging.
Process:
Get file path or directory
User provides:
- Single file path
- Directory path (checks all files in directory)
Calculate SHA256 hashes
For each file:
shasum -a 256 <file_path>

Query import_manifest
import os
import psycopg2
from dotenv import load_dotenv

load_dotenv()

conn = psycopg2.connect(
    host=os.getenv('DB_HOST'),
    dbname=os.getenv('DB_NAME'),
    user=os.getenv('DB_USER'),
    password=os.getenv('DB_PASSWORD')
)
cursor = conn.cursor()
cursor.execute("""
    SELECT original_path, file_type, term_code, imported_at
    FROM import_manifest
    WHERE file_hash = %s
""", (file_hash,))
result = cursor.fetchone()

Report duplicates
For each duplicate found:
DUPLICATE: filename.txt
- SHA256: abc123...
- Previously imported: 2025-11-10 14:32:15
- Original path: imports/archive/lbhra/2024-fall/filename.txt
- File type: LBHRA
- Term: 2024-fall
- Action: SKIP (already in database)

For non-duplicates:
NEW: filename.txt
- SHA256: def456...
- Not found in import_manifest
- Action: READY FOR IMPORT
SQL Helper:
-- Check multiple files at once
SELECT
file_hash,
original_path,
file_type,
term_code,
imported_at
FROM import_manifest
WHERE file_hash IN ('hash1', 'hash2', 'hash3')
ORDER BY imported_at DESC;
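The shell hashing and the single-row query above can also be combined in one pass. A hedged Python sketch, assuming the import_manifest columns shown under Database Schema below; hash_file and find_duplicates are illustrative helper names.

# Sketch: hash files locally and check them against import_manifest in one query.
# Assumes the import_manifest columns shown in Database Schema below;
# hash_file and find_duplicates are illustrative names, not existing utilities.
import hashlib
import os
import psycopg2
from dotenv import load_dotenv

def hash_file(path: str) -> str:
    """SHA256 of a file, read in chunks (matches `shasum -a 256`)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(paths):
    """Map each path to its manifest row, or None if the hash is not in the manifest."""
    load_dotenv()
    hashes = {hash_file(p): p for p in paths}
    conn = psycopg2.connect(
        host=os.getenv("DB_HOST"),
        dbname=os.getenv("DB_NAME"),
        user=os.getenv("DB_USER"),
        password=os.getenv("DB_PASSWORD"),
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT file_hash, original_path, file_type, term_code, imported_at
            FROM import_manifest
            WHERE file_hash = ANY(%s)
            """,
            (list(hashes),),
        )
        rows = {r[0]: r for r in cur.fetchall()}
    conn.close()
    return {path: rows.get(h) for h, path in hashes.items()}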
Track Status
When: Need an overview of the import workflow stages or want to check what's queued for import.
Process:
Count files in each stage
cd /Users/anthonybyrnes/PycharmProjects/Python419

# Inbox
INBOX=$(find imports/inbox -type f 2>/dev/null | wc -l | tr -d ' ')

# Staged by type
LBHRA=$(find imports/staged/lbhra -type f 2>/dev/null | wc -l | tr -d ' ')
PS_LB=$(find imports/staged/ps-lb -type f 2>/dev/null | wc -l | tr -d ' ')
LBSR08E=$(find imports/staged/lbsr08e -type f 2>/dev/null | wc -l | tr -d ' ')
F419=$(find imports/staged/419f -type f 2>/dev/null | wc -l | tr -d ' ')
PROGRAM=$(find imports/staged/program-list -type f 2>/dev/null | wc -l | tr -d ' ')
COTA=$(find imports/staged/cota-persistence -type f 2>/dev/null | wc -l | tr -d ' ')

# Processing
PROCESSING=$(find imports/processing -type f 2>/dev/null | wc -l | tr -d ' ')

# Archive (total)
ARCHIVE=$(find imports/archive -type f 2>/dev/null | wc -l | tr -d ' ')

Detect terms for staged files
For each type with staged files, group by term:
# Example: LBHRA files by term
for file in imports/staged/lbhra/*.txt; do
  # Run term detector
  TERM=$(PYTHONPATH=. python3 -c "
from scripts.utilities.file_staging.term_detector import detect_term
result = detect_term('$file')
print(result.get('term_code', 'unknown'))
")
  echo "$file -> $TERM"
done

Count by term for display.
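If the per-file shell loop is slow, a single-process Python sketch can do the same grouping, assuming detect_term returns a dict with a term_code key as in the loop above; terms_for is an illustrative name.

# Sketch: group staged files by detected term in one Python process.
# Run from the project root with PYTHONPATH=. so the import resolves.
# Assumes detect_term() returns a dict with a 'term_code' key, as in the loop above.
from collections import Counter
from pathlib import Path
from scripts.utilities.file_staging.term_detector import detect_term

def terms_for(staged_dir: str) -> Counter:
    counts = Counter()
    for path in sorted(Path(staged_dir).glob("*.txt")):
        result = detect_term(str(path))
        counts[result.get("term_code", "unknown")] += 1
    return counts

# Example: Counter({'2025-spring': 3, '2024-fall': 2})
print(terms_for("imports/staged/lbhra"))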
Format output
Import Status:

Inbox: 0 files

Staged: 13 files ready for import
- LBHRA (5 files):
  - 2024-fall: 2 files
  - 2025-spring: 3 files
- PS_LB (4 files):
  - 2024-fall: 1 file
  - 2025-spring: 3 files
- 419F (3 files):
  - 2025-spring: 3 files
- LBSR08E (0 files)
- Program List (0 files)
- COTA Persistence (1 file):
  - 2024-fall: 1 file

Processing: 0 files
Archive: 1,247 files across 6 types

Show recent imports from manifest

SELECT file_type, term_code, COUNT(*) as file_count, MAX(imported_at) as last_import
FROM import_manifest
GROUP BY file_type, term_code
ORDER BY last_import DESC
LIMIT 10;

Display as:

Recent Imports (from manifest):
- LBHRA 2025-spring: 3 files (last: 2025-11-13 10:45:22)
- PS_LB 2025-spring: 3 files (last: 2025-11-13 10:47:15)
- 419F 2024-fall: 5 files (last: 2025-11-12 16:23:01)
Error Handling:
- If directories don't exist, show "0 files" (not errors)
- If database unavailable, show file counts only (skip manifest query)
- If term detection fails for a file, mark as "term: unknown"
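A small Python sketch of the counting step, assuming the imports/ directory layout used above; count_files and STAGED_TYPES are illustrative names.

# Sketch: count files per workflow stage with pathlib.
# Assumes the imports/ layout shown above; missing directories report 0, not errors.
from pathlib import Path

ROOT = Path("/Users/anthonybyrnes/PycharmProjects/Python419")
STAGED_TYPES = ["lbhra", "ps-lb", "lbsr08e", "419f", "program-list", "cota-persistence"]

def count_files(rel_path: str) -> int:
    directory = ROOT / rel_path
    if not directory.exists():
        return 0
    return sum(1 for p in directory.rglob("*") if p.is_file())

status = {
    "inbox": count_files("imports/inbox"),
    "staged": {t: count_files(f"imports/staged/{t}") for t in STAGED_TYPES},
    "processing": count_files("imports/processing"),
    "archive": count_files("imports/archive"),
}
print(status)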
Clear Processed
When: Import succeeded, files verified in database, ready to move to archive.
Process:
Verify files are in manifest
SELECT file_hash, original_path, file_type, term_code
FROM import_manifest
WHERE original_path LIKE '%staged%'
ORDER BY imported_at DESC;

These are successfully imported files still in staging.
Determine archive destination
For each file:
- File type: LBHRA, PS_LB, etc.
- Term code: 2024-fall, 2025-spring, etc.
- Archive path: imports/archive/{type}/{term}/

Example:
- File: imports/staged/lbhra/LBHRA_Report_Fall2024.txt
- Type: lbhra
- Term: 2024-fall
- Destination: imports/archive/lbhra/2024-fall/
Create archive directories if needed
mkdir -p imports/archive/{type}/{term}

Move files
mv imports/staged/{type}/{filename} imports/archive/{type}/{term}/

Update manifest (optional)
If tracking archive paths:
UPDATE import_manifest
SET original_path = %s
WHERE file_hash = %s;

Report results
Cleared Processed Files:

Moved to archive:
- LBHRA (5 files):
  - 2024-fall: 2 files → imports/archive/lbhra/2024-fall/
  - 2025-spring: 3 files → imports/archive/lbhra/2025-spring/
- PS_LB (4 files):
  - 2025-spring: 4 files → imports/archive/ps-lb/2025-spring/

Total: 9 files archived
Remaining staged: 4 files
Safety Checks:
- NEVER move files not in manifest (risk of losing unimported data)
- Verify SHA256 matches before moving
- Confirm database has imported records (check record counts)
- Keep staging files until import verified complete
Error Handling:
- If manifest query fails, ABORT (don't move files)
- If term unknown, archive to imports/archive/{type}/unknown-term/
- If move fails, log error but continue with other files
- Report any failed moves separately
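A hedged sketch of the move step under these rules, assuming each file's manifest row supplies file_type and term_code; archive_staged_file is an illustrative helper, not part of the staging system.

# Sketch: move a manifest-verified staged file into imports/archive/{type}/{term}/.
# Assumes file_type and term_code come from the file's import_manifest row;
# archive_staged_file is an illustrative name.
import shutil
from pathlib import Path
from typing import Optional

ARCHIVE_ROOT = Path("imports/archive")

def archive_staged_file(staged_path: str, file_type: str, term_code: Optional[str]) -> Path:
    term = term_code or "unknown-term"  # Unknown terms archive to .../unknown-term/
    dest_dir = ARCHIVE_ROOT / file_type.lower() / term
    dest_dir.mkdir(parents=True, exist_ok=True)  # Create archive directory before moving
    dest = dest_dir / Path(staged_path).name
    shutil.move(staged_path, str(dest))
    return dest

# Example:
# archive_staged_file("imports/staged/lbhra/LBHRA_Report_Fall2024.txt", "LBHRA", "2024-fall")
# -> imports/archive/lbhra/2024-fall/LBHRA_Report_Fall2024.txt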
Integration
Wraps:
- File staging system: scripts/utilities/file_staging/orchestrator.py
- Term detector: scripts/utilities/file_staging/term_detector.py
- Duplicate checker: scripts/utilities/file_staging/duplicate_checker.py
Database:
- Table: import_manifest (SHA256 tracking)
- Connection: Uses .env credentials
Used by:
- project-status skill (shows import queue)
- Import workflows (pre-import organization)
Depends on:
- /imports/ directory structure
- File staging system installed
- Database connection available (for duplicates)
Implementation Details
File Type Patterns:
FILE_TYPE_PATTERNS = {
'LBHRA': r'LBHRA.*\.txt$',
'PS_LB': r'PS_LB.*\.txt$',
'LBSR08E': r'LBSR08E.*\.txt$',
'419F': r'419F.*\.txt$',
'PROGRAM_LIST': r'(?i)program.*list.*\.xlsx?$',
'COTA_PERSISTENCE': r'(?i)(ta|ga|isa).*\.txt$',
}
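For illustration, a short sketch of matching a filename against this dict; classify_file is a hypothetical helper, and the real classification is done by the staging orchestrator.

# Sketch: classify a filename against the FILE_TYPE_PATTERNS dict above.
# classify_file is illustrative; the orchestrator performs the real classification.
import re

def classify_file(filename: str, patterns=None) -> str:
    patterns = patterns or FILE_TYPE_PATTERNS
    for file_type, pattern in patterns.items():
        if re.search(pattern, filename):
            return file_type
    return "UNKNOWN"

# Example: classify_file("LBHRA_Report_Fall2024.txt") -> "LBHRA"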
Term Code Mapping:
TERM_CODES = {
'fall': 'YYYY-fall',
'spring': 'YYYY-spring',
'summer': 'YYYY-summer',
'winter': 'YYYY-winter',
}
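The YYYY placeholder is filled from the year detected in the file; a minimal sketch of that formatting (make_term_code is a hypothetical name, the term detector produces these codes in practice):

# Sketch: build a concrete term code from a detected year and season.
def make_term_code(year: int, season: str) -> str:
    return f"{year}-{season.lower()}"

# Example: make_term_code(2024, "Fall") -> "2024-fall"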
Database Schema:
CREATE TABLE import_manifest (
id SERIAL PRIMARY KEY,
file_hash VARCHAR(64) UNIQUE NOT NULL,
original_path TEXT NOT NULL,
file_type VARCHAR(50),
term_code VARCHAR(20),
imported_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
file_size BIGINT,
record_count INTEGER
);
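A hedged example of recording a file against this schema; record_import is an illustrative name, and in practice the staging/import scripts write these rows.

# Sketch: insert one row into import_manifest after a successful import.
# record_import is illustrative; ON CONFLICT relies on the UNIQUE constraint
# on file_hash shown in the schema above.
import os
import psycopg2
from dotenv import load_dotenv

def record_import(file_hash, original_path, file_type, term_code, file_size, record_count):
    load_dotenv()
    conn = psycopg2.connect(
        host=os.getenv("DB_HOST"),
        dbname=os.getenv("DB_NAME"),
        user=os.getenv("DB_USER"),
        password=os.getenv("DB_PASSWORD"),
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO import_manifest
                (file_hash, original_path, file_type, term_code, file_size, record_count)
            VALUES (%s, %s, %s, %s, %s, %s)
            ON CONFLICT (file_hash) DO NOTHING
            """,
            (file_hash, original_path, file_type, term_code, file_size, record_count),
        )
    conn.close()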
Usage Examples
Example 1: Stage New Files
User: "I have new import files in the inbox. Can you stage them?"
Assistant: "I'm using the managing-imports skill to stage files from inbox."
[Runs staging operation]
Output:
Staging Complete:
- Processed: 8 files
- Staged: 8 files (ready for import)
- Skipped: 0 files (no duplicates)
Staged files by type:
- LBHRA: 3 files → imports/staged/lbhra/
- PS_LB: 3 files → imports/staged/ps-lb/
- 419F: 2 files → imports/staged/419f/
Manifest: imports/staged/staging-manifest-2025-11-14-143022.json
Import plan: imports/staged/import-plan-2025-11-14-143022.sh
Next step: Review import plan and execute imports.
Example 2: Check Import Status
User: "What's in the import queue?"
Assistant: "I'm using the managing-imports skill to track import status."
[Runs track status operation]
Output:
Import Status:
Inbox: 0 files
Staged: 8 files ready for import
- LBHRA (3 files):
- 2025-spring: 3 files
- PS_LB (3 files):
- 2025-spring: 3 files
- 419F (2 files):
- 2025-spring: 2 files
Processing: 0 files
Archive: 1,255 files across 6 types
Recent Imports (from manifest):
- PS_LB 2025-spring: 3 files (last: 2025-11-14 10:15:33)
- LBHRA 2025-spring: 3 files (last: 2025-11-14 10:12:45)
- 419F 2024-fall: 5 files (last: 2025-11-13 15:22:11)
Example 3: Check for Duplicates
User: "Check if these files are duplicates before I import them."
Assistant: "I'm using the managing-imports skill to check for duplicates."
[Runs duplicate check on inbox]
Output:
Duplicate Check Results:
DUPLICATE: LBHRA_Spring2025_v2.txt
- SHA256: abc123def456...
- Previously imported: 2025-11-10 14:32:15
- Original path: imports/archive/lbhra/2025-spring/LBHRA_Spring2025.txt
- File type: LBHRA
- Term: 2025-spring
- Action: SKIP (already in database)
NEW: PS_LB_Spring2025_Updated.txt
- SHA256: 789xyz012abc...
- Not found in import_manifest
- Action: READY FOR IMPORT
Summary:
- Duplicates: 1 file (skip)
- New files: 1 file (ready to import)
Example 4: Archive Processed Files
User: "Import completed successfully. Archive the processed files."
Assistant: "I'm using the managing-imports skill to clear processed files."
[Runs clear processed operation]
Output:
Cleared Processed Files:
Moved to archive:
- LBHRA (3 files):
- 2025-spring: 3 files → imports/archive/lbhra/2025-spring/
- PS_LB (3 files):
- 2025-spring: 3 files → imports/archive/ps-lb/2025-spring/
- 419F (2 files):
- 2025-spring: 2 files → imports/archive/419f/2025-spring/
Total: 8 files archived
Remaining staged: 0 files
All staged files have been archived.
Common Mistakes
Staging without database connection:
- Problem: Can't check duplicates, might re-import files
- Fix: Always use the --use-db flag when staging
Moving files before import verified:
- Problem: Lose track of files if import fails
- Fix: Only clear processed after verifying database records
Not detecting terms correctly:
- Problem: Files archived to wrong term directory
- Fix: Verify term detection output, use term detector directly if needed
Archiving files not in manifest:
- Problem: Risk losing unimported data
- Fix: Always check manifest first, never move untracked files
Red Flags
Never:
- Stage files without checking duplicates (use --use-db)
- Move files from staging without manifest verification
- Delete files from inbox (move to staged instead)
- Archive files before import completes
- Skip error handling (always check command exit codes)
Always:
- Verify database connection before duplicate checks
- Create archive directories before moving files
- Log all file operations (staging manifest tracks this)
- Report counts after each operation (files processed, staged, skipped)
- Keep import plan scripts for audit trail
Workflow Summary
1. New files arrive
↓
Drop in: /imports/inbox/
2. Stage files (this skill)
↓
Run: managing-imports → stage files
↓
Result: /imports/staged/{type}/
3. Execute imports (external scripts)
↓
Run: Import plan scripts
↓
Update: import_manifest table
4. Clear processed (this skill)
↓
Run: managing-imports → clear processed
↓
Result: /imports/archive/{type}/{term}/
Complete Lifecycle:
inbox → staged → processing → archive
↑
└─ duplicates skipped (from import_manifest check)