| name | managing-imports |
| description | Organize incoming files and track import status through the file staging workflow |
Managing Imports
Overview
Manage import file workflow from drop to archive: organize files, check duplicates, track status, and clear processed files.
Use when: Processing new import files, checking import status, or archiving successfully imported data.
Announce at start: "I'm using the managing-imports skill to [operation]."
What This Skill Does
Wraps the file staging system (scripts/utilities/file_staging/) with user-friendly operations:
- Stage files - Classify and organize files from inbox
- Check duplicates - Query import_manifest for SHA256 matches
- Track status - Show files in each workflow stage
- Clear processed - Move imported files to archive
Operations
Stage Files
When: New files dropped in inbox, need to classify and organize for import.
Process:
Verify inbox has files
cd /Users/anthonybyrnes/PycharmProjects/Python419
find imports/inbox -type f | wc -l

If 0 files, report: "Inbox is empty. No files to stage."
Run file staging orchestrator
cd /Users/anthonybyrnes/PycharmProjects/Python419
PYTHONPATH=. python3 -m scripts.utilities.file_staging \
  --drop-folder imports/inbox \
  --use-db \
  --generate-plan

What this does:
- Scans inbox for files
- Classifies file type (LBHRA, PS_LB, LBSR08E, 419F, PROGRAM_LIST, COTA_PERSISTENCE)
- Detects academic term from file content
- Checks SHA256 against import_manifest for duplicates
- Moves files to imports/staged/{type}/
- Generates staging manifest JSON
- Creates import plan script (if --generate-plan)
Parse orchestrator output
Look for:
- Files processed count
- Files staged count
- Files skipped (duplicates)
- Staging manifest path
- Import plan path (if generated)
Report results
Format:
Staging Complete:
- Processed: 15 files
- Staged: 12 files (ready for import)
- Skipped: 3 files (duplicates)

Staged files by type:
- LBHRA: 5 files → imports/staged/lbhra/
- PS_LB: 4 files → imports/staged/ps-lb/
- 419F: 3 files → imports/staged/419f/

Manifest: imports/staged/staging-manifest-YYYY-MM-DD-HHMMSS.json
Import plan: imports/staged/import-plan-YYYY-MM-DD-HHMMSS.sh
Error Handling:
- If orchestrator fails, show error and recommend manual classification
- If --use-db fails, warn that duplicates weren't checked
- If staging folder doesn't exist, create it first
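For illustration, a minimal Python sketch of this step, assuming the orchestrator accepts the module path and flags shown above; the run_staging helper name is hypothetical, not part of the staging system.

# Sketch: invoke the staging orchestrator and surface its exit code.
# Assumes the module path and flags shown above; run_staging is a hypothetical helper.
import os
import subprocess
import sys

PROJECT_ROOT = "/Users/anthonybyrnes/PycharmProjects/Python419"

def run_staging(drop_folder: str = "imports/inbox") -> int:
    cmd = [
        sys.executable, "-m", "scripts.utilities.file_staging",
        "--drop-folder", drop_folder,
        "--use-db",
        "--generate-plan",
    ]
    result = subprocess.run(
        cmd,
        cwd=PROJECT_ROOT,
        env={**os.environ, "PYTHONPATH": "."},
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        # Orchestrator failed: show the error and fall back to manual classification.
        print(result.stderr, file=sys.stderr)
    return result.returncode

Checking the return code here is what drives the error handling above: a non-zero exit means the run should be reported as failed, not partially staged.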
Check Duplicates
When: Want to verify if files have already been imported before staging.
Process:
Get file path or directory
User provides:
- Single file path
- Directory path (checks all files in directory)
Calculate SHA256 hashes
For each file:
shasum -a 256 <file_path>

Query import_manifest
import os
import psycopg2
from dotenv import load_dotenv

load_dotenv()

conn = psycopg2.connect(
    host=os.getenv('DB_HOST'),
    dbname=os.getenv('DB_NAME'),
    user=os.getenv('DB_USER'),
    password=os.getenv('DB_PASSWORD')
)
cursor = conn.cursor()
cursor.execute("""
    SELECT original_path, file_type, term_code, imported_at
    FROM import_manifest
    WHERE file_hash = %s
""", (file_hash,))
result = cursor.fetchone()

Report duplicates
For each duplicate found:
DUPLICATE: filename.txt
- SHA256: abc123...
- Previously imported: 2025-11-10 14:32:15
- Original path: imports/archive/lbhra/2024-fall/filename.txt
- File type: LBHRA
- Term: 2024-fall
- Action: SKIP (already in database)

For non-duplicates:
NEW: filename.txt
- SHA256: def456...
- Not found in import_manifest
- Action: READY FOR IMPORT
SQL Helper:
-- Check multiple files at once
SELECT
file_hash,
original_path,
file_type,
term_code,
imported_at
FROM import_manifest
WHERE file_hash IN ('hash1', 'hash2', 'hash3')
ORDER BY imported_at DESC;
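The shell hashing and the single-row query above can also be combined in one pass. A hedged Python sketch, assuming the import_manifest columns shown under Database Schema below; hash_file and find_duplicates are illustrative helper names.

# Sketch: hash files locally and check them against import_manifest in one query.
# Assumes the import_manifest columns shown in Database Schema below;
# hash_file and find_duplicates are illustrative names, not existing utilities.
import hashlib
import os
import psycopg2
from dotenv import load_dotenv

def hash_file(path: str) -> str:
    """SHA256 of a file, read in chunks (matches `shasum -a 256`)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(paths):
    """Map each path to its manifest row, or None if the hash is not in the manifest."""
    load_dotenv()
    hashes = {hash_file(p): p for p in paths}
    conn = psycopg2.connect(
        host=os.getenv("DB_HOST"),
        dbname=os.getenv("DB_NAME"),
        user=os.getenv("DB_USER"),
        password=os.getenv("DB_PASSWORD"),
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT file_hash, original_path, file_type, term_code, imported_at
            FROM import_manifest
            WHERE file_hash = ANY(%s)
            """,
            (list(hashes),),
        )
        rows = {r[0]: r for r in cur.fetchall()}
    conn.close()
    return {path: rows.get(h) for h, path in hashes.items()}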
Track Status
When: Need an overview of the import workflow stages or want to check what's queued for import.
Process:
Count files in each stage
cd /Users/anthonybyrnes/PycharmProjects/Python419

# Inbox
INBOX=$(find imports/inbox -type f 2>/dev/null | wc -l | tr -d ' ')

# Staged by type
LBHRA=$(find imports/staged/lbhra -type f 2>/dev/null | wc -l | tr -d ' ')
PS_LB=$(find imports/staged/ps-lb -type f 2>/dev/null | wc -l | tr -d ' ')
LBSR08E=$(find imports/staged/lbsr08e -type f 2>/dev/null | wc -l | tr -d ' ')
F419=$(find imports/staged/419f -type f 2>/dev/null | wc -l | tr -d ' ')
PROGRAM=$(find imports/staged/program-list -type f 2>/dev/null | wc -l | tr -d ' ')
COTA=$(find imports/staged/cota-persistence -type f 2>/dev/null | wc -l | tr -d ' ')

# Processing
PROCESSING=$(find imports/processing -type f 2>/dev/null | wc -l | tr -d ' ')

# Archive (total)
ARCHIVE=$(find imports/archive -type f 2>/dev/null | wc -l | tr -d ' ')

Detect terms for staged files
For each type with staged files, group by term:
# Example: LBHRA files by term
for file in imports/staged/lbhra/*.txt; do
  # Run term detector
  TERM=$(PYTHONPATH=. python3 -c "
from scripts.utilities.file_staging.term_detector import detect_term
result = detect_term('$file')
print(result.get('term_code', 'unknown'))
")
  echo "$file -> $TERM"
done

Count by term for display.
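If the per-file shell loop is slow, a single-process Python sketch can do the same grouping, assuming detect_term returns a dict with a term_code key as in the loop above; terms_for is an illustrative name.

# Sketch: group staged files by detected term in one Python process.
# Run from the project root with PYTHONPATH=. so the import resolves.
# Assumes detect_term() returns a dict with a 'term_code' key, as in the loop above.
from collections import Counter
from pathlib import Path
from scripts.utilities.file_staging.term_detector import detect_term

def terms_for(staged_dir: str) -> Counter:
    counts = Counter()
    for path in sorted(Path(staged_dir).glob("*.txt")):
        result = detect_term(str(path))
        counts[result.get("term_code", "unknown")] += 1
    return counts

# Example: Counter({'2025-spring': 3, '2024-fall': 2})
print(terms_for("imports/staged/lbhra"))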
Format output
Import Status:

Inbox: 0 files

Staged: 13 files ready for import
- LBHRA (5 files):
  - 2024-fall: 2 files
  - 2025-spring: 3 files
- PS_LB (4 files):
  - 2024-fall: 1 file
  - 2025-spring: 3 files
- 419F (3 files):
  - 2025-spring: 3 files
- LBSR08E (0 files)
- Program List (0 files)
- COTA Persistence (1 file):
  - 2024-fall: 1 file

Processing: 0 files
Archive: 1,247 files across 6 types

Show recent imports from manifest

SELECT file_type, term_code, COUNT(*) as file_count, MAX(imported_at) as last_import
FROM import_manifest
GROUP BY file_type, term_code
ORDER BY last_import DESC
LIMIT 10;

Display as:

Recent Imports (from manifest):
- LBHRA 2025-spring: 3 files (last: 2025-11-13 10:45:22)
- PS_LB 2025-spring: 3 files (last: 2025-11-13 10:47:15)
- 419F 2024-fall: 5 files (last: 2025-11-12 16:23:01)
Error Handling:
- If directories don't exist, show "0 files" (not errors)
- If database unavailable, show file counts only (skip manifest query)
- If term detection fails for a file, mark as "term: unknown"
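A small Python sketch of the counting step, assuming the imports/ directory layout used above; count_files and STAGED_TYPES are illustrative names.

# Sketch: count files per workflow stage with pathlib.
# Assumes the imports/ layout shown above; missing directories report 0, not errors.
from pathlib import Path

ROOT = Path("/Users/anthonybyrnes/PycharmProjects/Python419")
STAGED_TYPES = ["lbhra", "ps-lb", "lbsr08e", "419f", "program-list", "cota-persistence"]

def count_files(rel_path: str) -> int:
    directory = ROOT / rel_path
    if not directory.exists():
        return 0
    return sum(1 for p in directory.rglob("*") if p.is_file())

status = {
    "inbox": count_files("imports/inbox"),
    "staged": {t: count_files(f"imports/staged/{t}") for t in STAGED_TYPES},
    "processing": count_files("imports/processing"),
    "archive": count_files("imports/archive"),
}
print(status)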
Clear Processed
When: Import succeeded, files verified in database, ready to move to archive.
Process:
Verify files are in manifest
SELECT file_hash, original_path, file_type, term_code
FROM import_manifest
WHERE original_path LIKE '%staged%'
ORDER BY imported_at DESC;

These are successfully imported files still in staging.
Determine archive destination
For each file:
- File type: LBHRA, PS_LB, etc.
- Term code: 2024-fall, 2025-spring, etc.
- Archive path: imports/archive/{type}/{term}/

Example:
- File: imports/staged/lbhra/LBHRA_Report_Fall2024.txt
- Type: lbhra
- Term: 2024-fall
- Destination: imports/archive/lbhra/2024-fall/
Create archive directories if needed
mkdir -p imports/archive/{type}/{term}

Move files
mv imports/staged/{type}/{filename} imports/archive/{type}/{term}/

Update manifest (optional)
If tracking archive paths:
UPDATE import_manifest
SET original_path = %s
WHERE file_hash = %s;

Report results
Cleared Processed Files:

Moved to archive:
- LBHRA (5 files):
  - 2024-fall: 2 files → imports/archive/lbhra/2024-fall/
  - 2025-spring: 3 files → imports/archive/lbhra/2025-spring/
- PS_LB (4 files):
  - 2025-spring: 4 files → imports/archive/ps-lb/2025-spring/

Total: 9 files archived
Remaining staged: 4 files
Safety Checks:
- NEVER move files not in manifest (risk of losing unimported data)
- Verify SHA256 matches before moving
- Confirm database has imported records (check record counts)
- Keep staging files until import verified complete
Error Handling:
- If manifest query fails, ABORT (don't move files)
- If term unknown, archive to imports/archive/{type}/unknown-term/
- If move fails, log error but continue with other files
- Report any failed moves separately
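A hedged sketch of the move step under these rules, assuming each file's manifest row supplies file_type and term_code; archive_staged_file is an illustrative helper, not part of the staging system.

# Sketch: move a manifest-verified staged file into imports/archive/{type}/{term}/.
# Assumes file_type and term_code come from the file's import_manifest row;
# archive_staged_file is an illustrative name.
import shutil
from pathlib import Path
from typing import Optional

ARCHIVE_ROOT = Path("imports/archive")

def archive_staged_file(staged_path: str, file_type: str, term_code: Optional[str]) -> Path:
    term = term_code or "unknown-term"  # Unknown terms archive to .../unknown-term/
    dest_dir = ARCHIVE_ROOT / file_type.lower() / term
    dest_dir.mkdir(parents=True, exist_ok=True)  # Create archive directory before moving
    dest = dest_dir / Path(staged_path).name
    shutil.move(staged_path, str(dest))
    return dest

# Example:
# archive_staged_file("imports/staged/lbhra/LBHRA_Report_Fall2024.txt", "LBHRA", "2024-fall")
# -> imports/archive/lbhra/2024-fall/LBHRA_Report_Fall2024.txt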
Integration
Wraps:
- File staging system: scripts/utilities/file_staging/orchestrator.py
- Term detector: scripts/utilities/file_staging/term_detector.py
- Duplicate checker: scripts/utilities/file_staging/duplicate_checker.py
Database:
- Table: import_manifest (SHA256 tracking)
- Connection: Uses .env credentials
Used by:
- project-status skill (shows import queue)
- Import workflows (pre-import organization)
Depends on:
- /imports/ directory structure
- File staging system installed
- Database connection available (for duplicates)
Implementation Details
File Type Patterns:
FILE_TYPE_PATTERNS = {
'LBHRA': r'LBHRA.*\.txt$',
'PS_LB': r'PS_LB.*\.txt$',
'LBSR08E': r'LBSR08E.*\.txt$',
'419F': r'419F.*\.txt$',
'PROGRAM_LIST': r'(?i)program.*list.*\.xlsx?$',
'COTA_PERSISTENCE': r'(?i)(ta|ga|isa).*\.txt$',
}
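For illustration, a short sketch of matching a filename against this dict; classify_file is a hypothetical helper, and the real classification is done by the staging orchestrator.

# Sketch: classify a filename against the FILE_TYPE_PATTERNS dict above.
# classify_file is illustrative; the orchestrator performs the real classification.
import re

def classify_file(filename: str, patterns=None) -> str:
    patterns = patterns or FILE_TYPE_PATTERNS
    for file_type, pattern in patterns.items():
        if re.search(pattern, filename):
            return file_type
    return "UNKNOWN"

# Example: classify_file("LBHRA_Report_Fall2024.txt") -> "LBHRA"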
Term Code Mapping:
TERM_CODES = {
'fall': 'YYYY-fall',
'spring': 'YYYY-spring',
'summer': 'YYYY-summer',
'winter': 'YYYY-winter',
}
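The YYYY placeholder is filled from the year detected in the file; a minimal sketch of that formatting (make_term_code is a hypothetical name, the term detector produces these codes in practice):

# Sketch: build a concrete term code from a detected year and season.
def make_term_code(year: int, season: str) -> str:
    return f"{year}-{season.lower()}"

# Example: make_term_code(2024, "Fall") -> "2024-fall"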
Database Schema:
CREATE TABLE import_manifest (
id SERIAL PRIMARY KEY,
file_hash VARCHAR(64) UNIQUE NOT NULL,
original_path TEXT NOT NULL,
file_type VARCHAR(50),
term_code VARCHAR(20),
imported_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
file_size BIGINT,
record_count INTEGER
);
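A hedged example of recording a file against this schema; record_import is an illustrative name, and in practice the staging/import scripts write these rows.

# Sketch: insert one row into import_manifest after a successful import.
# record_import is illustrative; ON CONFLICT relies on the UNIQUE constraint
# on file_hash shown in the schema above.
import os
import psycopg2
from dotenv import load_dotenv

def record_import(file_hash, original_path, file_type, term_code, file_size, record_count):
    load_dotenv()
    conn = psycopg2.connect(
        host=os.getenv("DB_HOST"),
        dbname=os.getenv("DB_NAME"),
        user=os.getenv("DB_USER"),
        password=os.getenv("DB_PASSWORD"),
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO import_manifest
                (file_hash, original_path, file_type, term_code, file_size, record_count)
            VALUES (%s, %s, %s, %s, %s, %s)
            ON CONFLICT (file_hash) DO NOTHING
            """,
            (file_hash, original_path, file_type, term_code, file_size, record_count),
        )
    conn.close()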
Usage Examples
Example 1: Stage New Files
User: "I have new import files in the inbox. Can you stage them?"
Assistant: "I'm using the managing-imports skill to stage files from inbox."
[Runs staging operation]
Output:
Staging Complete:
- Processed: 8 files
- Staged: 8 files (ready for import)
- Skipped: 0 files (no duplicates)
Staged files by type:
- LBHRA: 3 files → imports/staged/lbhra/
- PS_LB: 3 files → imports/staged/ps-lb/
- 419F: 2 files → imports/staged/419f/
Manifest: imports/staged/staging-manifest-2025-11-14-143022.json
Import plan: imports/staged/import-plan-2025-11-14-143022.sh
Next step: Review import plan and execute imports.
Example 2: Check Import Status
User: "What's in the import queue?"
Assistant: "I'm using the managing-imports skill to track import status."
[Runs track status operation]
Output:
Import Status:
Inbox: 0 files
Staged: 8 files ready for import
- LBHRA (3 files):
- 2025-spring: 3 files
- PS_LB (3 files):
- 2025-spring: 3 files
- 419F (2 files):
- 2025-spring: 2 files
Processing: 0 files
Archive: 1,255 files across 6 types
Recent Imports (from manifest):
- PS_LB 2025-spring: 3 files (last: 2025-11-14 10:15:33)
- LBHRA 2025-spring: 3 files (last: 2025-11-14 10:12:45)
- 419F 2024-fall: 5 files (last: 2025-11-13 15:22:11)
Example 3: Check for Duplicates
User: "Check if these files are duplicates before I import them."
Assistant: "I'm using the managing-imports skill to check for duplicates."
[Runs duplicate check on inbox]
Output:
Duplicate Check Results:
DUPLICATE: LBHRA_Spring2025_v2.txt
- SHA256: abc123def456...
- Previously imported: 2025-11-10 14:32:15
- Original path: imports/archive/lbhra/2025-spring/LBHRA_Spring2025.txt
- File type: LBHRA
- Term: 2025-spring
- Action: SKIP (already in database)
NEW: PS_LB_Spring2025_Updated.txt
- SHA256: 789xyz012abc...
- Not found in import_manifest
- Action: READY FOR IMPORT
Summary:
- Duplicates: 1 file (skip)
- New files: 1 file (ready to import)
Example 4: Archive Processed Files
User: "Import completed successfully. Archive the processed files."
Assistant: "I'm using the managing-imports skill to clear processed files."
[Runs clear processed operation]
Output:
Cleared Processed Files:
Moved to archive:
- LBHRA (3 files):
- 2025-spring: 3 files → imports/archive/lbhra/2025-spring/
- PS_LB (3 files):
- 2025-spring: 3 files → imports/archive/ps-lb/2025-spring/
- 419F (2 files):
- 2025-spring: 2 files → imports/archive/419f/2025-spring/
Total: 8 files archived
Remaining staged: 0 files
All staged files have been archived.
Common Mistakes
Staging without database connection:
- Problem: Can't check duplicates, might re-import files
- Fix: Always use the --use-db flag when staging
Moving files before import verified:
- Problem: Lose track of files if import fails
- Fix: Only clear processed after verifying database records
Not detecting terms correctly:
- Problem: Files archived to wrong term directory
- Fix: Verify term detection output, use term detector directly if needed
Archiving files not in manifest:
- Problem: Risk losing unimported data
- Fix: Always check manifest first, never move untracked files
Red Flags
Never:
- Stage files without checking duplicates (use --use-db)
- Move files from staging without manifest verification
- Delete files from inbox (move to staged instead)
- Archive files before import completes
- Skip error handling (always check command exit codes)
Always:
- Verify database connection before duplicate checks
- Create archive directories before moving files
- Log all file operations (staging manifest tracks this)
- Report counts after each operation (files processed, staged, skipped)
- Keep import plan scripts for audit trail
Workflow Summary
1. New files arrive
↓
Drop in: /imports/inbox/
2. Stage files (this skill)
↓
Run: managing-imports → stage files
↓
Result: /imports/staged/{type}/
3. Execute imports (external scripts)
↓
Run: Import plan scripts
↓
Update: import_manifest table
4. Clear processed (this skill)
↓
Run: managing-imports → clear processed
↓
Result: /imports/archive/{type}/{term}/
Complete Lifecycle:
inbox → staged → processing → archive
↑
└─ duplicates skipped (from import_manifest check)