| name | Cleaning Up Research Sessions |
| description | Safely remove intermediate files from completed research sessions while preserving important data |
| when_to_use | After research session is complete and consolidated. When research folder has accumulated temporary files. Before archiving or sharing research session. |
| version | 1.0.0 |
Cleaning Up Research Sessions
Overview
Remove intermediate files created during the research workflow while preserving all important data.
Core principle: Conservative cleanup with user confirmation. Never delete anything important.
When to Use
Use this skill when:
- Research session is complete and consolidated
- Preparing to archive or share research session folder
- Research folder has accumulated temporary/intermediate files
- User explicitly asks to clean up
When NOT to use:
- Research is still in progress
- User hasn't reviewed final outputs yet
- Unsure what files are safe to delete
Files That Are ALWAYS KEPT
NEVER delete these (protected list):
Core outputs:
- `SUMMARY.md` - Enhanced findings with methodology
- `relevant-papers.json` - Filtered relevant papers
- `papers-reviewed.json` - Complete screening history
- `papers/` directory - All PDFs and supplementary files
- `citations/citation-graph.json` - Citation relationships
Methodology documentation:
- `screening-criteria.json` - Rubric definition (if exists)
- `test-set.json` - Rubric validation papers (if exists)
- `abstracts-cache.json` - Cached abstracts for re-screening (if exists)
- `rubric-changelog.md` - Rubric version history (if exists)
Auxiliary documentation (if exists):
- `README.md` - Project overview
- `TOP_PRIORITY_PAPERS.md` - Curated priority list
- `evaluated-papers.json` - Rich structured data
Project configuration:
- `.claude/` directory - Permissions and settings
- `*.py` helper scripts that were created - Keep for reproducibility
Files That May Be Cleaned Up
Candidates for removal (with confirmation):
Intermediate search results:
- `initial-search-results.json` - Raw PubMed results before screening
  - Safe to delete: Data is in `papers-reviewed.json`
  - Reason to keep: Shows raw search results for reproducibility
Temporary files:
- `*.tmp` files
- `*.swp` files (vim swap files)
- `.DS_Store` (macOS)
- `__pycache__/` (Python cache)
- `*.pyc` (Python compiled)
Log files:
- `*.log` files
- `debug-*.txt` files
Cleanup Workflow
Step 1: Analyze Research Session
cd research-sessions/YYYY-MM-DD-description/
# List all files with sizes
find . -type f -exec ls -lh {} \; | awk '{print $5, $9}' | sort -rh
Identify files by category (see the classification sketch after this list):
- Core outputs (MUST keep)
- Methodology files (SHOULD keep)
- Intermediate files (candidates for cleanup)
- Temporary files (safe to delete)
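A minimal Python sketch of this classification, assuming files are bucketed by filename patterns; the pattern lists and the `categorize` helper are illustrative and should be kept in sync with the `PROTECTED_PATTERNS` list defined under Safety Mechanisms below:

```python
from fnmatch import fnmatch
from pathlib import Path

# Illustrative pattern lists - keep these aligned with PROTECTED_PATTERNS below
PROTECTED = ["SUMMARY.md", "relevant-papers.json", "papers-reviewed.json",
             "papers/*", "citations/citation-graph.json", "screening-criteria.json",
             "test-set.json", "abstracts-cache.json", "rubric-changelog.md",
             "README.md", "TOP_PRIORITY_PAPERS.md", "evaluated-papers.json",
             "*.py", ".claude/*"]
TEMPORARY = ["*.tmp", "*.swp", ".DS_Store", "__pycache__/*", "*.pyc", "*.log", "debug-*.txt"]
INTERMEDIATE = ["initial-search-results.json"]

def categorize(session_dir="."):
    """Bucket every file in the session into keep / intermediate / temporary / unknown."""
    base = Path(session_dir).resolve()
    buckets = {"keep": [], "intermediate": [], "temporary": [], "unknown": []}
    for path in base.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(base).as_posix()
        if any(fnmatch(rel, p) for p in PROTECTED):
            buckets["keep"].append(rel)
        elif any(fnmatch(rel, p) for p in TEMPORARY):
            buckets["temporary"].append(rel)
        elif any(fnmatch(rel, p) for p in INTERMEDIATE):
            buckets["intermediate"].append(rel)
        else:
            buckets["unknown"].append(rel)  # unknown files default to "keep and ask"
    return buckets
```

Anything that lands in the `unknown` bucket is treated as protected until the user says otherwise.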
Step 2: Present Cleanup Plan to User
Show what will be deleted:
🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/

Files to KEEP (protected):
✅ SUMMARY.md (45 KB)
✅ relevant-papers.json (12 KB)
✅ papers-reviewed.json (28 KB)
✅ papers/ (14 PDFs, 32 MB)
✅ citations/citation-graph.json (5 KB)
✅ screening-criteria.json (2 KB)
✅ abstracts-cache.json (156 KB)

Files that CAN be removed (intermediate):
🗑️ initial-search-results.json (8 KB) - Raw PubMed results
🗑️ .DS_Store (6 KB) - macOS metadata

Total space to recover: 14 KB

Proceed with cleanup? (y/n/review)
Options:
- `y` - Delete intermediate files
- `n` - Cancel cleanup, keep everything
- `review` - Show contents of each file before deciding
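A sketch of how this plan could be generated from the buckets produced by the `categorize` sketch in Step 1; the function name and KB rounding are assumptions, not part of the original workflow:

```python
from pathlib import Path

def format_plan(buckets, session_dir="."):
    """Render the cleanup plan shown to the user, with per-file sizes and a running total."""
    base = Path(session_dir).resolve()
    lines = [f"🧹 Cleanup Analysis for: {base.name}/", "", "Files to KEEP (protected):"]
    for rel in buckets["keep"]:
        lines.append(f"✅ {rel} ({(base / rel).stat().st_size // 1024} KB)")
    lines += ["", "Files that CAN be removed (intermediate):"]
    total = 0
    for rel in buckets["intermediate"] + buckets["temporary"]:
        size = (base / rel).stat().st_size
        total += size
        lines.append(f"🗑️ {rel} ({size // 1024} KB)")
    lines += ["", f"Total space to recover: {total // 1024} KB", "", "Proceed with cleanup? (y/n/review)"]
    return "\n".join(lines)
```

Usage: `print(format_plan(categorize()))` from inside the session folder.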
Step 3: Confirm Deletions
Before deleting ANY file:
- Verify it's not in protected list
- Check file isn't referenced in SUMMARY.md
- Confirm with user one more time
Example confirmation:
About to delete:
- initial-search-results.json (8 KB)
This file contains raw PubMed search results. The data is preserved in
papers-reviewed.json, so this is safe to delete.
Confirm deletion? (y/n)
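The "not referenced in SUMMARY.md" check from the list above can be scripted conservatively; a sketch (the helper name is illustrative) that treats any mention of the filename as a reference and therefore errs toward keeping the file:

```python
from pathlib import Path

def referenced_in_summary(filename, summary_path="SUMMARY.md"):
    """Return True if SUMMARY.md mentions the file by name (substring match, deliberately strict)."""
    summary = Path(summary_path)
    if not summary.exists():
        return False
    text = summary.read_text(encoding="utf-8", errors="replace")
    return Path(filename).name in text
```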
Step 4: Perform Cleanup
Delete confirmed files:
# Move to trash instead of rm (safer)
# On macOS:
mv initial-search-results.json ~/.Trash/
# On Linux:
mv initial-search-results.json ~/.local/share/Trash/files/
# Or use rm if user confirms
rm initial-search-results.json
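The trash locations above are OS-specific, and the Linux path skips the accompanying `.trashinfo` metadata. A platform-neutral alternative (an assumption, not part of the original workflow) is to stage deletions in a `_trash/` folder inside the session and remove that folder only after the integrity check in Step 5 passes:

```python
import shutil
from pathlib import Path

def stage_for_deletion(filepath, session_dir="."):
    """Move a confirmed file into <session>/_trash/ instead of deleting it immediately."""
    base = Path(session_dir).resolve()
    trash = base / "_trash"
    trash.mkdir(exist_ok=True)
    destination = trash / Path(filepath).name
    shutil.move(str(base / filepath), str(destination))
    return destination
```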
Report results:
✅ Cleanup complete!
Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
Space recovered: 14 KB
Protected files preserved:
- All 8 core files kept
- All 14 PDFs kept
- All methodology documentation kept
Step 5: Verify Integrity
After cleanup, verify critical files:
# Check core files exist
test -f SUMMARY.md && echo "✓ SUMMARY.md"
test -f relevant-papers.json && echo "✓ relevant-papers.json"
test -f papers-reviewed.json && echo "✓ papers-reviewed.json"
test -d papers && echo "✓ papers/ directory"
# Verify JSON files are valid
jq empty relevant-papers.json && echo "✓ relevant-papers.json valid JSON"
jq empty papers-reviewed.json && echo "✓ papers-reviewed.json valid JSON"
Report to user:
✅ Integrity check passed
- All core files present
- All JSON files valid
- All PDFs intact
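The same verification can be run from Python so it is easy to repeat after every cleanup; a sketch under the assumption that the three core files and the `papers/` directory are the minimum required set:

```python
import json
from pathlib import Path

CORE_FILES = ["SUMMARY.md", "relevant-papers.json", "papers-reviewed.json"]

def verify_integrity(session_dir="."):
    """Return a list of problems; an empty list means the integrity check passed."""
    base = Path(session_dir).resolve()
    problems = [f"missing: {name}" for name in CORE_FILES if not (base / name).exists()]
    if not (base / "papers").is_dir():
        problems.append("missing: papers/ directory")
    for name in ["relevant-papers.json", "papers-reviewed.json"]:
        path = base / name
        if path.exists():
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except json.JSONDecodeError as exc:
                problems.append(f"invalid JSON in {name}: {exc}")
    return problems
```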
Special Cases
Case 1: Large abstracts-cache.json
If abstracts-cache.json is very large (>100 MB):
⚠️ abstracts-cache.json is 256 MB
This file enables re-screening if you update the rubric. Options:
1. Keep (recommended if you might refine rubric)
2. Compress (gzip to ~50 MB, can decompress later)
3. Delete (only if research is final and won't be updated)
Choice? (1/2/3)
If user chooses compress:
gzip abstracts-cache.json
# Creates abstracts-cache.json.gz
echo "Compressed abstracts-cache.json to $(du -h abstracts-cache.json.gz | cut -f1)"
Case 2: Helper Scripts
If user created helper scripts during research:
📝 Found helper scripts:
- screen_papers.py (created for batch screening)
- deep_dive_papers.py (created for data extraction)
These scripts document your methodology. Recommendations:
- Keep for reproducibility
- Add comments if not already documented
- Reference in SUMMARY.md under "Reproducibility" section
Keep scripts? (y/n)
Case 3: Multiple Research Sessions
If cleaning up multiple sessions:
# Find all research sessions
find research-sessions/ -maxdepth 1 -type d
# For each session:
for session in research-sessions/*/; do
echo "Analyzing: $session"
# Run cleanup analysis
done
Ask user:
Found 5 completed research sessions.
Clean up all sessions? (y/n/select)
- y: Analyze and clean all sessions
- n: Cancel
- select: Choose which sessions to clean
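A sketch of the multi-session pass, reusing the `categorize` helper from the Step 1 sketch; the folder layout and function names are assumptions:

```python
from pathlib import Path

def analyze_all_sessions(root="research-sessions"):
    """Summarize cleanup candidates for every session folder without deleting anything."""
    reports = {}
    for session in sorted(Path(root).iterdir()):
        if not session.is_dir():
            continue
        buckets = categorize(session)  # from the Step 1 sketch
        candidates = buckets["intermediate"] + buckets["temporary"]
        reports[session.name] = candidates
        print(f"{session.name}: {len(candidates)} cleanup candidate(s)")
    return reports
```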
Safety Mechanisms
Protected File List
Maintain hardcoded list of patterns to NEVER delete:
PROTECTED_PATTERNS = [
'SUMMARY.md',
'relevant-papers.json',
'papers-reviewed.json',
'papers/*.pdf',
'papers/*.zip',
'citations/citation-graph.json',
'screening-criteria.json',
'test-set.json',
'abstracts-cache.json',
'rubric-changelog.md',
'README.md',
'TOP_PRIORITY_PAPERS.md',
'evaluated-papers.json',
'*.py', # Helper scripts
'.claude/*', # Project settings
]
Before deleting any file:
from fnmatch import fnmatch

def is_protected(filepath):
    """Check if a session-relative file path matches any protected pattern"""
    for pattern in PROTECTED_PATTERNS:
        if fnmatch(filepath, pattern):
            return True
    return False

# Never delete protected files - guard every deletion behind this check
# (the wrapper name is illustrative)
def delete_if_allowed(file_to_delete):
    if is_protected(file_to_delete):
        print(f"⚠️ ERROR: {file_to_delete} is protected and cannot be deleted")
        return
    # ...proceed with the actual deletion only past this guard
Dry Run Mode
Always show what will be deleted before doing it:
# Dry run (show only, don't delete)
echo "DRY RUN - No files will be deleted"
for file in $candidate_files; do
    if is_safe_to_delete "$file"; then
        echo "Would delete: $file ($(du -h "$file" | cut -f1))"
    fi
done
echo ""
echo "Proceed with actual deletion? (y/n)"
Integration with Other Skills
After answering-research-questions workflow:
- Complete Phase 8 (consolidation)
- User reviews SUMMARY.md and relevant-papers.json
- Optionally: Run cleaning-up-research-sessions
- Archive or share research folder
Add to answering-research-questions Phase 8:
### Optional: Cleanup
After reviewing outputs, optionally clean up intermediate files:
"Research session is complete. Would you like me to clean up intermediate files?
I'll show you what will be deleted before removing anything."
If yes: Use `cleaning-up-research-sessions` skill
Common Mistakes
- Deleting `papers-reviewed.json`: This is the deduplication database - NEVER delete → Always protect with hardcoded list
- Deleting `abstracts-cache.json`: Needed for re-screening → Ask user, default to keep
- Deleting helper scripts: Important for reproducibility → Keep by default, ask if user wants to remove
- Not showing user what will be deleted: User needs to see the plan → Always show dry run first
- Deleting without confirmation: Too risky → Always ask for final confirmation
- Not verifying after cleanup: Files could be corrupted → Run integrity check after deletion
Quick Reference
| Action | Command |
|---|---|
| List all files with sizes | find . -type f -exec ls -lh {} \; \| sort -rh |
| Check if JSON valid | jq empty file.json |
| Move to trash (macOS) | mv file ~/.Trash/ |
| Move to trash (Linux) | mv file ~/.local/share/Trash/files/ |
| Compress large file | gzip file |
| Calculate folder size | du -sh . |
Success Criteria
Cleanup successful when:
- All protected files still exist and are valid
- Only confirmed intermediate files deleted
- User knows what was deleted and why
- Space recovered is reported
- Integrity check passed
- Research session is still fully functional
Example Session
User: "Can you clean up the research session folder?"
Agent: "I'll analyze the research session and show you what can be safely cleaned up."
[Analyzes folder]
Agent: "๐งน Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/
Files to KEEP (protected):
โ
SUMMARY.md (45 KB) - Final findings
โ
relevant-papers.json (12 KB) - Filtered results
โ
papers-reviewed.json (28 KB) - Complete history
โ
papers/ (14 PDFs, 32 MB) - Downloaded papers
โ
citations/citation-graph.json (5 KB) - Citation network
โ
screening-criteria.json (2 KB) - Rubric definition
โ
abstracts-cache.json (156 KB) - Cached abstracts
Files that CAN be removed:
๐๏ธ initial-search-results.json (8 KB) - Raw PubMed results
โ Data is preserved in papers-reviewed.json
๐๏ธ .DS_Store (6 KB) - macOS metadata
โ Not needed
Total space to recover: 14 KB
Proceed with cleanup? (y/n/review)"
User: "y"
Agent: "Final confirmation - About to delete:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
All protected files will be preserved. Confirm? (y/n)"
User: "y"
Agent: [Deletes files]
"โ
Cleanup complete!
Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
Space recovered: 14 KB
Running integrity check...
โ
All core files intact
โ
All JSON files valid
โ
All 14 PDFs present
Research session is ready for archiving or sharing."
Next Steps
After cleanup:
- Research folder is clean and ready to archive
- Share folder with collaborators
- Move to long-term storage
- Continue with follow-up research if needed