Cleaning Up Research Sessions

@kthorn/research-superpower

Safely remove intermediate files from completed research sessions while preserving important data

Install Skill

  1. Download the skill
  2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
  3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by going through its instructions before using it.

SKILL.md

name: Cleaning Up Research Sessions
description: Safely remove intermediate files from completed research sessions while preserving important data
when_to_use: After research session is complete and consolidated. When research folder has accumulated temporary files. Before archiving or sharing research session.
version: 1.0.0

Cleaning Up Research Sessions

Overview

Remove intermediate files created during the research workflow while preserving all important data.

Core principle: Conservative cleanup with user confirmation. Never delete anything important.

When to Use

Use this skill when:

  • Research session is complete and consolidated
  • Preparing to archive or share research session folder
  • Research folder has accumulated temporary/intermediate files
  • User explicitly asks to clean up

When NOT to use:

  • Research is still in progress
  • User hasn't reviewed final outputs yet
  • Unsure what files are safe to delete

Files That Are ALWAYS KEPT

NEVER delete these (protected list):

Core outputs:

  • SUMMARY.md - Enhanced findings with methodology
  • relevant-papers.json - Filtered relevant papers
  • papers-reviewed.json - Complete screening history
  • papers/ directory - All PDFs and supplementary files
  • citations/citation-graph.json - Citation relationships

Methodology documentation:

  • screening-criteria.json - Rubric definition (if exists)
  • test-set.json - Rubric validation papers (if exists)
  • abstracts-cache.json - Cached abstracts for re-screening (if exists)
  • rubric-changelog.md - Rubric version history (if exists)

Auxiliary documentation (if exists):

  • README.md - Project overview
  • TOP_PRIORITY_PAPERS.md - Curated priority list
  • evaluated-papers.json - Rich structured data

Project configuration:

  • .claude/ directory - Permissions and settings
  • *.py helper scripts created during the session - keep for reproducibility

Files That May Be Cleaned Up

Candidates for removal (with confirmation):

Intermediate search results:

  • initial-search-results.json - Raw PubMed results before screening
    • Safe to delete: Data is in papers-reviewed.json
    • Reason to keep: Shows raw search results for reproducibility

Temporary files:

  • *.tmp files
  • *.swp files (vim swap files)
  • .DS_Store (macOS)
  • __pycache__/ (Python cache)
  • *.pyc (Python compiled)

Log files:

  • *.log files
  • debug-*.txt files
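
These can usually be located in one pass before anything is removed; a minimal sketch (nothing here deletes files):

# List temporary and log files matching the patterns above
find . -type f \( -name '*.tmp' -o -name '*.swp' -o -name '.DS_Store' \
    -o -name '*.pyc' -o -name '*.log' -o -name 'debug-*.txt' \)

# Python cache directories
find . -type d -name '__pycache__'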

Cleanup Workflow

Step 1: Analyze Research Session

cd research-sessions/YYYY-MM-DD-description/

# List all files with sizes
find . -type f -exec ls -lh {} \; | awk '{print $5, $9}' | sort -rh

Identify files by category:

  • Core outputs (MUST keep)
  • Methodology files (SHOULD keep)
  • Intermediate files (candidates for cleanup)
  • Temporary files (safe to delete)
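
A rough mechanical pass over the folder might look like this (a sketch; the patterns mirror the protected and temporary lists above, and anything unrecognized falls into a "review" bucket):

# Classify files into keep / temp / review buckets (sketch only)
find . -type f | while read -r f; do
    case "$(basename "$f")" in
        SUMMARY.md|relevant-papers.json|papers-reviewed.json|*.pdf|*.py)
            echo "KEEP:   $f" ;;
        *.tmp|*.swp|.DS_Store|*.pyc|*.log|debug-*.txt)
            echo "TEMP:   $f" ;;
        *)
            echo "REVIEW: $f" ;;
    esac
done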

Step 2: Present Cleanup Plan to User

Show what will be deleted:

🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/

Files to KEEP (protected):
  ✅ SUMMARY.md (45 KB)
  ✅ relevant-papers.json (12 KB)
  ✅ papers-reviewed.json (28 KB)
  ✅ papers/ (14 PDFs, 32 MB)
  ✅ citations/citation-graph.json (5 KB)
  ✅ screening-criteria.json (2 KB)
  ✅ abstracts-cache.json (156 KB)

Files that CAN be removed (intermediate):
  🗑️  initial-search-results.json (8 KB) - Raw PubMed results
  🗑️  .DS_Store (6 KB) - macOS metadata

Total space to recover: 14 KB

Proceed with cleanup? (y/n/review)

Options:

  • y - Delete intermediate files
  • n - Cancel cleanup, keep everything
  • review - Show contents of each file before deciding
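
For the review option, a quick look at each candidate is usually enough; a minimal sketch using the example file from this session:

# Peek at a candidate before deciding
head -n 20 initial-search-results.json
jq 'length' initial-search-results.json   # record count, if the file is a JSON array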

Step 3: Confirm Deletions

Before deleting ANY file:

  1. Verify it's not in the protected list
  2. Check the file isn't referenced in SUMMARY.md (see the check below)
  3. Confirm with the user one more time
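
For check 2, a grep against SUMMARY.md is enough; a sketch using the example file:

grep -n "initial-search-results.json" SUMMARY.md \
    || echo "Not referenced in SUMMARY.md"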

Example confirmation:

About to delete:
- initial-search-results.json (8 KB)

This file contains raw PubMed search results. The data is preserved in
papers-reviewed.json, so this is safe to delete.

Confirm deletion? (y/n)

Step 4: Perform Cleanup

Delete confirmed files:

# Move to trash instead of rm (safer)
# On macOS:
mv initial-search-results.json ~/.Trash/

# On Linux:
mv initial-search-results.json ~/.local/share/Trash/files/

# Or use rm if user confirms
rm initial-search-results.json

Report results:

✅ Cleanup complete!

Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)

Space recovered: 14 KB

Protected files preserved:
- All 8 core files kept
- All 14 PDFs kept
- All methodology documentation kept

Step 5: Verify Integrity

After cleanup, verify critical files:

# Check core files exist
test -f SUMMARY.md && echo "✓ SUMMARY.md"
test -f relevant-papers.json && echo "✓ relevant-papers.json"
test -f papers-reviewed.json && echo "✓ papers-reviewed.json"
test -d papers && echo "✓ papers/ directory"

# Verify JSON files are valid
jq empty relevant-papers.json && echo "✓ relevant-papers.json valid JSON"
jq empty papers-reviewed.json && echo "✓ papers-reviewed.json valid JSON"
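
If you know how many PDFs the session should contain (14 in the running example), a count check is a cheap extra guard:

# Count downloaded PDFs against the expected number
ls papers/*.pdf 2>/dev/null | wc -l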

Report to user:

✅ Integrity check passed
   - All core files present
   - All JSON files valid
   - All PDFs intact

Special Cases

Case 1: Large abstracts-cache.json

If abstracts-cache.json is very large (>100 MB):

⚠️  abstracts-cache.json is 256 MB

This file enables re-screening if you update the rubric. Options:
1. Keep (recommended if you might refine rubric)
2. Compress (gzip to ~50 MB, can decompress later)
3. Delete (only if research is final and won't be updated)

Choice? (1/2/3)

If user chooses compress:

gzip abstracts-cache.json
# Creates abstracts-cache.json.gz

echo "Compressed abstracts-cache.json to $(du -h abstracts-cache.json.gz | cut -f1)"

Case 2: Helper Scripts

If user created helper scripts during research:

๐Ÿ“ Found helper scripts:
   - screen_papers.py (created for batch screening)
   - deep_dive_papers.py (created for data extraction)

These scripts document your methodology. Recommendations:
- Keep for reproducibility
- Add comments if not already documented
- Reference in SUMMARY.md under "Reproducibility" section

Keep scripts? (y/n)

Case 3: Multiple Research Sessions

If cleaning up multiple sessions:

# Find all research sessions
find research-sessions/ -mindepth 1 -maxdepth 1 -type d

# For each session:
for session in research-sessions/*/; do
    echo "Analyzing: $session"
    # Run cleanup analysis
done

Ask user:

Found 5 completed research sessions.

Clean up all sessions? (y/n/select)
- y: Analyze and clean all sessions
- n: Cancel
- select: Choose which sessions to clean

Safety Mechanisms

Protected File List

Maintain hardcoded list of patterns to NEVER delete:

PROTECTED_PATTERNS = [
    'SUMMARY.md',
    'relevant-papers.json',
    'papers-reviewed.json',
    'papers/*.pdf',
    'papers/*.zip',
    'citations/citation-graph.json',
    'screening-criteria.json',
    'test-set.json',
    'abstracts-cache.json',
    'rubric-changelog.md',
    'README.md',
    'TOP_PRIORITY_PAPERS.md',
    'evaluated-papers.json',
    '*.py',  # Helper scripts
    '.claude/*',  # Project settings
]

Before deleting any file:

from fnmatch import fnmatch

def is_protected(filepath):
    """Check if file matches any protected pattern"""
    for pattern in PROTECTED_PATTERNS:
        if fnmatch(filepath, pattern):
            return True
    return False

# Never delete protected files
def safe_delete(file_to_delete):  # illustrative wrapper around the deletion step
    if is_protected(file_to_delete):
        print(f"⚠️  ERROR: {file_to_delete} is protected and cannot be deleted")
        return
    # ...proceed with deletion only after user confirmation

Dry Run Mode

Always show what will be deleted before doing it:

# Dry run (show only, don't delete)
echo "DRY RUN - No files will be deleted"

# $candidate_files holds the intermediate files identified in Step 1;
# is_safe_to_delete applies the protected-pattern check
for file in $candidate_files; do
    if is_safe_to_delete "$file"; then
        echo "Would delete: $file ($(du -h "$file" | cut -f1))"
    fi
done

echo ""
echo "Proceed with actual deletion? (y/n)"

Integration with Other Skills

After answering-research-questions workflow:

  1. Complete Phase 8 (consolidation)
  2. User reviews SUMMARY.md and relevant-papers.json
  3. Optionally: Run cleaning-up-research-sessions
  4. Archive or share research folder

Add to answering-research-questions Phase 8:

### Optional: Cleanup

After reviewing outputs, optionally clean up intermediate files:

"Research session is complete. Would you like me to clean up intermediate files?
I'll show you what will be deleted before removing anything."

If yes: Use `cleaning-up-research-sessions` skill

Common Mistakes

  • Deleting papers-reviewed.json: this is the deduplication database - NEVER delete → always protect with the hardcoded list
  • Deleting abstracts-cache.json: needed for re-screening → ask the user, default to keep
  • Deleting helper scripts: important for reproducibility → keep by default, ask if the user wants to remove them
  • Not showing the user what will be deleted: the user needs to see the plan → always show a dry run first
  • Deleting without confirmation: too risky → always ask for final confirmation
  • Not verifying after cleanup: files could be corrupted → run the integrity check after deletion

Quick Reference

  • List all files with sizes: find . -type f -exec ls -lh {} \; | sort -rh
  • Check if a JSON file is valid: jq empty file.json
  • Move to trash (macOS): mv file ~/.Trash/
  • Move to trash (Linux): mv file ~/.local/share/Trash/files/
  • Compress a large file: gzip file
  • Calculate folder size: du -sh .

Success Criteria

Cleanup successful when:

  • All protected files still exist and are valid
  • Only confirmed intermediate files deleted
  • User knows what was deleted and why
  • Space recovered is reported
  • Integrity check passed
  • Research session is still fully functional

Example Session

User: "Can you clean up the research session folder?"

Agent: "I'll analyze the research session and show you what can be safely cleaned up."

[Analyzes folder]

Agent: "๐Ÿงน Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/

Files to KEEP (protected):
  โœ… SUMMARY.md (45 KB) - Final findings
  โœ… relevant-papers.json (12 KB) - Filtered results
  โœ… papers-reviewed.json (28 KB) - Complete history
  โœ… papers/ (14 PDFs, 32 MB) - Downloaded papers
  โœ… citations/citation-graph.json (5 KB) - Citation network
  โœ… screening-criteria.json (2 KB) - Rubric definition
  โœ… abstracts-cache.json (156 KB) - Cached abstracts

Files that CAN be removed:
  ๐Ÿ—‘๏ธ  initial-search-results.json (8 KB) - Raw PubMed results
     โ†’ Data is preserved in papers-reviewed.json
  ๐Ÿ—‘๏ธ  .DS_Store (6 KB) - macOS metadata
     โ†’ Not needed

Total space to recover: 14 KB

Proceed with cleanup? (y/n/review)"

User: "y"

Agent: "Final confirmation - About to delete:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)

All protected files will be preserved. Confirm? (y/n)"

User: "y"

Agent: [Deletes files]

"โœ… Cleanup complete!

Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)

Space recovered: 14 KB

Running integrity check...
✅ All core files intact
✅ All JSON files valid
✅ All 14 PDFs present

Research session is ready for archiving or sharing."

Next Steps

After cleanup:

  • Research folder is clean and ready to archive
  • Share folder with collaborators
  • Move to long-term storage
  • Continue with follow-up research if needed
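
If archiving is the next step, a compressed tarball of the session folder is one option (a sketch; the session name is the example used throughout):

cd research-sessions/
tar -czf 2025-10-11-btk-selectivity.tar.gz 2025-10-11-btk-selectivity/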