Audit, validate, and fix ONT experiment registry entries with comprehensive metadata extraction.

Install Skill

1. Download the skill.
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section.
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file.

Note: Review the skill's instructions before using it.

SKILL.md

name: registry-scrutinize
description: Audit, validate, and fix ONT experiment registry entries with comprehensive metadata extraction.

Registry Scrutinize Skill v1.0

Deep validation and enrichment system for ONT experiment registry entries.

Purpose

This skill scrutinizes registry entries in five ways:

  • Validate entries against strict metadata standards
  • Enrich entries by extracting metadata from names, paths, and source data
  • Re-analyze experiments from source data (local files or S3)
  • Audit all changes with detailed logging
  • Report on registry health and completeness

Quick Start

# Full audit of registry with report
/registry-scrutinize audit

# Scrutinize and fix a single experiment
/registry-scrutinize fix exp-abc123

# Enrich all experiments with extracted metadata
/registry-scrutinize enrich --all

# Re-analyze experiments from source data
/registry-scrutinize reanalyze exp-abc123

# Generate comprehensive health report
/registry-scrutinize report --output health_report.html

# Batch scrutinize incomplete entries
/registry-scrutinize batch --incomplete --fix

Commands

audit - Registry Health Audit

Comprehensive audit showing all issues and suggestions:

registry_scrutinize.py audit [--json output.json] [--verbose]

# Output includes:
# - Validation summary (pass/fail/warning counts)
# - Missing fields by category
# - Metadata extraction opportunities
# - Duplicate detection
# - Stale entry detection

fix - Fix Single Experiment

Scrutinize and fix a single experiment:

registry_scrutinize.py fix <exp_id> [--dry-run] [--force-reanalyze]

# Actions:
# - Validate against schema
# - Extract metadata from name/path
# - Migrate legacy fields to new structure
# - Generate URLs for public data
# - Update provenance timestamps

enrich - Metadata Enrichment

Extract and populate metadata across all entries:

registry_scrutinize.py enrich [--all | --incomplete] [--dry-run]

# Extracts:
# - Sample IDs (HG001-HG007, COLO829, NA12878, etc.)
# - Device types from serial numbers
# - Chemistry from flowcell/path patterns
# - Basecall models (sup/hac/fast)
# - Modifications (5mCG, 5hmCG, 6mA)
# - Kit information
# - Run dates

reanalyze - Re-analyze from Source

Re-analyze an experiment from its source data:

registry_scrutinize.py reanalyze <exp_id> [--max-reads 50000] [--no-stream]

# For public data:
# - Stream BAM header for metadata
# - Sample reads for QC metrics
# - Compute mean Q-score in probability space (average error probabilities, then convert back to Phred)
# - Calculate N50, length distribution

# For local data:
# - Parse POD5/BAM files
# - Extract run metadata
# - Compute full QC statistics
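The two QC metrics named above can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation: the function names `mean_qscore` and `n50` are hypothetical, and the inputs are assumed to be plain lists of Phred Q-scores and read lengths. Averaging in probability space means converting each Q-score to an error probability, averaging those, and converting the result back to Phred scale, so that low-quality reads are not under-weighted.

```python
import math

def mean_qscore(qscores):
    """Mean Phred Q-score, averaged in probability space (hypothetical helper)."""
    if not qscores:
        raise ValueError("no Q-scores given")
    # Phred Q -> error probability: p = 10^(-Q/10)
    probs = [10 ** (-q / 10) for q in qscores]
    mean_p = sum(probs) / len(probs)
    # Mean error probability -> Phred Q
    return -10 * math.log10(mean_p)

def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length

print(round(mean_qscore([10, 20]), 2))  # lower than the arithmetic mean of 15
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Averaging the raw Phred values of Q10 and Q20 would give 15, while the probability-space mean is about 12.6, which is why the distinction matters for QC reporting.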

batch - Batch Operations

Process multiple experiments:

registry_scrutinize.py batch [options]

Options:
  --incomplete    Process only entries with <80% completeness
  --unanalyzed    Process only entries without analyses
  --public        Process only public data
  --local         Process only local data
  --fix           Apply fixes (default: dry-run)
  --reanalyze     Re-analyze from source
  --limit N       Process at most N entries
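
As a rough sketch of how these selection options might combine, the filter below applies each flag as an additional restriction and then truncates to the limit. Everything here is an assumption made for illustration: the function name `select_entries`, the `completeness` and `analyses` keys, and the reading of "public" as `source == "ont-open-data"` are not confirmed by the skill itself.

```python
def select_entries(entries, incomplete=False, unanalyzed=False,
                   public=False, local=False, limit=None):
    """Filter registry entries the way the batch flags describe (hypothetical)."""
    selected = []
    for entry in entries:
        # --incomplete: keep only entries below 80% completeness
        if incomplete and entry.get("completeness", 0) >= 80:
            continue
        # --unanalyzed: keep only entries with no analyses recorded
        if unanalyzed and entry.get("analyses"):
            continue
        # --public / --local: restrict by data source (assumed values)
        if public and entry.get("source") != "ont-open-data":
            continue
        if local and entry.get("source") != "local":
            continue
        selected.append(entry)
    # --limit N: process at most N entries
    return selected if limit is None else selected[:limit]

registry = [
    {"id": "exp-a", "source": "local", "completeness": 90, "analyses": ["qc"]},
    {"id": "exp-b", "source": "ont-open-data", "completeness": 40, "analyses": []},
    {"id": "exp-c", "source": "ont-open-data", "completeness": 60},
]
print([e["id"] for e in select_entries(registry, public=True, unanalyzed=True, limit=1)])
```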

report - Health Report

Generate a comprehensive HTML report:

registry_scrutinize.py report [--output report.html]

# Report includes:
# - Registry overview statistics
# - Completeness distribution
# - Field coverage analysis
# - Recent audit history
# - Recommendations

Validation Standards

Required Fields (Must Pass)

Field    Description
id       Unique experiment ID (exp-XXXXXXXX)
name     Human-readable name
source   Data source (local, ont-open-data)
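
A required-field check along these lines could look like the sketch below. It assumes an entry is a plain dictionary and only tests what the table states: that the three fields are present and non-empty, and that the ID carries the `exp-` prefix. The function name `validate_required` and the error message wording are hypothetical.

```python
REQUIRED_FIELDS = ("id", "name", "source")

def validate_required(entry):
    """Return a list of problems; an empty list means the entry passes."""
    errors = []
    for field in REQUIRED_FIELDS:
        value = entry.get(field)
        if value is None or str(value).strip() == "":
            errors.append(f"missing required field: {field}")
    # Only the 'exp-' prefix is checked; the exact ID length is not assumed.
    exp_id = entry.get("id")
    if exp_id and not str(exp_id).startswith("exp-"):
        errors.append(f"id does not start with 'exp-': {exp_id}")
    return errors

print(validate_required({"id": "exp-abc123", "name": "HG002 run", "source": "local"}))
print(validate_required({"id": "abc", "name": ""}))
```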

Important Fields (Should Have)

Field                     Weight  Description
metadata.sample           15      Sample identifier
metadata.device_type      10      Device model
metadata.chemistry        10      Flowcell chemistry
metadata.basecall_model   10      Model accuracy tier
metadata.flowcell_id      5       Flowcell ID

Metric Fields (For Analysis)

Field                         Weight  Description
read_counts.sampled           10      Reads analyzed
quality_metrics.mean_qscore   10      Mean Q-score
length_metrics.n50            10      N50 value

Completeness Thresholds

  • Good (80%+): All critical metadata present
  • Warning (50-79%): Some metadata missing
  • Poor (<50%): Significant gaps, needs attention
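
One plausible way to turn the field weights into a completeness percentage is sketched below. The formula is an assumption: it divides the weight of the populated fields by the total listed weight (80) and maps the result onto the three thresholds. The helper names `lookup`, `completeness`, and `rating` are hypothetical, and the skill's real scoring may differ.

```python
# Weights copied from the Important and Metric field tables.
WEIGHTS = {
    "metadata.sample": 15,
    "metadata.device_type": 10,
    "metadata.chemistry": 10,
    "metadata.basecall_model": 10,
    "metadata.flowcell_id": 5,
    "read_counts.sampled": 10,
    "quality_metrics.mean_qscore": 10,
    "length_metrics.n50": 10,
}

def lookup(entry, dotted_path):
    """Follow a dotted path such as 'metadata.sample' through nested dicts."""
    node = entry
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def completeness(entry):
    """Percentage of total weight covered by populated fields (assumed formula)."""
    earned = sum(w for path, w in WEIGHTS.items()
                 if lookup(entry, path) not in (None, ""))
    return 100 * earned / sum(WEIGHTS.values())

def rating(score):
    if score >= 80:
        return "good"
    if score >= 50:
        return "warning"
    return "poor"

entry = {
    "metadata": {"sample": "HG002", "device_type": "PromethION", "chemistry": "R10.4.1"},
    "length_metrics": {"n50": 25000},
}
score = completeness(entry)
print(score, rating(score))
```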

Metadata Extraction Patterns

Sample Detection

  • GIAB: HG001-HG007, NA12878, NA24385, etc.
  • Cancer: COLO829, HCC1395, HCC1937
  • Reference: CHM13
  • Cell lines: Jurkat, HEK293T, HeLa
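
Sample detection of this kind can be sketched as a case-insensitive search for known identifiers in an experiment name or path. The sample list below contains only the names given above (the "etc." is not expanded), and the function name `detect_sample` is hypothetical. The lookarounds keep a match from firing inside a longer token, since underscores commonly separate fields in run names.

```python
import re

KNOWN_SAMPLES = {
    "GIAB": ["HG001", "HG002", "HG003", "HG004", "HG005", "HG006", "HG007",
             "NA12878", "NA24385"],
    "Cancer": ["COLO829", "HCC1395", "HCC1937"],
    "Reference": ["CHM13"],
    "Cell line": ["Jurkat", "HEK293T", "HeLa"],
}

def detect_sample(text):
    """Return (sample, category) for the first known sample found, else None."""
    for category, names in KNOWN_SAMPLES.items():
        for name in names:
            # Reject matches embedded in a longer alphanumeric token.
            pattern = rf"(?<![A-Za-z0-9]){re.escape(name)}(?![A-Za-z0-9])"
            if re.search(pattern, text, flags=re.IGNORECASE):
                return name, category
    return None

print(detect_sample("giab_hg002_sup_run1"))
print(detect_sample("/data/COLO829/run42"))
```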

Device Detection

  • PromethION: PA*, PCA*, device IDs starting with MD-
  • MinION: MN*
  • GridION: GXB*
  • Flongle: FLO*
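
The prefix rules above translate directly into a lookup table. The sketch below is illustrative only: `detect_device` is a hypothetical name, the prefixes are taken verbatim from the list, and matching is made case-insensitive as an assumption.

```python
# (device, prefixes) pairs taken from the list above.
DEVICE_PREFIXES = [
    ("PromethION", ("PA", "PCA", "MD-")),
    ("MinION", ("MN",)),
    ("GridION", ("GXB",)),
    ("Flongle", ("FLO",)),
]

def detect_device(identifier):
    """Map a serial number or device ID to a device type, or None if unknown."""
    ident = identifier.strip().upper()
    for device, prefixes in DEVICE_PREFIXES:
        # str.startswith accepts a tuple and matches any of its members.
        if ident.startswith(prefixes):
            return device
    return None

print(detect_device("PAO12345"))
print(detect_device("GXB01234"))
```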

Chemistry Detection

  • R10.4.1: Latest chemistry
  • R10.4: Intermediate
  • R9.4.1: Legacy

Model Detection

  • sup: Super high accuracy
  • hac: High accuracy
  • fast: Fast basecalling
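
Chemistry and model tier can often be read from a basecall model string or a path. The regular expressions below are a sketch under that assumption, and the names `detect_chemistry` and `detect_model` are hypothetical. The chemistry pattern tries `10.4.1` before `10.4` so the longer version wins, and the model pattern refuses matches inside longer tokens such as `fastq` or `fast5`.

```python
import re

CHEMISTRY_RE = re.compile(
    r"(?<![A-Za-z0-9])R(10\.4\.1|10\.4|9\.4\.1)(?!\d)", re.IGNORECASE)
MODEL_RE = re.compile(
    r"(?<![A-Za-z0-9])(sup|hac|fast)(?![A-Za-z0-9])", re.IGNORECASE)

def detect_chemistry(text):
    """Return 'R10.4.1', 'R10.4', or 'R9.4.1' if present in text, else None."""
    match = CHEMISTRY_RE.search(text)
    return f"R{match.group(1)}" if match else None

def detect_model(text):
    """Return 'sup', 'hac', or 'fast' if present as its own token, else None."""
    match = MODEL_RE.search(text)
    return match.group(1).lower() if match else None

name = "dna_r10.4.1_e8.2_400bps_sup@v4.2.0"
print(detect_chemistry(name), detect_model(name))
```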

Audit Logging

All changes are logged to ~/.ont-registry/audit_log.yaml:

entries:
  - timestamp: "2025-12-29T10:00:00"
    action: "scrutinize_fix"
    experiment_id: "exp-abc123"
    changes:
      extracted_sample: "HG002"
      extracted_chemistry: "R10.4.1"
      migrated_quality_metrics: true
    user: "claude-code"
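
To illustrate the entry layout, the sketch below renders one log entry in the same shape as the example above using only the standard library. It is not the skill's logging code: `render_audit_entry` is a hypothetical helper, change keys are assumed to be simple identifiers, and values are serialized with `json.dumps`, which yields double-quoted strings and lowercase booleans that are also valid YAML scalars.

```python
import json
from datetime import datetime

def render_audit_entry(action, experiment_id, changes,
                       user="claude-code", timestamp=None):
    """Render one audit entry as YAML text matching the layout shown above."""
    ts = timestamp or datetime.now().isoformat(timespec="seconds")
    lines = [
        f"  - timestamp: {json.dumps(ts)}",
        f"    action: {json.dumps(action)}",
        f"    experiment_id: {json.dumps(experiment_id)}",
        "    changes:",
    ]
    for key, value in changes.items():
        lines.append(f"      {key}: {json.dumps(value)}")
    lines.append(f"    user: {json.dumps(user)}")
    return "\n".join(lines)

print(render_audit_entry(
    "scrutinize_fix",
    "exp-abc123",
    {"extracted_sample": "HG002", "migrated_quality_metrics": True},
    timestamp="2025-12-29T10:00:00",
))
```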

Integration with Registry

This skill modifies the registry at ~/.ont-registry/experiments.yaml:

# Check before making changes
/registry-scrutinize audit

# Preview fixes first (batch runs as a dry-run unless --fix is given)
/registry-scrutinize batch --incomplete

# Then apply for real
/registry-scrutinize batch --incomplete --fix

Examples

Fix all incomplete entries

/registry-scrutinize batch --incomplete --fix

Re-analyze public data with fresh QC

/registry-scrutinize batch --public --unanalyzed --reanalyze --limit 10

Generate report after changes

/registry-scrutinize report --output ~/registry_health.html