name	ont-pipeline
description	Oxford Nanopore multi-step workflow orchestration with unified QC aggregation.

ONT Pipeline - Workflow Orchestration

Multi-step analysis pipeline orchestrator with unified QC aggregation and pharmacogenomics support.

Core Concept

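A pipeline is a YAML file that defines a reproducible analysis workflow: an ordered set of steps with dependencies, parameters, and pass criteria, plus the metrics to aggregate into the final report. For example: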

# ~/.ont-registry/pipelines/pharmaco-clinical.yaml
name: pharmaco-clinical
description: Clinical pharmacogenomics workflow
version: "1.0"

steps:
  - name: end_reasons
    analysis: end_reasons
    required: true
    pass_criteria:
      signal_positive_pct: ">=75"
    outputs: [json, plot]
    
  - name: basecalling
    analysis: basecalling
    depends_on: [end_reasons]
    parameters:
      model: sup
      modifications: 5mCG_5hmCG
    outputs: [bam, json]
    
  - name: alignment
    analysis: alignment
    depends_on: [basecalling]
    parameters:
      reference: GRCh38
      preset: map-ont
    outputs: [bam, stats]
    
  - name: variants
    analysis: variant_calling
    depends_on: [alignment]
    parameters:
      caller: clair3
      model: r1041_e82_400bps_sup_v500
    outputs: [vcf, json]
    
  - name: cyp2d6
    analysis: cyp2d6_calling
    depends_on: [variants]
    parameters:
      caller: cyrius
      reference: GRCh38
    outputs: [json, tsv]
    
  - name: pharmcat
    analysis: pharmcat
    depends_on: [variants, cyp2d6]
    parameters:
      reporter: true
      sources: [CPIC, DPWG, FDA]
    outputs: [json, html]

aggregation:
  metrics:
    - source: end_reasons
      fields: [quality_status, signal_positive_pct]
    - source: basecalling  
      fields: [mean_qscore, median_qscore, n50, total_reads]
    - source: alignment
      fields: [mapped_pct, mean_coverage, target_coverage]
    - source: variants
      fields: [total_variants, pass_variants, ti_tv_ratio]
    - source: cyp2d6
      fields: [diplotype, phenotype, activity_score]
    - source: pharmcat
      fields: [drug_count, actionable_count]
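The `aggregation.metrics` section above names, per step, which fields are pulled into the unified report. A minimal sketch of that collection, assuming each step exposes its metrics as a flat dictionary; the `collect_metrics` helper and the sample values are hypothetical and not taken from `ont_pipeline.py`:

```python
from typing import Any

# Mirrors part of the `aggregation.metrics` section of the pipeline YAML.
AGGREGATION = [
    {"source": "end_reasons", "fields": ["quality_status", "signal_positive_pct"]},
    {"source": "alignment", "fields": ["mapped_pct", "mean_coverage"]},
]


def collect_metrics(
    aggregation: list[dict[str, Any]],
    step_results: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Flatten the requested fields into 'step.field' keys.

    Missing steps or fields are recorded as None so gaps stay visible.
    """
    report: dict[str, Any] = {}
    for entry in aggregation:
        result = step_results.get(entry["source"], {})
        for field in entry["fields"]:
            report[f"{entry['source']}.{field}"] = result.get(field)
    return report


# Illustrative values only; real metrics would come from each step's output.
step_results = {
    "end_reasons": {"quality_status": "PASS", "signal_positive_pct": 82.5, "extra": 1},
    "alignment": {"mapped_pct": 97.1},
}
summary = collect_metrics(AGGREGATION, step_results)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Fields a step produced but the pipeline did not request (`extra` here) are left out, and requested fields that are missing show up as `None` rather than being silently dropped.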

Quick Start

# Initialize registry with pipeline support
ont_experiments.py init --git

# List available pipelines
ont_pipeline.py list

# Run a pipeline on an experiment
ont_pipeline.py run pharmaco-clinical exp-abc123

# Run with custom parameters
ont_pipeline.py run pharmaco-clinical exp-abc123 \
  --param basecalling.model=hac \
  --param alignment.reference=/path/to/custom.fa

# Resume failed pipeline
ont_pipeline.py resume exp-abc123

# Generate unified report
ont_pipeline.py report exp-abc123 --format html --output report.html
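The `--param` overrides above use a `STEP.KEY=VALUE` form. One way such overrides could be applied to a loaded pipeline definition is sketched below; `apply_overrides` is a hypothetical helper, and the actual parsing in `ont_pipeline.py` may differ:

```python
import copy
from typing import Any


def apply_overrides(steps: list[dict[str, Any]], params: list[str]) -> list[dict[str, Any]]:
    """Return a copy of `steps` with 'STEP.KEY=VALUE' overrides applied."""
    updated = copy.deepcopy(steps)  # leave the loaded definition untouched
    by_name = {step["name"]: step for step in updated}
    for item in params:
        # Split on the first '=' so values may contain '=' or '.' themselves.
        target, sep, value = item.partition("=")
        step_name, dot, key = target.partition(".")
        if not (sep and dot and step_name and key):
            raise ValueError(f"expected STEP.KEY=VALUE, got {item!r}")
        if step_name not in by_name:
            raise KeyError(f"unknown step: {step_name}")
        by_name[step_name].setdefault("parameters", {})[key] = value
    return updated


steps = [
    {"name": "basecalling", "parameters": {"model": "sup"}},
    {"name": "alignment", "parameters": {"reference": "GRCh38", "preset": "map-ont"}},
]
overridden = apply_overrides(
    steps,
    ["basecalling.model=hac", "alignment.reference=/path/to/custom.fa"],
)
print(overridden[0]["parameters"]["model"])
print(overridden[1]["parameters"]["reference"])
```

Working on a deep copy keeps the on-disk defaults and the per-run values separate, which makes it easier to log exactly what was overridden.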

Commands

Command	Description
`list`	List available pipelines
`show <pipeline>`	Show pipeline definition
`validate <pipeline>`	Validate pipeline YAML
`run <pipeline> <exp>`	Execute pipeline on experiment
`resume <exp>`	Resume after the last successful step (retries the failed step)
`status <exp>`	Show pipeline execution status
`report <exp>`	Generate unified QC report
`batch <pipeline> <exp...>`	Run on multiple experiments
`create <name>`	Create new pipeline template

Built-in Pipelines

pharmaco-clinical

Full pharmacogenomics workflow with PharmCAT reporting:

end_reasons → basecalling(sup) → alignment → variants → cyp2d6 → pharmcat

qc-fast

Quick QC assessment:

end_reasons → basecalling(fast) → basic_stats

research-full

Complete research workflow with methylation:

end_reasons → basecalling(sup+5mC) → alignment → variants → sv_calling → methylation

validation

Validation against known truth set:

end_reasons → basecalling → alignment → variants → truth_comparison

Pipeline Execution

Dependency Resolution

Steps run in dependency order, and steps that do not depend on each other can run in parallel. Schematically:

end_reasons ─────────────────────────────┐
                                         ├─→ report
basecalling → alignment → variants → cyp2d6
                      └→ coverage_stats ─┘
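The ordering in the diagram can be reproduced with a topological sort. The sketch below uses Python's standard `graphlib` module (3.9+) on the graph drawn above; whether `ont_pipeline.py` resolves dependencies this way is an assumption, and `execution_waves` is a hypothetical helper name:

```python
from graphlib import CycleError, TopologicalSorter

# step name -> steps it depends on, transcribed from the diagram above
DEPENDS_ON = {
    "end_reasons": [],
    "basecalling": [],
    "alignment": ["basecalling"],
    "variants": ["alignment"],
    "coverage_stats": ["alignment"],
    "cyp2d6": ["variants"],
    "report": ["end_reasons", "cyp2d6", "coverage_stats"],
}


def execution_waves(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group steps into waves; steps in the same wave can run in parallel."""
    unknown = {dep for deps in depends_on.values() for dep in deps} - set(depends_on)
    if unknown:
        raise ValueError(f"depends_on references undefined steps: {sorted(unknown)}")
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(DEPENDS_ON)
for number, wave in enumerate(waves, start=1):
    print(number, ", ".join(wave))
```

The same two checks (undefined step names and cycles) are the kind of thing a `validate` command would need to catch before any step runs.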

State Tracking

Pipeline state is stored as events in the experiment registry:

events:
  - timestamp: "2025-01-15T10:00:00Z"
    type: pipeline_start
    pipeline: pharmaco-clinical
    version: "1.0"
    
  - timestamp: "2025-01-15T10:05:00Z"
    type: analysis
    analysis: end_reasons
    pipeline_step: 1
    exit_code: 0
    
  - timestamp: "2025-01-15T12:00:00Z"
    type: pipeline_complete
    pipeline: pharmaco-clinical
    duration_seconds: 7200
    steps_completed: 6
    steps_failed: 0

Failure Handling

# Pipeline fails at step 3
ont_pipeline.py run pharmaco-clinical exp-abc123
# Error: Step 'alignment' failed (exit code 1)
# Run 'ont_pipeline.py resume exp-abc123' to retry

# Resume from failed step
ont_pipeline.py resume exp-abc123
# Skipping completed: end_reasons, basecalling
# Retrying: alignment...
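The skip/retry decision shown above can be derived from the logged events. The following is a sketch under two assumptions: `pipeline_step` is the 1-based position of the step in the pipeline, and steps are treated as a simple linear sequence. `resume_plan` is a hypothetical helper, not the tool's actual implementation:

```python
from typing import Any


def resume_plan(
    step_order: list[str],
    events: list[dict[str, Any]],
) -> tuple[list[str], list[str]]:
    """Split steps into (skip, run) based on logged analysis events.

    A step counts as completed if its most recent event has exit_code 0.
    """
    last_exit: dict[int, int] = {}
    for event in events:  # assumed to be in chronological order
        if event.get("type") == "analysis":
            last_exit[event["pipeline_step"]] = event["exit_code"]

    skip: list[str] = []
    run: list[str] = []
    for position, step in enumerate(step_order, start=1):
        # Once one step has to run, every later step runs as well.
        if not run and last_exit.get(position) == 0:
            skip.append(step)
        else:
            run.append(step)
    return skip, run


STEP_ORDER = ["end_reasons", "basecalling", "alignment", "variants", "cyp2d6", "pharmcat"]
events = [
    {"type": "pipeline_start", "pipeline": "pharmaco-clinical"},
    {"type": "analysis", "analysis": "end_reasons", "pipeline_step": 1, "exit_code": 0},
    {"type": "analysis", "analysis": "basecalling", "pipeline_step": 2, "exit_code": 0},
    {"type": "analysis", "analysis": "alignment", "pipeline_step": 3, "exit_code": 1},
]
skip, run = resume_plan(STEP_ORDER, events)
print("Skipping completed:", ", ".join(skip))
print("Retrying:", run[0])
```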

Unified QC Report

The report aggregates metrics from all pipeline steps into a single document:

ont_pipeline.py report exp-abc123 --format html --output qc_report.html

Report Sections

Summary Dashboard
- Overall status (PASS/WARN/FAIL)
- Key metrics at a glance
- Pipeline execution timeline

Sequencing QC
- End reason distribution
- Quality scores
- Read length distribution

Basecalling
- Model and parameters
- Pass/fail rates
- Q-score distribution

Alignment
- Mapping statistics
- Coverage distribution
- Target region performance

Variant Calling
- Variant counts by type
- Quality metrics
- Ti/Tv ratio

Pharmacogenomics (if applicable)
- CYP2D6 diplotype and phenotype
- Drug-gene interactions
- Clinical recommendations

Report Formats

Format	Description
`html`	Interactive HTML dashboard
`pdf`	Print-ready PDF report
`json`	Machine-readable metrics
`markdown`	Documentation-friendly

Batch Processing

# Run pipeline on all experiments with a tag
ont_pipeline.py batch pharmaco-clinical \
  --tag cyp2d6 \
  --parallel 4 \
  --output-dir /results/batch_2025Q4

# Run on specific experiments
ont_pipeline.py batch pharmaco-clinical \
  exp-abc123 exp-def456 exp-ghi789 \
  --parallel 2

# Generate batch summary
ont_pipeline.py batch-report /results/batch_2025Q4
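The `--parallel N` option above caps how many experiments run at once. A bounded worker pool is one common way to get that behaviour; the sketch below uses `concurrent.futures` with a stand-in `fake_run` function, and both helper names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def run_batch(
    experiments: list[str],
    run_one: Callable[[str], int],
    parallel: int = 2,
) -> dict[str, int]:
    """Run `run_one` for each experiment, at most `parallel` at a time.

    Returns experiment id -> exit code. An exception is recorded as exit
    code 1 so one failing experiment does not abort the rest of the batch.
    """
    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = {pool.submit(run_one, exp): exp for exp in experiments}
        for future in as_completed(futures):
            exp = futures[future]
            try:
                results[exp] = future.result()
            except Exception:
                results[exp] = 1
    return results


def fake_run(exp_id: str) -> int:
    """Stand-in for launching the pipeline on one experiment."""
    if exp_id == "exp-def456":
        raise RuntimeError("simulated failure")
    return 0


results = run_batch(["exp-abc123", "exp-def456", "exp-ghi789"], fake_run, parallel=2)
for exp_id in sorted(results):
    print(exp_id, results[exp_id])
```

In a real runner, `run_one` would typically start a child process per experiment, in which case threads are sufficient because the heavy work happens outside the Python interpreter.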

Custom Pipelines

Create from Template

ont_pipeline.py create my-workflow
# Creates ~/.ont-registry/pipelines/my-workflow.yaml

YAML Structure

name: my-workflow
description: Custom analysis workflow
version: "1.0"
author: your-name

# Parameters with defaults (overridable at runtime)
parameters:
  reference: GRCh38
  model_tier: sup

steps:
  - name: step_name
    analysis: analysis_type  # Maps to ANALYSIS_SKILLS in ont-experiments
    depends_on: []           # List of step names
    required: true           # Fail pipeline if step fails
    parameters:              # Step-specific parameters
      key: value
    pass_criteria:           # Conditions to continue
      metric: ">=threshold"
    outputs: [json, bam]     # Output types to generate
    
aggregation:
  metrics:
    - source: step_name
      fields: [metric1, metric2]
      
  thresholds:
    quality_status: PASS
    mean_qscore: ">=15"
    mapped_pct: ">=90"
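Both `pass_criteria` and `thresholds` use either an exact value (`PASS`) or a comparison string (`">=90"`). A minimal sketch of how such rules might be evaluated follows; the supported operators and the `meets` and `failed_thresholds` helpers are assumptions made for illustration:

```python
import operator
import re
from typing import Any

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
# Two-character operators are listed first so ">=" is not read as ">".
_EXPR = re.compile(r"^\s*(>=|<=|==|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def meets(threshold: Any, value: Any) -> bool:
    """Check one metric value against a threshold.

    Strings such as ">=90" are numeric comparisons; anything else
    (for example "PASS") must match exactly. A missing value fails.
    """
    if value is None:
        return False
    match = _EXPR.match(threshold) if isinstance(threshold, str) else None
    if match:
        op, number = match.groups()
        return _OPS[op](float(value), float(number))
    return value == threshold


def failed_thresholds(thresholds: dict[str, Any], metrics: dict[str, Any]) -> list[str]:
    """Return the names of metrics that miss their threshold."""
    return [name for name, rule in thresholds.items() if not meets(rule, metrics.get(name))]


thresholds = {"quality_status": "PASS", "mean_qscore": ">=15", "mapped_pct": ">=90"}
metrics = {"quality_status": "PASS", "mean_qscore": 18.2, "mapped_pct": 87.4}  # illustrative
failures = failed_thresholds(thresholds, metrics)
print(failures)
```

Treating a missing metric as a failure is a deliberate choice in this sketch: a step that produced no value should not pass a quality gate by default.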

HPC Integration

Batch runs can be handed to an HPC scheduler by generating a SLURM array job:

# Generate SLURM array job for batch
ont_pipeline.py batch pharmaco-clinical \
  --tag batch1 \
  --slurm batch_job.sbatch

# Submit
sbatch batch_job.sbatch

The generated SLURM script requests resources per step:

- Basecalling: GPU partition (sigbio-a40)
- Alignment: High-memory nodes
- Variant calling: Multi-core CPU
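To make the array-job idea concrete, the sketch below renders a small sbatch script with one array task per experiment. It is an illustration only: the real output of `--slurm` is not shown in this document, per-step resource requests are omitted, and `render_array_job` and the default job name are hypothetical:

```python
import shlex


def render_array_job(pipeline: str, experiments: list[str], job_name: str = "ont-batch") -> str:
    """Render an sbatch script that runs one experiment per array task."""
    if not experiments:
        raise ValueError("need at least one experiment")
    quoted = " ".join(shlex.quote(exp) for exp in experiments)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --array=0-{len(experiments) - 1}",  # one task per experiment
        "",
        f"EXPERIMENTS=({quoted})",
        # SLURM sets SLURM_ARRAY_TASK_ID for each task in the array.
        'EXP="${EXPERIMENTS[$SLURM_ARRAY_TASK_ID]}"',
        f'ont_pipeline.py run {shlex.quote(pipeline)} "$EXP"',
    ]
    return "\n".join(lines) + "\n"


script = render_array_job("pharmaco-clinical", ["exp-abc123", "exp-def456", "exp-ghi789"])
print(script)
```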

Integration with ont-experiments

Pipeline events are logged to the experiment registry, so they can be inspected with `ont_experiments.py`:

# View pipeline history
ont_experiments.py history exp-abc123

# Filter by pipeline
ont_experiments.py history exp-abc123 --filter pipeline=pharmaco-clinical

# Export pipeline commands
ont_experiments.py export exp-abc123 --pipeline

CLI Reference

ont_pipeline.py <command> [options]

Commands:
  list                    List available pipelines
  show <pipeline>         Show pipeline definition
  validate <pipeline>     Validate pipeline YAML
  run <pipeline> <exp>    Execute pipeline
  resume <exp>            Resume from last checkpoint
  status <exp>            Show execution status
  report <exp>            Generate unified report
  batch <pipeline> ...    Batch execution
  batch-report <dir>      Generate batch summary
  create <name>           Create pipeline template

Run options:
  --param KEY=VALUE       Override parameter
  --skip-step STEP        Skip specific step
  --from-step STEP        Start from step
  --dry-run               Show execution plan

Report options:
  --format FORMAT         Output format (html, pdf, json, markdown)
  --output FILE           Output file path
  --include-plots         Embed visualization plots

Batch options:
  --parallel N            Concurrent experiments
  --tag TAG               Filter by experiment tag
  --output-dir DIR        Results directory
  --slurm FILE            Generate SLURM array job

Dependencies

pyyaml>=6.0          # Pipeline definitions
jinja2>=3.0          # Report templating
pandas>=1.5          # Metrics aggregation
plotly>=5.0          # Interactive plots (optional)
weasyprint>=60       # PDF generation (optional)
