name	ont-metadata
description	Extract and manage metadata from Oxford Nanopore sequencing experiments including device info, chemistry, and run parameters.

ONT Metadata Parser

Name: ont-metadata
Author: Single-Molecule-Sequencing

Extract run-level metadata from Oxford Nanopore POD5 and Fast5 raw data files without requiring final_summary.txt files.

When to Use

Use this skill when you need to:

Extract metadata from experiments that only have raw data (no summary files)
Parse POD5 files for run information (flow_cell_id, sample_id, protocol, etc.)
Parse Fast5 files and detect file type (single-read, multi-read, bulk)
Discover experiment directories by scanning for raw data files
Get sequencing kit, basecall model, and protocol information from raw files

Quick Start

# Parse a single POD5 file
/ont-metadata /path/to/file.pod5

# Parse an experiment directory
/ont-metadata /path/to/experiment --verbose

# Find all experiment directories in a path
/ont-metadata /path/to/data --find-experiments

# Output metadata as JSON
/ont-metadata /path/to/experiment --json metadata.json
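
As a rough illustration of what `--find-experiments` implies, the sketch below walks a directory tree and reports every directory that directly contains `.pod5` or `.fast5` files. The function name `find_raw_data_dirs` is hypothetical, and the skill's actual discovery logic may group directories differently (for example, by treating the parent of a `pod5/` folder as the experiment).

```python
from pathlib import Path

# Assumption: raw data files are recognised by file extension alone.
RAW_SUFFIXES = {".pod5", ".fast5"}


def find_raw_data_dirs(root):
    """Return the sorted list of directories under root that directly hold raw data files."""
    root = Path(root)
    hits = set()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in RAW_SUFFIXES:
            hits.add(path.parent)
    return sorted(hits)
```

Calling `find_raw_data_dirs("/path/to/data")` returns `Path` objects, so each result can be passed straight to a per-directory metadata parser.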

Metadata Extracted

From POD5 Files

Field	Description
`flow_cell_id`	Flow cell identifier (e.g., FBD19495)
`sample_id`	Sample name
`acquisition_id`	Unique acquisition identifier
`protocol`	Full protocol string
`instrument`	Device hostname
`started`	Acquisition start time
`sequencing_kit`	Kit identifier (e.g., sqk-lsk114)
`experiment_name`	Experiment name
`protocol_group_id`	Protocol group
`context_tags`	Dict with basecall_model, experiment_type, etc.
`tracking_id`	Dict with device_id, run_id, guppy_version, etc.
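
The fields above can plausibly be read from the run information attached to each POD5 read. The following is a minimal sketch, not the skill's own code: the mapping from the field names in the table to `pod5` `RunInfo` attribute names (for example `protocol` to `protocol_name` and `instrument` to `system_name`) is an assumption, as is taking `protocol_group_id` from the `tracking_id` dict. The helper names are hypothetical.

```python
# Assumed mapping: skill field name -> attribute on pod5's RunInfo object.
FIELD_TO_RUN_INFO_ATTR = {
    "flow_cell_id": "flow_cell_id",
    "sample_id": "sample_id",
    "acquisition_id": "acquisition_id",
    "protocol": "protocol_name",
    "instrument": "system_name",
    "started": "acquisition_start_time",
    "sequencing_kit": "sequencing_kit",
    "experiment_name": "experiment_name",
    "context_tags": "context_tags",
    "tracking_id": "tracking_id",
}


def run_info_to_metadata(run_info):
    """Copy the assumed attributes from a RunInfo-like object into a plain dict."""
    meta = {}
    for field, attr in FIELD_TO_RUN_INFO_ATTR.items():
        value = getattr(run_info, attr, None)
        if hasattr(value, "isoformat"):  # datetimes become ISO 8601 strings
            value = value.isoformat()
        meta[field] = value
    # Assumption: the protocol group is stored inside the tracking_id dict.
    tracking = dict(meta.get("tracking_id") or {})
    meta["protocol_group_id"] = tracking.get("protocol_group_id")
    return meta


def read_pod5_metadata(path):
    """Return metadata from the first read of a POD5 file, or None if it has no reads."""
    import pod5  # imported lazily so the mapping above is usable without pod5 installed

    with pod5.Reader(path) as reader:
        for read in reader.reads():
            return run_info_to_metadata(read.run_info)
    return None
```

Only the first read is inspected here, which assumes that every read in a file shares one acquisition. Files that merge several runs would need all distinct run-info records collected.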

From Fast5 Files

Field	Description
`fast5_format`	File type: single-read, multi-read, or bulk
`read_count`	Number of reads in file
`flow_cell_id`	Flow cell identifier
`sample_id`	Sample name
`run_id`	Run identifier
`device_id`	Device serial number
`exp_start_time`	Experiment start time
`tracking_id`	Full tracking metadata dict
`context_tags`	Experiment context dict

Fast5 File Types

Type	Description
`single-read`	One read per file (legacy, deprecated)
`multi-read`	Multiple reads per file (typically 4000; the standard Fast5 layout)
`bulk`	Raw channel data stream (special use case)
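
One way to tell the three types apart is to look at the top-level HDF5 group names. The sketch below is a heuristic under stated assumptions, not the skill's detection code: it assumes multi-read files contain groups named `read_<id>`, single-read files contain `Raw` and `UniqueGlobalKey`, and bulk files contain `IntermediateData` or `StateData`. Both function names are hypothetical.

```python
def classify_fast5_layout(top_level_names):
    """Guess the Fast5 file type from the names of its top-level HDF5 groups."""
    names = set(top_level_names)
    if any(name.startswith("read_") for name in names):
        return "multi-read"
    # Assumption: per-channel state and intermediate data only appear in bulk files.
    if "IntermediateData" in names or "StateData" in names:
        return "bulk"
    if "Raw" in names and "UniqueGlobalKey" in names:
        return "single-read"
    return "unknown"


def detect_fast5_format(path):
    """Open a Fast5 file with h5py and classify it by its top-level groups."""
    import h5py  # imported lazily; only needed when a real file is inspected

    with h5py.File(path, "r") as handle:
        return classify_fast5_layout(handle.keys())
```

Keeping the classification separate from the file access makes the heuristic easy to check against group names taken from known files.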

Options

Option	Description
`--format FORMAT`	Force format: pod5, fast5, or auto (default: auto)
`--json FILE`	Output metadata to JSON file
`--find-experiments`	Find all experiment directories in path
`--verbose, -v`	Show full metadata including tracking_id and context_tags
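
For readers who want to mirror these options in their own script, the table maps naturally onto `argparse`. This is a sketch of an equivalent parser, assuming a single positional path argument; it is not the skill's actual argument handling, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented ont-metadata options."""
    parser = argparse.ArgumentParser(
        prog="ont-metadata",
        description="Extract run-level metadata from POD5 and Fast5 files.",
    )
    # Assumption: one positional path to a file or directory.
    parser.add_argument("path", help="POD5/Fast5 file or experiment directory")
    parser.add_argument(
        "--format",
        choices=["pod5", "fast5", "auto"],
        default="auto",
        help="force the input format (default: auto)",
    )
    parser.add_argument("--json", metavar="FILE", help="write metadata to a JSON file")
    parser.add_argument(
        "--find-experiments",
        action="store_true",
        help="find all experiment directories in path",
    )
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="show full metadata including tracking_id and context_tags",
    )
    return parser
```

For example, `build_parser().parse_args(["/path/to/experiment", "--json", "metadata.json"])` yields a namespace with `format` left at `"auto"` and `json` set to `"metadata.json"`.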

Example Output

Extracting metadata from: /data1/experiment/pod5/file.pod5
POD5 library: available
ont_fast5_api: available
h5py library: available

Extracted Metadata:
--------------------------------------------------
  flow_cell_id: FBD19495
  sample_id: sample_name
  acquisition_id: 905f220998358f97395fc01019bff9961aeafb0c
  protocol: sequencing/sequencing_MIN114_DNA_e8_2_400K:FLO-MIN114:SQK-LSK114:400
  instrument: rdlu0053
  started: 2025-07-14T04:54:43.011000+00:00
  sequencing_kit: sqk-lsk114
  experiment_name: WGS_LSK_human
  protocol_group_id: WGS_LSK_human
  pod5_count: 56
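
The `protocol` value in the output above is a colon-separated string. The sketch below splits it into parts so that the kit can be compared with `sequencing_kit`. The key names (`script`, `flow_cell_product`, `kit`, `extra`) are labels chosen for illustration and are assumptions about what each position means; they are not fields produced by the skill.

```python
def parse_protocol_string(protocol):
    """Split a colon-separated protocol string into loosely labelled parts."""
    parts = protocol.split(":")
    return {
        "script": parts[0],
        # Assumed positions: second field is the flow cell product, third is the kit.
        "flow_cell_product": parts[1] if len(parts) > 1 else None,
        "kit": parts[2] if len(parts) > 2 else None,
        "extra": parts[3:],  # any remaining fields, left uninterpreted
    }
```

With the example string, the `kit` part is `SQK-LSK114`, which matches `sequencing_kit` once lowercased.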

Integration with Discovery

This skill powers experiment discovery in the experiment-db skill:

# Discovery finds experiments with OR without summary files
python3 greatlakes_discovery.py scan-local --include-raw-only \
    --output manifest.json /path/to/data

# Each manifest entry records its metadata source:
# "final_summary", "pod5_raw", or "fast5_raw"
{
  "metadata_source": "pod5_raw",
  "summary_file": null,
  "flow_cell_id": "FBD19495",
  ...
}
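
The `metadata_source` value could be chosen along the following lines. This is a sketch under assumptions: it prefers a summary file when one exists, then POD5, then Fast5, and it matches summary files with the pattern `final_summary*.txt`. The function name `detect_metadata_source` is hypothetical and the real discovery script may use a different priority.

```python
from pathlib import Path


def detect_metadata_source(experiment_dir):
    """Decide where an experiment's metadata would come from, as a manifest fragment."""
    root = Path(experiment_dir)
    summaries = sorted(root.rglob("final_summary*.txt"))
    if summaries:
        return {"metadata_source": "final_summary", "summary_file": str(summaries[0])}
    # No summary file: fall back to raw data, trying POD5 before Fast5 (assumed order).
    if any(root.rglob("*.pod5")):
        return {"metadata_source": "pod5_raw", "summary_file": None}
    if any(root.rglob("*.fast5")):
        return {"metadata_source": "fast5_raw", "summary_file": None}
    return {"metadata_source": None, "summary_file": None}
```

The returned dict uses `None` where the JSON manifest shows `null`, so it can be merged into an entry and serialised with `json.dumps`.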

Dependencies

Required: Python >= 3.8
For POD5: pip install pod5
For Fast5: pip install ont-fast5-api (preferred) or pip install h5py (fallback)

Install all:

pip install pod5 ont-fast5-api

Library Priority

POD5 files: Uses pod5 library
Fast5 files: Uses ont_fast5_api (preferred) → falls back to h5py
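
The priority order above amounts to an availability check followed by a fallback. A minimal sketch, assuming availability is tested with `importlib.util.find_spec`; the function names are hypothetical and the skill may detect libraries differently.

```python
import importlib.util


def library_status(modules=("pod5", "ont_fast5_api", "h5py")):
    """Report which optional libraries can be imported, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def pick_fast5_backend(status):
    """Prefer ont_fast5_api, fall back to h5py, and return None if neither is present."""
    if status.get("ont_fast5_api"):
        return "ont_fast5_api"
    if status.get("h5py"):
        return "h5py"
    return None
```

Passing the status dict in explicitly, rather than checking imports inside `pick_fast5_backend`, keeps the fallback rule testable on machines where neither library is installed.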
