name	bio-seq
description	Read/write FASTA, GenBank, FASTQ files. Sequence manipulation (complement, translate). Indexed random access via faidx. For NGS pipelines (SAM/BAM/VCF), use pysam. For BLAST, use gget or blat-integration.
user_invocable	true

Sequence I/O

Read, write, and manipulate biological sequence files (FASTA, GenBank, FASTQ).

When to Use This Skill

This skill should be used when:

Reading or writing sequence files (FASTA, GenBank, FASTQ)
Converting between sequence file formats
Manipulating sequences (complement, reverse complement, translate)
Extracting sequences from large indexed FASTA files (faidx)
Calculating sequence statistics (GC content, molecular weight, Tm)

When NOT to Use This Skill

NGS alignment files (SAM/BAM/VCF) → Use pysam
BLAST searches → Use gget (quick) or blat-integration (large-scale)
Multiple sequence alignment → Use msa-advanced
Phylogenetic analysis → Use etetoolkit
NCBI database queries → Use pubmed-database or gene-database

Tool Selection Guide

Task	Tool	Reference
Parse FASTA/GenBank/FASTQ	`Bio.SeqIO`	`biopython_seqio.md`
Convert file formats	`Bio.SeqIO.convert()`	`biopython_seqio.md`
Sequence operations	`Bio.Seq`	`biopython_seqio.md`
Large FASTA random access	`pysam.FastaFile` + faidx	`faidx.md`
GC%, Tm, molecular weight	`Bio.SeqUtils`	`utilities.md`

Quick Start

Installation

uv pip install biopython pysam

Read FASTA

from Bio import SeqIO

for record in SeqIO.parse("sequences.fasta", "fasta"):
    print(f"{record.id}: {len(record.seq)} bp")

Convert GenBank to FASTA

from Bio import SeqIO

SeqIO.convert("input.gb", "genbank", "output.fasta", "fasta")

Random Access with faidx

import pysam

# Create index (once)
pysam.faidx("reference.fasta")

# Random access
fasta = pysam.FastaFile("reference.fasta")
seq = fasta.fetch("chr1", 1000, 2000)  # 0-based coordinates
fasta.close()

Sequence Operations

from Bio.Seq import Seq

seq = Seq("ATGCGATCGATCG")
print(seq.complement())
print(seq.reverse_complement())
print(seq.translate())

Reference Documentation

Consult the appropriate reference file for detailed documentation:

`references/biopython_seqio.md`

Bio.Seq object and sequence operations
Bio.SeqIO for file parsing and writing
SeqRecord object and annotations
Supported file formats
Format conversion patterns

`references/faidx.md`

Creating FASTA index with pysam.faidx()
pysam.FastaFile for random access
Coordinate systems (0-based vs 1-based)
Performance considerations for large files
Common patterns (variant context, gene extraction)

`references/utilities.md`

GC content calculation (gc_fraction)
Molecular weight (molecular_weight)
Melting temperature (MeltingTemp)
Codon usage analysis
Restriction enzyme sites

`references/formats.md`

FASTA format specification
GenBank format specification
FASTQ format and quality scores
Format detection and validation

Coordinate Systems

Biopython: Uses Python-style 0-based, half-open intervals for slicing.

pysam.FastaFile.fetch():

Numeric arguments: 0-based (fetch("chr1", 999, 2000) = positions 999-1999)
Region strings: 1-based (fetch("chr1:1000-2000") = positions 1000-2000)

Common Pitfalls

Coordinate confusion: Remember which tool uses 0-based vs 1-based
Missing faidx index: Random access requires .fai file
Format mismatch: Verify file format matches the format string in SeqIO.parse()
Iterator exhaustion: SeqIO.parse() returns an iterator; convert to list if multiple passes needed
Large files: Use iterators, not list(), for memory efficiency

bio-seq

Install Skill

SKILL.md