name	pyopenms
description	Mass spectrometry toolkit (OpenMS Python). Process mzML/mzXML, peak picking, feature detection, peptide ID, proteomics/metabolomics workflows, for LC-MS/MS analysis.

pyOpenMS

Overview

pyOpenMS is an open-source Python library for mass spectrometry data analysis in proteomics and metabolomics. Process LC-MS/MS data, perform peptide identification, detect and quantify features, and integrate with common proteomics tools (Comet, Mascot, MSGF+, Percolator, MSstats) using Python bindings to the OpenMS C++ library.

When to Use This Skill

This skill should be used when:

Processing mass spectrometry data (mzML, mzXML files)
Performing peak picking and feature detection in LC-MS data
Conducting peptide and protein identification workflows
Quantifying metabolites or proteins
Integrating proteomics or metabolomics tools into Python pipelines
Working with OpenMS tools and file formats

Core Capabilities

1. File I/O and Data Import/Export

Handle diverse mass spectrometry file formats efficiently:

Supported Formats:

mzML/mzXML: Primary raw MS data formats (profile or centroid)
FASTA: Protein/peptide sequence databases
mzTab: Standardized reporting format for identification and quantification
mzIdentML: Peptide and protein identification data
TraML: Transition lists for targeted experiments
pepXML/protXML: Search engine results

Reading mzML Files:

import pyopenms as oms

# Load MS data
exp = oms.MSExperiment()
oms.MzMLFile().load("input_data.mzML", exp)

# Access basic information
print(f"Number of spectra: {exp.getNrSpectra()}")
print(f"Number of chromatograms: {exp.getNrChromatograms()}")

Writing mzML Files:

# Save processed data
oms.MzMLFile().store("output_data.mzML", exp)

File Encoding: pyOpenMS automatically handles Base64 encoding, zlib compression, and Numpress compression internally.

2. MS Data Structures and Manipulation

Work with core mass spectrometry data structures. See references/data_structures.md for comprehensive details.

MSSpectrum - Individual mass spectrum:

# Create spectrum with metadata
spectrum = oms.MSSpectrum()
spectrum.setRT(205.2)  # Retention time in seconds
spectrum.setMSLevel(2)  # MS2 spectrum

# Set peak data (m/z, intensity arrays)
mz_array = [100.5, 200.3, 300.7, 400.2]
intensity_array = [1000, 5000, 3000, 2000]
spectrum.set_peaks((mz_array, intensity_array))

# Add precursor information for MS2
precursor = oms.Precursor()
precursor.setMZ(450.5)
precursor.setCharge(2)
spectrum.setPrecursors([precursor])

MSExperiment - Complete LC-MS/MS run:

# Create experiment and add spectra
exp = oms.MSExperiment()
exp.addSpectrum(spectrum)

# Access spectra
first_spectrum = exp.getSpectrum(0)
for spec in exp:
    print(f"RT: {spec.getRT()}, MS Level: {spec.getMSLevel()}")

MSChromatogram - Extracted ion chromatogram:

# Create chromatogram
chrom = oms.MSChromatogram()
chrom.set_peaks(([10.5, 11.2, 11.8], [1000, 5000, 3000]))  # RT, intensity
exp.addChromatogram(chrom)

Efficient Peak Access:

# Get peaks as numpy arrays for fast processing
mz_array, intensity_array = spectrum.get_peaks()

# Modify and set back
intensity_array *= 2  # Double all intensities
spectrum.set_peaks((mz_array, intensity_array))

3. Chemistry and Peptide Handling

Perform chemical calculations for proteomics and metabolomics. See references/chemistry.md for detailed examples.

Molecular Formulas and Mass Calculations:

# Create empirical formula
formula = oms.EmpiricalFormula("C6H12O6")  # Glucose
print(f"Monoisotopic mass: {formula.getMonoWeight()}")
print(f"Average mass: {formula.getAverageWeight()}")

# Formula arithmetic
water = oms.EmpiricalFormula("H2O")
dehydrated = formula - water

# Isotope-specific formulas
heavy_carbon = oms.EmpiricalFormula("(13)C6H12O6")

Isotopic Distributions:

# Generate coarse isotope pattern (unit mass resolution)
coarse_gen = oms.CoarseIsotopePatternGenerator()
pattern = coarse_gen.run(formula)

# Generate fine structure (high resolution)
fine_gen = oms.FineIsotopePatternGenerator(0.01)  # 0.01 Da resolution
fine_pattern = fine_gen.run(formula)

Amino Acids and Residues:

# Access residue information
res_db = oms.ResidueDB()
leucine = res_db.getResidue("Leucine")
print(f"L monoisotopic mass: {leucine.getMonoWeight()}")
print(f"L formula: {leucine.getFormula()}")
print(f"L pKa: {leucine.getPka()}")

Peptide Sequences:

# Create peptide sequence
peptide = oms.AASequence.fromString("PEPTIDE")
print(f"Peptide mass: {peptide.getMonoWeight()}")
print(f"Formula: {peptide.getFormula()}")

# Add modifications
modified = oms.AASequence.fromString("PEPTIDEM(Oxidation)")
print(f"Modified mass: {modified.getMonoWeight()}")

# Theoretical fragmentation
ions = []
for i in range(1, peptide.size()):
    b_ion = peptide.getPrefix(i)
    y_ion = peptide.getSuffix(i)
    ions.append(('b', i, b_ion.getMonoWeight()))
    ions.append(('y', i, y_ion.getMonoWeight()))

Protein Digestion:

# Enzymatic digestion
dig = oms.ProteaseDigestion()
dig.setEnzyme("Trypsin")
dig.setMissedCleavages(2)

protein_seq = oms.AASequence.fromString("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK")
peptides = []
dig.digest(protein_seq, peptides)

for pep in peptides:
    print(f"{pep.toString()}: {pep.getMonoWeight():.2f} Da")

Modifications:

# Access modification database
mod_db = oms.ModificationsDB()
oxidation = mod_db.getModification("Oxidation")
print(f"Oxidation mass diff: {oxidation.getDiffMonoMass()}")
print(f"Residues: {oxidation.getResidues()}")

4. Signal Processing and Filtering

Apply algorithms to process and filter MS data. See references/algorithms.md for comprehensive coverage.

Spectral Smoothing:

# Gaussian smoothing
gauss_filter = oms.GaussFilter()
params = gauss_filter.getParameters()
params.setValue("gaussian_width", 0.2)
gauss_filter.setParameters(params)
gauss_filter.filterExperiment(exp)

# Savitzky-Golay filter
sg_filter = oms.SavitzkyGolayFilter()
sg_filter.filterExperiment(exp)

Peak Filtering:

# Keep only N largest peaks per spectrum
n_largest = oms.NLargest()
params = n_largest.getParameters()
params.setValue("n", 100)  # Keep top 100 peaks
n_largest.setParameters(params)
n_largest.filterExperiment(exp)

# Threshold filtering
threshold_filter = oms.ThresholdMower()
params = threshold_filter.getParameters()
params.setValue("threshold", 1000.0)  # Remove peaks below 1000 intensity
threshold_filter.setParameters(params)
threshold_filter.filterExperiment(exp)

# Window-based filtering
window_filter = oms.WindowMower()
params = window_filter.getParameters()
params.setValue("windowsize", 50.0)  # 50 m/z windows
params.setValue("peakcount", 10)     # Keep 10 highest per window
window_filter.setParameters(params)
window_filter.filterExperiment(exp)

Spectrum Normalization:

normalizer = oms.Normalizer()
normalizer.filterExperiment(exp)

MS Level Filtering:

# Keep only MS2 spectra
exp.filterMSLevel(2)

# Filter by retention time range
exp.filterRT(100.0, 500.0)  # Keep RT between 100-500 seconds

# Filter by m/z range
exp.filterMZ(400.0, 1500.0)  # Keep m/z between 400-1500

5. Feature Detection and Quantification

Detect and quantify features in LC-MS data:

Peak Picking (Centroiding):

# Convert profile data to centroid
picker = oms.PeakPickerHiRes()
params = picker.getParameters()
params.setValue("signal_to_noise", 1.0)
picker.setParameters(params)

exp_centroided = oms.MSExperiment()
picker.pickExperiment(exp, exp_centroided)

Feature Detection:

# Detect features across LC-MS runs
feature_finder = oms.FeatureFinderMultiplex()

features = oms.FeatureMap()
feature_finder.run(exp, features, params)

print(f"Found {features.size()} features")
for feature in features:
    print(f"m/z: {feature.getMZ():.4f}, RT: {feature.getRT():.2f}, "
          f"Intensity: {feature.getIntensity():.0f}")

Feature Linking (Map Alignment):

# Link features across multiple samples
feature_grouper = oms.FeatureGroupingAlgorithmQT()
consensus_map = oms.ConsensusMap()

# Provide multiple feature maps from different samples
feature_maps = [features1, features2, features3]
feature_grouper.group(feature_maps, consensus_map)

6. Peptide Identification Workflows

Integrate with search engines and process identification results:

Database Searching:

# Prepare parameters for search engine
params = oms.Param()
params.setValue("database", "uniprot_human.fasta")
params.setValue("precursor_mass_tolerance", 10.0)  # ppm
params.setValue("fragment_mass_tolerance", 0.5)     # Da
params.setValue("enzyme", "Trypsin")
params.setValue("missed_cleavages", 2)

# Variable modifications
params.setValue("variable_modifications", ["Oxidation (M)", "Phospho (STY)"])

# Fixed modifications
params.setValue("fixed_modifications", ["Carbamidomethyl (C)"])

FDR Control:

# False discovery rate estimation
fdr = oms.FalseDiscoveryRate()
fdr_threshold = 0.01  # 1% FDR

# Apply to peptide identifications
protein_ids = []
peptide_ids = []
oms.IdXMLFile().load("search_results.idXML", protein_ids, peptide_ids)

fdr.apply(protein_ids, peptide_ids)

7. Metabolomics Workflows

Analyze small molecule data:

Adduct Detection:

# Common metabolite adducts
adducts = ["[M+H]+", "[M+Na]+", "[M+K]+", "[M-H]-", "[M+Cl]-"]

# Feature annotation with adducts
for feature in features:
    mz = feature.getMZ()
    # Calculate neutral mass for each adduct hypothesis
    for adduct in adducts:
        # Annotation logic
        pass

Isotope Pattern Matching:

# Compare experimental to theoretical isotope patterns
experimental_pattern = []  # Extract from feature
theoretical = coarse_gen.run(formula)

# Calculate similarity score
similarity = compare_isotope_patterns(experimental_pattern, theoretical)

8. Quality Control and Visualization

Monitor data quality and visualize results:

Basic Statistics:

# Calculate TIC (Total Ion Current)
tic_values = []
rt_values = []
for spectrum in exp:
    if spectrum.getMSLevel() == 1:
        tic = sum(spectrum.get_peaks()[1])  # Sum intensities
        tic_values.append(tic)
        rt_values.append(spectrum.getRT())

# Base peak chromatogram
bpc_values = []
for spectrum in exp:
    if spectrum.getMSLevel() == 1:
        max_intensity = max(spectrum.get_peaks()[1]) if spectrum.size() > 0 else 0
        bpc_values.append(max_intensity)

Plotting (with pyopenms.plotting or matplotlib):

import matplotlib.pyplot as plt

# Plot TIC
plt.figure(figsize=(10, 4))
plt.plot(rt_values, tic_values)
plt.xlabel('Retention Time (s)')
plt.ylabel('Total Ion Current')
plt.title('TIC')
plt.show()

# Plot single spectrum
spectrum = exp.getSpectrum(0)
mz, intensity = spectrum.get_peaks()
plt.stem(mz, intensity, basefmt=' ')
plt.xlabel('m/z')
plt.ylabel('Intensity')
plt.title(f'Spectrum at RT {spectrum.getRT():.2f}s')
plt.show()

Common Workflows

Complete LC-MS/MS Processing Pipeline

import pyopenms as oms

# 1. Load data
exp = oms.MSExperiment()
oms.MzMLFile().load("raw_data.mzML", exp)

# 2. Filter and smooth
exp.filterMSLevel(1)  # Keep only MS1 for feature detection
gauss = oms.GaussFilter()
gauss.filterExperiment(exp)

# 3. Peak picking
picker = oms.PeakPickerHiRes()
exp_centroid = oms.MSExperiment()
picker.pickExperiment(exp, exp_centroid)

# 4. Feature detection
ff = oms.FeatureFinderMultiplex()
features = oms.FeatureMap()
ff.run(exp_centroid, features, oms.Param())

# 5. Export results
oms.FeatureXMLFile().store("features.featureXML", features)
print(f"Detected {features.size()} features")

Theoretical Peptide Mass Calculation

# Calculate masses for peptide with modifications
peptide = oms.AASequence.fromString("PEPTIDEK")
print(f"Unmodified [M+H]+: {peptide.getMonoWeight() + 1.007276:.4f}")

# With modification
modified = oms.AASequence.fromString("PEPTIDEM(Oxidation)K")
print(f"Oxidized [M+H]+: {modified.getMonoWeight() + 1.007276:.4f}")

# Calculate for different charge states
for z in [1, 2, 3]:
    mz = (peptide.getMonoWeight() + z * 1.007276) / z
    print(f"[M+{z}H]^{z}+: {mz:.4f}")

Installation

Ensure pyOpenMS is installed before using this skill:

# Via conda (recommended)
conda install -c bioconda pyopenms

# Via pip
pip install pyopenms

Integration with Other Tools

pyOpenMS integrates seamlessly with:

Search Engines: Comet, Mascot, MSGF+, MSFragger, Sage, SpectraST
Post-processing: Percolator, MSstats, Epiphany
Metabolomics: SIRIUS, CSI:FingerID
Data Analysis: Pandas, NumPy, SciPy for downstream analysis
Visualization: Matplotlib, Seaborn for plotting

Resources

references/

Detailed documentation on core concepts:

data_structures.md - Comprehensive guide to MSExperiment, MSSpectrum, MSChromatogram, and peak data handling
algorithms.md - Complete reference for signal processing, filtering, feature detection, and quantification algorithms
chemistry.md - In-depth coverage of chemistry calculations, peptide handling, modifications, and isotope distributions

Load these references when needing detailed information about specific pyOpenMS capabilities.

Best Practices

File Format: Always use mzML for raw MS data (standardized, well-supported)
Peak Access: Use get_peaks() and set_peaks() with numpy arrays for efficient processing
Parameters: Always check and configure algorithm parameters via getParameters() and setParameters()
Memory: For large datasets, process spectra iteratively rather than loading entire experiments
Validation: Check data integrity (MS levels, RT ordering, precursor information) after loading
Modifications: Use standard modification names from UniMod database
Units: RT in seconds, m/z in Thomson (Da/charge), intensity in arbitrary units

Common Patterns

Algorithm Application Pattern:

# 1. Instantiate algorithm
algorithm = oms.SomeAlgorithm()

# 2. Get and configure parameters
params = algorithm.getParameters()
params.setValue("parameter_name", value)
algorithm.setParameters(params)

# 3. Apply to data
algorithm.filterExperiment(exp)  # or .process(), .run(), depending on algorithm

File I/O Pattern:

# Read
data_container = oms.DataContainer()  # MSExperiment, FeatureMap, etc.
oms.FileHandler().load("input.format", data_container)

# Process
# ... manipulate data_container ...

# Write
oms.FileHandler().store("output.format", data_container)

Getting Help

Documentation: https://pyopenms.readthedocs.io/
API Reference: Browse class documentation for detailed method signatures
OpenMS Website: https://www.openms.org/
GitHub Issues: https://github.com/OpenMS/OpenMS/issues

pyopenms

Install Skill

SKILL.md