| name | metagenomics |
| description | Analyze metagenomic data from environmental samples to characterize microbial communities. |
Metagenomics
Analyze genetic material from environmental samples to study microbial community composition and function.
Quality Control and Preprocessing
# Quality filtering with fastp
fastp -i reads_R1.fastq.gz -I reads_R2.fastq.gz \
-o clean_R1.fastq.gz -O clean_R2.fastq.gz \
--detect_adapter_for_pe \
--cut_front --cut_tail \
--html fastp_report.html
# Remove host contamination with Bowtie2
bowtie2 -x human_genome -1 clean_R1.fastq.gz -2 clean_R2.fastq.gz \
--un-conc-gz filtered_%.fastq.gz -S /dev/null
Taxonomic Profiling with Kraken2
# Build/download database
kraken2-build --download-library bacteria --db kraken_db
kraken2-build --build --db kraken_db
# Classify reads
kraken2 --db kraken_db \
--paired filtered_1.fastq.gz filtered_2.fastq.gz \
--output kraken_output.txt \
--report kraken_report.txt \
--threads 8
# Estimate abundances with Bracken
bracken -d kraken_db -i kraken_report.txt \
-o bracken_output.txt -w bracken_report.txt \
-r 150 -l S
Assembly with MetaSPAdes
# Metagenomic assembly
metaspades.py -1 filtered_1.fastq.gz -2 filtered_2.fastq.gz \
-o assembly_output/ -t 16 -m 100
# Check assembly quality
metaquast.py assembly_output/contigs.fasta \
-o quast_output/
Binning with MetaBAT2
# Map reads to assembly
bowtie2-build contigs.fasta contigs_index
bowtie2 -x contigs_index -1 filtered_1.fastq.gz -2 filtered_2.fastq.gz \
| samtools sort -o mapped.bam
# Generate depth file
jgi_summarize_bam_contig_depths --outputDepth depth.txt mapped.bam
# Run MetaBAT2
metabat2 -i contigs.fasta -a depth.txt -o bins/bin
# Evaluate bins with CheckM
checkm lineage_wf bins/ checkm_output/ -x fa -t 8
Functional Analysis with HUMAnN3
# Run HUMAnN3
humann --input metagenome.fastq \
--output humann_output/ \
--threads 8
# Gene families and pathways
humann_renorm_table --input genefamilies.tsv \
--output genefamilies_relab.tsv \
--units relab
# Pathway abundance
humann_barplot --input pathabundance.tsv \
--output pathway_barplot.png \
--sort sum
Python Analysis
import pandas as pd
import matplotlib.pyplot as plt
# Load Kraken report
def parse_kraken_report(report_file):
"""Parse Kraken2 report format."""
columns = ['percent', 'reads_clade', 'reads_taxon',
'rank', 'taxid', 'name']
df = pd.read_csv(report_file, sep='\t', names=columns)
df['name'] = df['name'].str.strip()
return df
report = parse_kraken_report('kraken_report.txt')
# Filter to species level
species = report[report['rank'] == 'S']
species = species.nlargest(20, 'percent')
# Visualization
plt.figure(figsize=(12, 6))
plt.barh(species['name'], species['percent'])
plt.xlabel('Relative Abundance (%)')
plt.title('Top 20 Species')
plt.tight_layout()
plt.savefig('species_abundance.png')
Diversity Analysis
import numpy as np
from scipy import stats
def calculate_diversity(abundance_vector):
"""Calculate alpha diversity metrics."""
# Normalize to proportions
props = abundance_vector / abundance_vector.sum()
props = props[props > 0]
# Shannon diversity
shannon = -np.sum(props * np.log(props))
# Simpson diversity
simpson = 1 - np.sum(props ** 2)
# Richness
richness = len(props)
return {
'shannon': shannon,
'simpson': simpson,
'richness': richness,
'evenness': shannon / np.log(richness) if richness > 1 else 0
}
# Compare samples with Bray-Curtis dissimilarity
from scipy.spatial.distance import braycurtis
def beta_diversity(sample1, sample2):
"""Calculate Bray-Curtis dissimilarity."""
return braycurtis(sample1, sample2)
Common Tools
- Kraken2/Bracken: Taxonomic classification
- MetaPhlAn: Marker-based profiling
- MetaSPAdes: Metagenomic assembly
- MetaBAT2/CONCOCT: Binning
- CheckM: Bin quality assessment
- HUMAnN3: Functional profiling
- Qiime2: 16S analysis pipeline