| Field | Value |
| --- | --- |
| name | functional-enrichment |
| description | Perform GO and KEGG functional enrichment with HOMER, starting from genomic regions (BED/narrowPeak/broadPeak) or gene lists, and produce R-based barplot/dotplot visualizations. Use this skill to run GO and KEGG enrichment with HOMER on genomic regions or gene lists, or simply to link genomic regions to nearby genes. |
Functional Enrichment (HOMER + R)
Overview
- Validate input: Accept BED/peak files with genomic coordinates, or gene lists; check the file format and the genome assembly.
- Map regions to genes: Convert regions to a unique gene set using HOMER `annotatePeaks.pl`.
- Run GO enrichment: Use HOMER `findGO.pl` (or `annotatePeaks.pl -go`) for BP/MF/CC.
- Run KEGG enrichment: Use HOMER `findGO.pl -kegg` (or `annotatePeaks.pl -kegg`).
- Collect outputs: Save tidy tables for downstream plotting and a compact summary of the top terms.
- Visualize in R: Create GO/KEGG barplots and dotplots with `ggplot2` from the standardized outputs.
- QC & troubleshooting: Provide checks for genome mismatch, chromosome naming, and low-signal inputs.
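The "Validate input" step can be approximated with a small heuristic. This is a minimal sketch, not the skill's actual validator. It assumes that a non-comment line with at least three tab-separated fields and integer coordinates satisfying 0 <= start < end is a BED-style interval, and that a single-field line is a gene symbol. `detect_input_type` is a hypothetical helper name.

```python
def detect_input_type(lines):
    """Classify input lines as 'regions' (BED-like) or 'gene_list' (one ID per line).

    Heuristic (assumption): >= 3 tab-separated fields with integer 0-based
    coordinates means a BED interval; exactly one field means a gene symbol.
    Raises ValueError on headers, bad coordinates, or mixed content.
    """
    kinds = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines and UCSC-style comment/track/browser lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) >= 3:
            try:
                start, end = int(fields[1]), int(fields[2])
            except ValueError:
                raise ValueError(f"line {lineno}: non-integer coordinates (header row?)")
            if not 0 <= start < end:
                raise ValueError(f"line {lineno}: expected 0 <= start < end")
            kinds.add("regions")
        elif len(fields) == 1:
            kinds.add("gene_list")
        else:
            raise ValueError(f"line {lineno}: unexpected number of fields")
    if len(kinds) != 1:
        raise ValueError(f"could not determine a single input type: {sorted(kinds)}")
    return kinds.pop()
```

Usage: `with open(path) as fh: kind = detect_input_type(fh)`. A header row is rejected rather than skipped, because the BED inputs this skill expects have no header.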
Inputs & Outputs
Inputs (choose one):
Option 1: Input is a genomic region file (BED/narrowPeak/broadPeak)
Genomic region formats supported:
- BED files: standard tab-delimited genomic interval format (chrom, start, end, ...)
- narrowPeak: ENCODE narrowPeak format (BED6+4), e.g. MACS2 peak calls
- broadPeak: ENCODE broadPeak format (BED6+3)
Option 2: Input is a gene list (txt)
`gene_list.txt` with one official gene symbol per line (no header), plus an optional `gene_list_background.txt` in the same format that defines the background gene universe.
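The sketch below reads a gene list and flags target genes that are missing from the optional background. It assumes the usual expectation for enrichment tests that targets are drawn from the background universe. Both helper names are hypothetical. Symbols are compared case-sensitively on purpose, because human (`TP53`) and mouse (`Trp53`) symbols differ in case.

```python
def read_gene_list(lines):
    """Return unique gene symbols in first-seen order, ignoring blank lines."""
    seen = {}
    for line in lines:
        symbol = line.strip()  # keep case: symbol case is species-specific
        if symbol:
            seen.setdefault(symbol, None)
    return list(seen)


def check_against_background(targets, background):
    """Return target symbols absent from the background (ideally an empty list)."""
    universe = set(background)
    return [gene for gene in targets if gene not in universe]
```

Usage: `targets = read_gene_list(open("gene_list.txt"))`. If `check_against_background(targets, bg)` returns many genes, the two files probably use different ID types or species.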
Outputs (directory layout):
```text
${sample}_functional_enrichment/
  results/
    ${sample}.anno_genomic_features.txt
    ${sample}.anno_genomic_features_stats.txt
    biological_process.txt
    cellular_component.txt
    molecular_function.txt
    kegg.txt
    biocyc.txt
    chromosome.txt
    cosmic.txt
    interactions.txt
    interpro.txt
    gene3d.txt
    pathwayInteractionDB.txt
    pfam.txt
    prints.txt
    prosite.txt
    reactome.txt
    smpdb.txt
    wikipathways.txt
    gwas.txt
    lipidmaps.txt
    msigdb.txt
    smart.txt
  tables/
    ${sample}.gene_list.txt
    go_bp.tsv
    go_mf.tsv
    go_cc.tsv
    kegg.tsv
  plots/
    ${sample}.GO_barplot.pdf
    GO_barplot.pdf
    GO_dotplot.pdf
    KEGG_barplot.pdf
    KEGG_dotplot.pdf
  logs/
    ${sample}.anno_genomic_features.log   # only if a genomic region file is provided
    findGO.log
```
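After a run, a quick completeness check against the layout above can catch silent failures. This is a minimal sketch. It treats only the GO BP/MF/CC and KEGG files as required; treating the other HOMER library files as optional is an assumption. `missing_core_results` is a hypothetical helper.

```python
from pathlib import Path

# Files this sketch treats as required; the other HOMER libraries are optional
CORE_RESULTS = (
    "biological_process.txt",
    "molecular_function.txt",
    "cellular_component.txt",
    "kegg.txt",
)


def missing_core_results(proj_dir):
    """Return the core GO/KEGG result files that are absent from proj_dir/results."""
    results = Path(proj_dir) / "results"
    return [name for name in CORE_RESULTS if not (results / name).is_file()]
```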
Decision Tree
Step 0: Gather Required Information from the User
Before calling any tool, ask the user for:
- Sample name (`sample`): used as the file prefix and to name the output directory `${sample}_functional_enrichment`.
- Genome assembly (`genome`): e.g. `hg38`, `mm10`, `danRer11`. Never guess or auto-detect it.
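The rule "never guess or auto-detect" can be enforced by validating the two user-supplied values before any tool call. This is a minimal sketch. The restriction of sample names to letters, digits, `.`, `_` and `-` is an assumption chosen so the names are safe in file paths. `validate_user_inputs` is a hypothetical helper.

```python
import re

# Assumption: sample names must be safe to use as file/directory prefixes
_SAMPLE_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def validate_user_inputs(sample, genome):
    """Return (sample, genome) or raise ValueError if either is missing or unsafe.

    No default genome is ever substituted: an empty value is an error.
    """
    if not sample or not _SAMPLE_RE.fullmatch(sample):
        raise ValueError(f"invalid sample name: {sample!r}")
    if not genome or not genome.strip():
        raise ValueError("genome assembly must be provided by the user; do not guess it")
    return sample, genome.strip()
```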
Step 1: Initialize Project
- Create a directory for this project:
Call:
`mcp__project-init-tools__project_init`
with:
- `sample`: the user-provided sample name
- `task`: `functional_enrichment`
The tool will:
- Create the `${sample}_functional_enrichment` directory.
- Return the full path of the `${sample}_functional_enrichment` directory, which is used as `${proj_dir}` in all later steps.
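For local testing without the MCP tool, the following hypothetical stand-in for `mcp__project-init-tools__project_init` creates the project directory and the four subdirectories from the output layout. It is a sketch. The real tool may create a different set of subdirectories.

```python
from pathlib import Path


def init_project(sample, base_dir="."):
    """Create ${sample}_functional_enrichment/ with results/, tables/, plots/, logs/.

    Hypothetical local stand-in; returns the absolute path used as ${proj_dir}.
    """
    proj_dir = (Path(base_dir) / f"{sample}_functional_enrichment").resolve()
    for sub in ("results", "tables", "plots", "logs"):
        # parents=True also creates proj_dir itself; exist_ok makes reruns safe
        (proj_dir / sub).mkdir(parents=True, exist_ok=True)
    return proj_dir
```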
Step 2: Prepare the HOMER genome
Call:
`mcp__homer-tools__check_genome_installation`
with:
- `genome`: the user-provided genome assembly, e.g. `hg38`, `mm10`, `danRer11`
The tool will:
- Check whether the genome is installed in HOMER.
- If it is not, install it.
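Outside the MCP tool, the same check can be done by hand. This sketch assumes the standard HOMER install layout, where genome packages live in `<homer_dir>/data/genomes/<genome>`, and it uses `configureHomer.pl -install <package>` to install a missing genome. Verify both against your HOMER installation. The helper names are hypothetical.

```python
from pathlib import Path


def homer_genome_installed(homer_dir, genome):
    """Return True if <homer_dir>/data/genomes/<genome> exists (assumed layout)."""
    return (Path(homer_dir) / "data" / "genomes" / genome).is_dir()


def homer_install_command(homer_dir, genome):
    """Build the configureHomer.pl command that installs a genome package."""
    return ["perl", str(Path(homer_dir) / "configureHomer.pl"), "-install", genome]
```

Usage: if `homer_genome_installed(h, "hg38")` is false, run `subprocess.run(homer_install_command(h, "hg38"), check=True)`. The argument list avoids shell quoting issues.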
Step 3 (Optional): Standardize chromosome names for BED files
This step applies only to BED (or other peak) input; skip it for gene lists. It converts Ensembl-style chromosome names to UCSC style:
- `1` → `chr1`
- `MT` → `chrM`
Call:
`mcp__file-format-tools__standardize_bed_chrom_names`
with:
- `input_bed`: the user-provided BED file
- `output_bed`: the path to save the standardized BED file
The tool will:
- Standardize the chromosome names in the BED file.
- Return the path of the standardized BED file.
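The renaming rules above can be sketched in a few lines. This hypothetical helper pair applies only those two rules: prefix `chr`, and map `MT` to `chrM`. It also maps a bare `M` to `chrM`. Unplaced contigs such as `GL000192.1` would need an explicit mapping table, which this sketch does not attempt.

```python
def standardize_chrom(name):
    """Map Ensembl-style names to UCSC style: '1' -> 'chr1', 'MT' -> 'chrM'."""
    if name.startswith("chr"):
        return name  # already UCSC style
    if name in ("MT", "M"):
        return "chrM"
    return "chr" + name


def standardize_bed_lines(lines):
    """Yield BED lines with column 1 renamed; blank/comment/track lines pass through."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            yield line
            continue
        chrom, sep, rest = line.partition("\t")
        yield standardize_chrom(chrom) + sep + rest
```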
Step 4 (Optional): Convert gene IDs to gene symbols
This step applies only to gene list input whose IDs are not already official gene symbols (e.g. Ensembl or Entrez IDs); skip it for BED files.
Call:
`mcp__mygene-tools__convert_gene_ids_mygene`
with:
- `input_ids_file`: the user-provided gene list file (e.g. ending in `.txt`).
- `scopes`: the source ID type for mygene (e.g. `'ensembl.gene'`, `'symbol'`, `'entrezgene'`, `'uniprot'`, or a comma-separated list).
- `fields`: the comma-separated target fields to retrieve from mygene (e.g. `'symbol,ensembl.gene,uniprot,entrezgene'`).
- `species`: the species for mygene (e.g. `'human'`, `'mouse'`, `'zebrafish'`, or an NCBI taxon ID such as `'9606'`).
- `out_file`: the path to save the converted gene list file. In this skill, place it inside `${proj_dir}`, the full path of the `${sample}_functional_enrichment` directory returned by `mcp__project-init-tools__project_init`.
- `batch_size`: the batch size for `mygene.querymany` (default 1000).
The tool will:
- Convert the gene IDs to gene symbols.
- Return the path of the converted gene list file.
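The `mygene` client's `MyGeneInfo().querymany(ids, scopes=..., fields="symbol", species=...)` returns a list of hit dicts. Each dict carries the original `query`, and unmatched IDs carry `notfound: True`. The sketch below collapses such hits into a unique symbol list. It keeps only the first hit returned for each query, which is an assumption about how to resolve one-to-many matches. `symbols_from_querymany` is a hypothetical helper, and it needs no network access.

```python
def symbols_from_querymany(hits):
    """Collapse querymany()-style hits into (unique_symbols, unmapped_queries)."""
    symbols, unmapped, seen_queries = {}, [], set()
    for hit in hits:
        query = hit.get("query")
        if query in seen_queries:
            continue  # keep only the first hit returned for each query
        seen_queries.add(query)
        symbol = hit.get("symbol")
        if hit.get("notfound") or not symbol:
            unmapped.append(query)
        else:
            symbols.setdefault(symbol, None)  # de-duplicate, preserve order
    return list(symbols), unmapped
```

Report the unmapped IDs to the user instead of silently dropping them. A high unmapped fraction usually means `scopes` or `species` is wrong.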
Step 5: GO and KEGG enrichment analysis
Option 1: from genomic regions file
Only if the input file is a BED file. If the input file is a gene list, call tools in Option 2.
- Annotate the genomic regions using HOMER's `annotatePeaks.pl` with the `-go` option. If the user also provides a background region file (e.g. a control peak file), call this tool for the background file as well, using a different `sample` name for the background.
Call:
`mcp__homer-tools__annotate_genomic_features`
with:
- `sample`: the user-provided sample name
- `proj_dir`: the directory to save the genomic feature annotation results. In this skill, it is `${proj_dir}`, the full path of the `${sample}_functional_enrichment` directory returned by `mcp__project-init-tools__project_init`.
- `regions_bed`: the user-provided regions file in BED format (e.g. ending in `.bed`, `.narrowPeak`, or `.broadPeak`).
- `genome`: the user-provided genome assembly, e.g. `hg38`, `mm10`, `danRer11`
- `ann`: a custom HOMER annotation file created by `assignGenomeAnnotation.pl` (default: `None`)
- `size_given`: keep the original region sizes (default: `True`)
- `cpg`: include CpG information (default: `False`)
- `go`: `True` to perform GO enrichment analysis
The tool will:
- Annotate the genomic regions using HOMER's `annotatePeaks.pl`.
- Return the path of the annotated regions file under `${proj_dir}/results/` and the path of the log file under `${proj_dir}/logs/`:
  - `${proj_dir}/results/${sample}.anno_genomic_features.txt`
  - `${proj_dir}/results/${sample}.anno_genomic_features_stats.txt`
  - `${proj_dir}/logs/${sample}.anno_genomic_features.log`
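To see roughly what this call corresponds to on the command line, here is a hypothetical builder that maps the tool parameters onto `annotatePeaks.pl` options. The mapping (`-size given`, `-ann`, `-CpG`, `-go`, `-annStats`) is an assumption based on HOMER's documented options. The MCP tool's real command may differ, so check `annotatePeaks.pl` usage for your HOMER version.

```python
def annotate_peaks_command(regions_bed, genome, go_dir, stats_file,
                           ann=None, size_given=True, cpg=False):
    """Build an annotatePeaks.pl argument list mirroring the tool parameters (sketch)."""
    cmd = ["annotatePeaks.pl", regions_bed, genome]
    if size_given:
        cmd += ["-size", "given"]       # keep original region sizes
    if ann:
        cmd += ["-ann", ann]            # custom annotation from assignGenomeAnnotation.pl
    if cpg:
        cmd.append("-CpG")              # add CpG/GC content columns
    if go_dir:
        cmd += ["-go", go_dir]          # enrichment output directory
    cmd += ["-annStats", stats_file]    # per-feature annotation statistics
    return cmd
```

`annotatePeaks.pl` writes the annotation table to stdout, so run it with `stdout` redirected to `${sample}.anno_genomic_features.txt` and `stderr` to the log file, for example via `subprocess.run(cmd, stdout=out_fh, stderr=log_fh, check=True)`.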
- (Optional) Extract the genes from the annotated regions file if they are needed for later analysis or the user requests the target gene list; otherwise skip this step.
Call:
`mcp__file-format-tools__extract_gene_list`
with:
- `sample`: the user-provided sample name
- `proj_dir`: the directory holding the genomic feature annotation results. In this skill, it is `${proj_dir}`, the full path of the `${sample}_functional_enrichment` directory returned by `mcp__project-init-tools__project_init`.
The tool will:
- Extract the genes from the annotated regions file.
- Return the path of the gene list file under `${proj_dir}/tables/`:
  - `${proj_dir}/tables/${sample}.gene_list.txt`
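Gene extraction amounts to collecting the unique nearest-gene names from the annotation table. This sketch assumes the `annotatePeaks.pl` output is tab-delimited with a `Gene Name` column. Column names can drift between HOMER versions, so the column is a parameter. `genes_from_annotation` is a hypothetical helper. Quoting is disabled because gene description fields may contain stray quote characters.

```python
import csv


def genes_from_annotation(lines, column="Gene Name"):
    """Return unique gene names from annotatePeaks.pl output in first-seen order."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found; HOMER version may differ")
    genes = {}
    for row in reader:
        name = (row.get(column) or "").strip()
        if name:  # intergenic/unannotated rows have an empty gene name
            genes.setdefault(name, None)
    return list(genes)
```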
Option 2: from gene list file
Only if the input file is a gene list file. If the input file is a BED file, call tools in Option 1.
Call:
`mcp__homer-tools__gene_function_enrichment`
with:
- `sample`: the user-provided sample name
- `proj_dir`: the directory to save the GO & KEGG enrichment results. In this skill, it is `${proj_dir}`, the full path of the `${sample}_functional_enrichment` directory returned by `mcp__project-init-tools__project_init`.
- `gene_list_file`: the user-provided gene list file (e.g. ending in `.txt`), or the converted file from Step 4 if the IDs were converted.
- `organism`: the user-provided organism name, e.g. `human`, `mouse`, `zebrafish`
- `background_gene_list_file`: the user-provided background gene list file (e.g. ending in `.txt`); set it to `None` if none is provided.
The tool will:
- Run GO and KEGG enrichment for the gene list.
- Return the paths of the GO & KEGG enrichment results under `${proj_dir}/results/`:
  - `${proj_dir}/results/biological_process.txt`
  - `${proj_dir}/results/kegg.txt`
  - ... other GO and KEGG enrichment result files
- Return the path of the log file under `${proj_dir}/logs/`:
  - `${proj_dir}/logs/${sample}.find_go_and_kegg_enrichment.log`
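For reference, HOMER's gene-list enrichment is invoked as `findGO.pl <target ids> <organism> <output directory> [-bg <background ids>]`. The builder below is a hypothetical sketch of the command this tool might run. It is not the tool's confirmed implementation.

```python
def find_go_command(gene_list_file, organism, out_dir, background_file=None):
    """Build a findGO.pl argument list, adding -bg only when a background is given."""
    cmd = ["findGO.pl", gene_list_file, organism, out_dir]
    if background_file:
        cmd += ["-bg", background_file]  # custom gene universe
    return cmd
```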
Step 6: Visualization in Python (barplot & dotplot)
Call:
`mcp__homer-tools__visualize_gene_function_enrichment`
with:
- `sample`: the user-provided sample name
- `proj_dir`: the directory containing the GO & KEGG enrichment results. In this skill, it is `${proj_dir}`, the full path of the `${sample}_functional_enrichment` directory returned by `mcp__project-init-tools__project_init`.
- `go_results_file`: the GO results file to plot, produced in Step 5 under `${proj_dir}/results/` (e.g. `biological_process.txt`, `molecular_function.txt`, or `cellular_component.txt`).
The tool will:
- Visualize the GO & KEGG enrichment results as a barplot and a dotplot.
- Return the path of the barplot file under `${proj_dir}/plots/`:
  - `${proj_dir}/plots/${sample}.GO_barplot.pdf`
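Both plot types need the same tidy input: the most significant terms, each with a significance value for bar length or dot colour, a gene ratio for dot position, and a gene count for dot size. This sketch prepares that table from a HOMER GO result file. It makes several assumptions: the column names `Term`, `logP`, `Target Genes in Term` and `Total Target Genes` (which vary by HOMER version), and that `logP` holds a natural-log p-value, which is converted to -log10(p). `top_terms` is a hypothetical helper.

```python
import csv
import math


def top_terms(lines, n=10, term_col="Term", logp_col="logP",
              hits_col="Target Genes in Term", total_col="Total Target Genes"):
    """Return the n most significant terms as dicts ready for a barplot/dotplot."""
    rows = []
    for row in csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        hits, total = int(row[hits_col]), int(row[total_col])
        rows.append({
            "term": row[term_col],
            # assumed natural-log p-value -> -log10(p) for bar length / dot colour
            "neg_log10_p": -float(row[logp_col]) / math.log(10),
            "gene_ratio": hits / total if total else 0.0,  # dot x-position
            "count": hits,                                  # dot size
        })
    rows.sort(key=lambda r: r["neg_log10_p"], reverse=True)
    return rows[:n]
```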
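Alternative: run HOMER directly from a BED file
```bash
# GO enrichment for genes near the peaks; -genomeOntology also needs its own output directory.
# The annotation table goes to stdout, so redirect it to a file.
annotatePeaks.pl peaks.bed hg38 -go results/{run}/tables/go_dir -genomeOntology results/{run}/tables/genome_ontology_dir > results/{run}/tables/peaks.annotated.txt
# KEGG enrichment
annotatePeaks.pl peaks.bed hg38 -kegg results/{run}/tables/kegg_dir > results/{run}/tables/peaks.kegg_annotated.txt
```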
Notes & Best Practices
- Genome & naming: Ensure the HOMER genome key matches the species, and keep chromosome naming consistent (`chr1` vs `1`).
- BED format: tab-delimited, ≥3 columns, 0-based start coordinates, no header.
- Multiple testing: Prefer FDR (Benjamini-Hochberg) values if provided; otherwise fall back to p-values.
- Background set: `-bg` helps reduce bias; choose a reasonable universe (e.g. all expressed genes, or the genes linked to all accessible regions).
- Direct-from-BED: `annotatePeaks.pl -go`/`-kegg` is convenient, but the gene-list route yields uniform TSVs for plotting.
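When a results table has p-values but no FDR column, Benjamini-Hochberg adjusted values can be computed directly. This is a minimal pure-Python sketch of the standard step-up procedure. With m tests, it computes p_(k) * m / k for the k-th smallest p-value, then takes a running minimum from the largest p-value down so the adjusted values stay monotone and never exceed 1.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # caps adjusted values at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Usage: `fdr = benjamini_hochberg([r["p"] for r in rows])`, then keep the terms with `fdr < 0.05`. If you start from HOMER's log p-values, convert them back to p-values first.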
Troubleshooting
- Many NAs after annotation: Check genome version, chromosome naming, BED formatting, and headers.
- Empty/weak enrichment: Ensure there are enough genes (≥50 suggested), verify that the gene symbols match the species, and tune the thresholds or the background set.
- Column name drift: HOMER versions may differ; adjust R column mappings if needed.
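Two of these failure modes can be checked before rerunning anything. This sketch compares the chromosome names found in the regions with the genome's chromosome names, and checks the gene count against the suggested minimum of 50. It assumes the genome's chromosome names come from a source such as a chrom.sizes file, which the skill does not specify. `qc_warnings` is a hypothetical helper.

```python
def qc_warnings(region_chroms, genome_chroms, n_genes, min_genes=50):
    """Return human-readable warnings for common causes of empty or weak results."""
    warnings = []
    region_chroms, genome_chroms = set(region_chroms), set(genome_chroms)
    unknown = region_chroms - genome_chroms
    if unknown:
        # '1' in the BED file but 'chr1' in the genome suggests a naming mismatch
        if {"chr" + c for c in unknown} & genome_chroms:
            warnings.append("chromosome names lack the 'chr' prefix; standardize them (Step 3)")
        else:
            warnings.append(f"{len(unknown)} chromosome(s) not in the genome: check the assembly")
    if n_genes < min_genes:
        warnings.append(f"only {n_genes} genes (< {min_genes}); enrichment may be empty or weak")
    return warnings
```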