| name | chromatin-state-inference |
| description | This skill should be used when users need to infer chromatin states from histone modification ChIP-seq data using ChromHMM. It provides workflows for chromatin state segmentation, model training, and state annotation. |
ChromHMM Chromatin State Inference
Overview
This skill enables chromatin state analysis of histone modification ChIP-seq data using ChromHMM. ChromHMM uses a multivariate Hidden Markov Model to segment the genome into discrete chromatin states based on combinatorial patterns of histone modifications.
Main steps include:
- Refer to Inputs & Outputs to verify necessary files.
- Always prompt user if required files are missing.
- Always prompt user for genome assembly used.
- Always prompt user for the bin size for generating binarized files.
- Always prompt user for the number of states ChromHMM should learn.
- Run ChromHMM workflow: Binarization → Learning.
When to use this skill
Use this skill when you need to infer chromatin states from histone modification ChIP-seq data using ChromHMM.
Inputs & Outputs
Inputs
(1) Option 1: BED files of aligned reads
<mark1>.bed
<mark2>.bed
... # Other marks
(1) Option 2: BAM files of aligned reads
<mark1>.bam
<mark2>.bam
... # Other marks
Outputs
chromhmm_output/
binarized/
*.txt
model/
*.txt
... # other files output by ChromHMM
Decision Tree
Step 0: Initialize Project
Call:
mcp__project-init-tools__project_init
with:
- sample: all
- task: chromhmm
Step 1: Prepare the cellmarkfile (skip this step if signal files are provided)
Prepare a tab-delimited .txt file (without header) containing the following columns (the fourth is optional):
- sample name
- marker name
- name of the BED/BAM file
- control file of the sample (only provided if the input/control file is available)
Example cellmark.txt file (columns are tab-separated):
cell1 mark1 cell1_mark1.bam cell1_control.bam
cell1 mark2 cell1_mark2.bam cell1_control.bam
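The cell mark file can also be generated programmatically. The following is a minimal sketch, assuming the file is tab-delimited with three or four columns per row; the helper name `write_cellmark_table` and the file names are hypothetical placeholders, not part of any tool in this skill.

```python
import csv
import io


def write_cellmark_table(rows, handle):
    """Write (cell, mark, file[, control]) rows as tab-delimited lines, no header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for row in rows:
        # The control column is optional, so a row has 3 or 4 fields.
        if len(row) not in (3, 4):
            raise ValueError(f"expected 3 or 4 columns, got {len(row)}: {row!r}")
        writer.writerow(row)


# Placeholder sample, mark, and file names for illustration only.
rows = [
    ("cell1", "mark1", "cell1_mark1.bam", "cell1_control.bam"),
    ("cell1", "mark2", "cell1_mark2.bam", "cell1_control.bam"),
]

buffer = io.StringIO()
write_cellmark_table(rows, buffer)
print(buffer.getvalue(), end="")
```

Writing to a file instead only requires passing `open("cellmark.txt", "w", newline="")` as the handle.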
Step 2: Data Binarization
For BAM inputs:
Call:
mcp__chromhmm-tools__binarize_bam
with:
- path_chrom_sized: Provided by user or detected from the working directory
- input_dir: Directory containing BAM files
- cellmarkfile: Cell mark file defining histone modifications
- output_dir: (e.g. binarized/)
- bin_size: Provided by user
For BED inputs:
Call mcp__chromhmm-tools__binarize_bed instead.
For Signal inputs:
Call:
mcp__chromhmm-tools__binarize_signal
with:
- input_dir: Directory of signals
- output_dir: (e.g. binarized/)
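To make the binarization step concrete, the sketch below counts reads per fixed-size bin and marks a bin as present (1) when its count reaches a Poisson-based threshold. It is a simplified illustration rather than ChromHMM's exact procedure: the function names, the toy read positions, and the use of the mean count per bin as the Poisson rate are assumptions made here, and control data is ignored.

```python
import math


def bin_counts(read_starts, chrom_length, bin_size=200):
    """Count reads per bin; a trailing partial bin is dropped in this sketch."""
    n_bins = chrom_length // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        idx = pos // bin_size
        if idx < n_bins:
            counts[idx] += 1
    return counts


def poisson_threshold(mean_count, tail_prob=1e-4):
    """Smallest integer t with P(X >= t) < tail_prob for X ~ Poisson(mean_count)."""
    term = math.exp(-mean_count)  # P(X = 0)
    cdf = 0.0                     # P(X <= t - 1) at each loop check
    t = 0
    while 1.0 - cdf >= tail_prob:
        cdf += term
        t += 1
        term *= mean_count / t    # P(X = t) from P(X = t - 1)
    return t


def binarize(counts, tail_prob=1e-4):
    """Return 1 for bins whose count reaches the Poisson threshold, else 0."""
    threshold = poisson_threshold(sum(counts) / len(counts), tail_prob)
    return [1 if c >= threshold else 0 for c in counts]


# Toy data: eight reads pile up in the first bin, two land elsewhere.
reads = [5, 10, 20, 30, 40, 50, 60, 70, 450, 1999]
counts = bin_counts(reads, chrom_length=2000, bin_size=200)
calls = binarize(counts)
print(counts)
print(calls)
```

Only the first bin is called present here, because two isolated reads are consistent with background at this read depth.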
Step 3: Model Learning
Call
mcp__chromhmm-tools__learn_model
with:
- binarized_dir: Directory containing the binarized files
- num_states: Provided by user (e.g. 15)
- output_model_dir: (e.g. model_15_states/)
- genome: Provided by user (e.g. hg38)
- threads: Provided by user (e.g. 16)
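For reference, these parameters line up with the arguments of ChromHMM's own `LearnModel` command (`inputdir outputdir numstates assembly`, with `-p` for the processor count and `-b` for the bin size). The sketch below only builds that command line; whether the MCP tool invokes ChromHMM this way is an assumption, and `learn_model_command` and the jar path are hypothetical.

```python
import shlex


def learn_model_command(binarized_dir, output_model_dir, num_states, genome,
                        threads=1, bin_size=None, jar_path="ChromHMM.jar"):
    """Build the argument list for a ChromHMM LearnModel run (not executed here)."""
    command = ["java", "-jar", jar_path, "LearnModel", "-p", str(threads)]
    if bin_size is not None:
        # The bin size must match the one used during binarization.
        command += ["-b", str(bin_size)]
    command += [binarized_dir, output_model_dir, str(num_states), genome]
    return command


command = learn_model_command(
    "binarized/", "model_15_states/", 15, "hg38", threads=16, bin_size=200
)
print(shlex.join(command))
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting.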
Parameter Optimization
Number of States
- 8 states: Basic chromatin states
- 15 states: Standard comprehensive states
- 25 states: High-resolution states
- Optimization: Use Bayesian Information Criterion (BIC)
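A BIC comparison could be sketched as follows, assuming a final log-likelihood is available for each trained model. The parameter count assumes an HMM with one Bernoulli emission per state and mark; the function names are hypothetical, and lower BIC indicates the preferred model.

```python
import math


def hmm_free_parameters(num_states, num_marks):
    """Free parameters of an HMM with independent Bernoulli emissions per mark."""
    initial = num_states - 1                     # initial state distribution
    transitions = num_states * (num_states - 1)  # each transition row sums to 1
    emissions = num_states * num_marks           # one probability per state and mark
    return initial + transitions + emissions


def bic(log_likelihood, num_states, num_marks, num_bins):
    """BIC = k * ln(n) - 2 * ln(L), with n taken as the number of genomic bins."""
    k = hmm_free_parameters(num_states, num_marks)
    return k * math.log(num_bins) - 2.0 * log_likelihood


def best_num_states(log_likelihoods, num_marks, num_bins):
    """Pick the state count with the lowest BIC from {num_states: log_likelihood}."""
    return min(
        log_likelihoods,
        key=lambda s: bic(log_likelihoods[s], s, num_marks, num_bins),
    )


print(hmm_free_parameters(15, 6))
```

In practice the choice is rarely made on BIC alone; biological interpretability of the resulting states usually weighs in as well.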
Bin Size
- 200bp: Standard resolution
- 100bp: High resolution (requires more memory)
- 500bp: Low resolution (faster computation)
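The memory and runtime trade-off follows from the number of bins, which scales inversely with bin size. A minimal sketch, assuming chromosome lengths are given as a name-to-length mapping and that a trailing partial bin is dropped; the chromosome names and lengths below are made up for illustration.

```python
def count_bins(chrom_sizes, bin_size):
    """Total number of full bins across all chromosomes for a given bin size."""
    return sum(length // bin_size for length in chrom_sizes.values())


# Hypothetical toy chromosome lengths in base pairs.
toy_chrom_sizes = {"chrA": 1000, "chrB": 450}

for size in (100, 200, 500):
    print(size, count_bins(toy_chrom_sizes, size))
```

Halving the bin size from 200bp to 100bp doubles the number of bins the model has to process.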
State Interpretation
Common Chromatin States
- Active Promoter: H3K4me3, H3K27ac
- Weak Promoter: H3K4me3
- Poised Promoter: H3K4me3, H3K27me3
- Strong Enhancer: H3K27ac, H3K4me1
- Weak Enhancer: H3K4me1
- Insulator: CTCF
- Transcribed: H3K36me3
- Repressed: H3K27me3
- Heterochromatin: Low signal across marks
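The mark combinations above can be turned into a rough first-pass labeller. This is a heuristic sketch, not ChromHMM functionality: it assumes each state is described by a mapping from mark name to emission probability, and the 0.5 cutoff, the rule order, and the `label_state` name are choices made here for illustration.

```python
def label_state(emissions, cutoff=0.5):
    """Assign a candidate label from per-mark emission probabilities."""
    present = {mark for mark, prob in emissions.items() if prob >= cutoff}
    if not present:
        return "Heterochromatin"          # low signal across marks
    if "H3K4me3" in present:
        if "H3K27me3" in present:
            return "Poised Promoter"
        if "H3K27ac" in present:
            return "Active Promoter"
        return "Weak Promoter"
    if "H3K4me1" in present:
        return "Strong Enhancer" if "H3K27ac" in present else "Weak Enhancer"
    if "CTCF" in present:
        return "Insulator"
    if "H3K36me3" in present:
        return "Transcribed"
    if "H3K27me3" in present:
        return "Repressed"
    return "Unassigned"                   # combination not covered by the list


print(label_state({"H3K4me3": 0.9, "H3K27ac": 0.8, "H3K4me1": 0.1}))
```

Labels produced this way are starting points; genomic context such as overlap with annotated TSSs or gene bodies should confirm them.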
Troubleshooting
- Memory errors: Increase bin size or reduce the number of states
- Convergence problems: Increase the maximum number of training iterations or relax the convergence threshold
- Uninterpretable states: Check input data quality and mark combinations
- Missing chromosomes: Verify chromosome naming consistency
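A naming mismatch (for example `1` in the reads versus `chr1` in the chromosome sizes file) can be checked before binarization. A minimal sketch, assuming the chromosome names have already been collected from both sources; the helper names are hypothetical, and the prefix fix does not handle special cases such as `MT` versus `chrM`.

```python
def missing_chromosomes(data_chroms, reference_chroms):
    """Names found in the data that are absent from the chromosome sizes file."""
    return sorted(set(data_chroms) - set(reference_chroms))


def add_chr_prefix(name):
    """Add a 'chr' prefix unless the name already has one."""
    return name if name.startswith("chr") else "chr" + name


data_chroms = ["1", "2", "X"]
reference_chroms = ["chr1", "chr2", "chrX"]

before = missing_chromosomes(data_chroms, reference_chroms)
after = missing_chromosomes(map(add_chr_prefix, data_chroms), reference_chroms)
print(before)
print(after)
```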