| name | TF-differential-binding |
| description | The TF-differential-binding pipeline performs differential transcription factor (TF) binding analysis from ChIP-seq datasets (TF peaks) using the DiffBind package in R. It identifies genomic regions where TF binding intensity differs significantly between experimental conditions (e.g., treatment vs. control, mutant vs. wild-type). Use the TF-differential-binding pipeline when you need to compare the binding of the same TF across two or more biological conditions, cell types, or treatments using ChIP-seq data or TF binding peaks. This pipeline is ideal for studying regulatory mechanisms that underlie transcriptional differences or epigenetic responses to perturbations. |
DiffBind TF Differential Binding Analysis
Overview
This skill enables comprehensive differential TF binding analysis using DiffBind in R. DiffBind integrates read counting, normalization, and statistical modeling to identify differentially bound peaks between conditions.
To perform DiffBind differential binding analysis:
- Initialize the project directory.
- Refer to the Inputs & Outputs section to check inputs and build the output architecture. All output files should be located in ${proj_dir}, which is created in Step 0.
- Always prompt the user if required files are missing.
- Provide a sample sheet with ChIP-seq peak files and corresponding BAM files for each sample.
- Construct a DBA object from the sample sheet.
- Compute read counts over consensus peak regions.
- Specify experimental conditions (e.g., treatment vs. control or cell_type_A vs. cell_type_B).
- Run statistical tests to identify differentially bound regions.
- Generate correlation heatmaps, PCA plots, and volcano plots; extract significant binding events.
When to use this skill
Use the TF-differential-binding pipeline when you need to compare the binding of the same TF across two or more biological conditions, cell types, or treatments using ChIP-seq data or TF binding peaks. This pipeline is ideal for studying regulatory mechanisms that underlie transcriptional differences or epigenetic responses to perturbations.
Recommended applications include:
- Comparing treated vs. control or wild-type vs. mutant conditions to identify TF binding changes in response to stimuli, drugs, or mutations.
- Comparing TF binding profiles between two cell types or experimental conditions to identify differentially bound regions (DBRs).
- Inferring condition-specific roles of a TF from the sites it gains or loses between two conditions.
- Integrating with RNA-seq to correlate TF binding alterations with gene expression changes.
- Investigating co-factor dependencies or chromatin remodeling events linked to TF occupancy.
Inputs & Outputs
Inputs (choose one)
- If starting from BAM files and BED peak files → Generate consensus peaks and count matrix.
- If starting from existing count matrix → Go directly to DiffBind analysis.
- If multiple conditions or batches are present (in addition to either starting point above) → Include batch/condition in the design.
Outputs
${sample}_TF_DB_analysis/
    DBs/
        DB_results.csv # DESeq2 results (log2FC, p-values)
        DB_up.bed
        DB_down.bed
    plots/ # visualization outputs
        PCA.pdf
        volcano.pdf
        heatmap.pdf
    logs/ # analysis logs
    temp/ # other temp files
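The layout above can be created from R before any analysis runs. The following is a minimal sketch using base R only; `init_output_dirs` is a hypothetical helper name and the demo directory is a throwaway path, not something the pipeline defines.

```r
# Create the DBs/, plots/, logs/ and temp/ subdirectories under a project directory.
# `init_output_dirs` is an illustrative helper, not part of DiffBind.
init_output_dirs <- function(proj_dir) {
  subdirs <- c("DBs", "plots", "logs", "temp")
  paths <- file.path(proj_dir, subdirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  # Return the created paths, named by subdirectory, for later use
  stats::setNames(paths, subdirs)
}

# Example with a temporary directory; in the pipeline, pass ${proj_dir} instead
demo_dir <- file.path(tempdir(), "c1_vs_c2_TF_DB")
out_dirs <- init_output_dirs(demo_dir)
print(out_dirs)
```

Later steps can then write to, for example, `file.path(out_dirs["plots"], "PCA.pdf")` instead of the working directory.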
Decision Tree
Step 0: Initialize Project
- Make a directory for this project:
Call:
mcp__project-init-tools__project_init
with:
sample: sample name (e.g. c1_vs_c2)
task: TF_DB
The tool will:
- Create the ${sample}_TF_DB directory.
- Return the full path of the ${sample}_TF_DB directory, which will be used as ${proj_dir}.
Step 1: Prepare Input Data
Create a CSV sample sheet (samplesheet.csv) with the following columns:
| SampleID | Tissue | Factor | Condition | bamReads | Peaks | PeakCaller |
|---|---|---|---|---|---|---|
| TF_A_1 | A | TF | Control | Control1.bam | Control1_peaks.narrowPeak | narrow |
| TF_A_2 | A | TF | Control | Control2.bam | Control2_peaks.narrowPeak | narrow |
| TF_B_1 | A | TF | Treated | Treated1.bam | Treated1_peaks.narrowPeak | narrow |
| TF_B_2 | A | TF | Treated | Treated2.bam | Treated2_peaks.narrowPeak | narrow |
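Because a malformed sample sheet is a common source of errors, it can help to check it before building the DiffBind object. The sketch below uses base R only and assumes the seven columns shown in the table above; `check_samplesheet` is a hypothetical helper, not a DiffBind function.

```r
# Columns used in the sample sheet above
required_cols <- c("SampleID", "Tissue", "Factor", "Condition",
                   "bamReads", "Peaks", "PeakCaller")

# Hypothetical helper: returns a character vector of problems
# (empty when the sheet looks fine).
check_samplesheet <- function(samples, check_files = TRUE) {
  problems <- character(0)
  missing_cols <- setdiff(required_cols, colnames(samples))
  if (length(missing_cols) > 0) {
    return(paste("Missing columns:", paste(missing_cols, collapse = ", ")))
  }
  if (anyDuplicated(samples$SampleID) > 0) {
    problems <- c(problems, "SampleID values must be unique")
  }
  # At least two samples per condition are recommended
  n_per_cond <- table(samples$Condition)
  low <- names(n_per_cond)[n_per_cond < 2]
  if (length(low) > 0) {
    problems <- c(problems,
                  paste("Fewer than 2 samples for:", paste(low, collapse = ", ")))
  }
  if (check_files) {
    files <- c(as.character(samples$bamReads), as.character(samples$Peaks))
    absent <- files[!file.exists(files)]
    if (length(absent) > 0) {
      problems <- c(problems, paste("File not found:", absent))
    }
  }
  problems
}

# Toy sheet mirroring the table above; file checks are off because
# the file names are placeholders.
samples <- data.frame(
  SampleID   = c("TF_A_1", "TF_A_2", "TF_B_1", "TF_B_2"),
  Tissue     = "A",
  Factor     = "TF",
  Condition  = c("Control", "Control", "Treated", "Treated"),
  bamReads   = c("Control1.bam", "Control2.bam", "Treated1.bam", "Treated2.bam"),
  Peaks      = paste0(c("Control1", "Control2", "Treated1", "Treated2"),
                      "_peaks.narrowPeak"),
  PeakCaller = "narrow",
  stringsAsFactors = FALSE
)
problems <- check_samplesheet(samples, check_files = FALSE)
print(problems)
```

With real data, calling `check_samplesheet(read.csv("samplesheet.csv"))` with file checks enabled gives a list of missing inputs to report back to the user.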
Step 2: Load Data and Build the DiffBind Object
library(DiffBind)
samples <- read.csv("samplesheet.csv")
dbObj <- dba(sampleSheet=samples)
Key parameters:
- sampleSheet: CSV file (or a data frame read from it) with BAM and peak information
- Supports both narrowPeak and broadPeak formats
Step 3: Read Counting and Consensus Peak Generation
Count reads overlapping consensus peaks across samples:
# Generate a consensus peakset
dbObj <- dba.count(dbObj, summits=250)
Notes:
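- summits: re-centers each peak on its summit and extends it ±250 bp (501 bp intervals) for consistency.
- The resulting binding matrix contains read counts for all samples over the consensus peaks. In DiffBind 3.x, normalization is applied afterwards, either explicitly with dba.normalize() or with default settings inside dba.analyze().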
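To make the effect of summits=250 on interval width concrete, the sketch below re-centers a few intervals in base R. It only imitates the arithmetic and is not DiffBind's implementation; the summit positions are made up, and clamping at position 1 is an assumption added for illustration.

```r
# Build fixed-width intervals around summit positions.
# Illustration only: this is not how DiffBind is implemented internally.
recenter_on_summit <- function(summit, half_width = 250) {
  data.frame(
    start = pmax(summit - half_width, 1),  # assumed clamp at the chromosome start
    end   = summit + half_width
  )
}

# Hypothetical summit positions (1-based coordinates)
summits <- c(1000, 52340, 100)
recentered <- recenter_on_summit(summits)
recentered$width <- recentered$end - recentered$start + 1
print(recentered)
```

Away from chromosome ends every interval has the same width (2 × 250 + 1 = 501 bp), which is what makes read counts comparable across peaks that were originally called with different widths.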
Step 4: Contrast Definition
Define conditions for comparison:
# Define experimental contrasts (e.g., Treated vs Control)
dbObj <- dba.contrast(dbObj, categories=DBA_CONDITION, minMembers=2)
Alternatives:
- For multifactor experiments: use DBA_TISSUE, DBA_TREATMENT, or custom metadata.
- Check contrasts:
dba.show(dbObj, bContrasts=TRUE)
Step 5: Differential Binding Analysis
# Perform analysis
dbObj <- dba.analyze(dbObj, method=DBA_DESEQ2)
Parameters:
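- method: choose DBA_DESEQ2 (default) or DBA_EDGER
The following thresholds are applied when results are retrieved with dba.report() (Step 7), not by dba.analyze() itself:
- th: FDR threshold (default 0.05)
- fold: minimum absolute log2 fold change
- bUsePval=TRUE: apply th to p-values instead of FDR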
Step 6: Visualization and Quality Control
Correlation Heatmap
dba.plotHeatmap(dbObj, correlations=TRUE, scale="row")
PCA Plot
dba.plotPCA(dbObj, attributes=DBA_CONDITION, label=DBA_ID)
Volcano Plot
# Volcano plot: retrieve all sites (th=1), then plot fold change against significance
allResults <- dba.report(dbObj, method=DBA_DESEQ2, th=1)
with(allResults, plot(Fold, -log10(FDR),
     col=ifelse(FDR < 0.05 & abs(Fold) > 1, "red", "grey"),
     pch=16, main="Volcano Plot"))
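Output: heatmap.pdf, volcano.pdf, PCA.pdf. The plotting calls above draw to the active graphics device, so wrap each one in pdf(...) and dev.off() to write these files under plots/.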
Step 7: Result Extraction
Export significant differential peaks:
write.csv(as.data.frame(allResults), "DB_results.csv", row.names = FALSE)
library(rtracklayer)
# Extract results with FDR < 0.05 and |log2FC| > 1
sigSites <- dba.report(dbObj, method=DBA_DESEQ2, th=0.05, fold=1)
print("Differential binding results summary:")
print(summary(sigSites))
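# Split sites by direction of change. A positive Fold means higher binding in the
# first group of the contrast; confirm which group that is with dba.show(dbObj, bContrasts=TRUE)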
diff_up <- sigSites[sigSites$Fold > 0]
diff_down <- sigSites[sigSites$Fold < 0]
export(diff_up, "DB_up_${treated_condition}.bed")
export(diff_down, "DB_down_${treated_condition}.bed")
Output: DB_results.csv DB_up_${treated_condition}.bed DB_down_${treated_condition}.bed
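When rtracklayer is not available, the BED files can also be written from the results data frame. The sketch below uses base R only and shows the coordinate shift that BED requires (0-based starts, exclusive ends), which export() otherwise handles automatically. `to_bed`, the site names, and the toy values are illustrative assumptions, not pipeline output.

```r
# Convert 1-based, inclusive intervals (as in as.data.frame(sigSites))
# to 0-based, half-open BED coordinates. `to_bed` is an illustrative helper.
to_bed <- function(res) {
  n <- nrow(res)
  data.frame(
    chrom      = as.character(res$seqnames),
    chromStart = res$start - 1L,             # BED starts are 0-based
    chromEnd   = res$end,                    # BED ends are exclusive, so unchanged
    name       = sprintf("site_%d", seq_len(n)),
    score      = rep(0L, n),
    strand     = rep(".", n),
    stringsAsFactors = FALSE
  )
}

# Toy rows with made-up values, shaped like the DiffBind report columns
res <- data.frame(
  seqnames = c("chr1", "chr2", "chr2"),
  start    = c(1001L, 501L, 9001L),
  end      = c(1501L, 1001L, 9501L),
  Fold     = c(2.3, -1.7, 1.2),
  FDR      = c(0.001, 0.020, 0.040)
)
bed_up   <- to_bed(res[res$Fold > 0, ])
bed_down <- to_bed(res[res$Fold < 0, ])

# Write a headerless, tab-separated BED file to a temporary location
bed_path <- file.path(tempdir(), "DB_up.bed")
write.table(bed_up, bed_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Forgetting the start shift produces intervals that are one base short in genome browsers, so it is worth checking a few rows of the output against the CSV.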
Interpretation and Biological Insights
Significance Criteria
- FDR < 0.05 → statistically significant
- |log2FC| > 1 → biologically meaningful difference
- Consistent replicates → at least two replicates per condition recommended
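The first two criteria can be applied to a results table in a few lines of base R. This is a minimal sketch that assumes a data frame with Fold (log2 scale) and FDR columns, as in DB_results.csv; `classify_sites` and the labels "gained" and "lost" are hypothetical, and "gained" here means higher binding in the first group of the contrast.

```r
# Label each site using the FDR and fold-change thresholds listed above.
# `classify_sites` is an illustrative helper, not a DiffBind function.
classify_sites <- function(res, fdr_cutoff = 0.05, lfc_cutoff = 1) {
  status <- rep("not_significant", nrow(res))
  # Sites with a missing FDR are treated as not significant
  sig <- !is.na(res$FDR) & res$FDR < fdr_cutoff
  status[sig & res$Fold >  lfc_cutoff] <- "gained"
  status[sig & res$Fold < -lfc_cutoff] <- "lost"
  factor(status, levels = c("gained", "lost", "not_significant"))
}

# Made-up values for illustration only
toy <- data.frame(
  Fold = c(2.1, -1.8, 0.4, 3.0, -2.5),
  FDR  = c(0.001, 0.030, 0.010, 0.200, NA)
)
toy$status <- classify_sites(toy)
print(table(toy$status))
```

Keeping the two cutoffs as arguments makes it easy to report how the number of differential sites changes under stricter or looser thresholds.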
Typical Biological Interpretations
- Increased binding in treated condition → potential activation or recruitment of TFs
- Decreased binding → loss of TF affinity or chromatin closing
- Combine with RNA-seq to correlate with target gene expression.
Troubleshooting
| Problem | Possible Cause | Solution |
|---|---|---|
| No differential peaks found | Insufficient replicates or low coverage | Add replicates, increase sequencing depth, or relax the FDR threshold (e.g., th=0.1) |
| Errors in sample sheet | Column names incorrect or missing | Use standard DiffBind column format |
| Inconsistent genome build | Mixed genome assemblies | Ensure all BAM and peak files use the same genome reference |
| Over-normalization | Strong batch effects | Include batch term in design or run dba.contrast(..., block=...) |