name

TF-differential-binding

description

The TF-differential-binding pipeline performs differential transcription factor (TF) binding analysis on ChIP-seq datasets (TF peaks) using the DiffBind package in R. It identifies genomic regions where TF binding intensity differs significantly between experimental conditions (e.g., treatment vs. control, mutant vs. wild-type). Use this pipeline when you need to compare the binding, and hence the likely function, of the same TF across two or more biological conditions, cell types, or treatments using ChIP-seq data or TF binding peaks. It is well suited to studying regulatory mechanisms that underlie transcriptional differences or epigenetic responses to perturbations.

DiffBind TF Differential Binding Analysis

Overview

This skill enables comprehensive differential TF binding analysis using DiffBind in R. DiffBind integrates read counting, normalization, and statistical modeling to identify differentially bound peaks between conditions.

To perform DiffBind differential binding analysis:

Initialize the project directory.
Refer to the Inputs & Outputs section to check the inputs and build the output directory structure. All output files should be located in the ${proj_dir} created in Step 0.
Always prompt the user if required files are missing.
Provide a sample sheet with ChIP-seq peak files and corresponding BAM files for each sample.
Construct a DBA object from the sample sheet.
Compute read counts over consensus peak regions.
Specify experimental conditions (e.g., treatment vs. control or cell_type_A vs. cell_type_B).
Run statistical tests to identify differentially bound regions.
Generate correlation heatmaps, PCA plots, and volcano plots; extract significant binding events.

When to use this skill

Use the TF-differential-binding pipeline when you need to compare the binding, and hence the likely function, of the same TF across two or more biological conditions, cell types, or treatments using ChIP-seq data or TF binding peaks. It is well suited to studying regulatory mechanisms that underlie transcriptional differences or epigenetic responses to perturbations.

Recommended applications include:

Comparing treated vs. control or wild-type vs. mutant conditions to identify TF binding changes in response to stimuli, drugs, or mutations.
Comparing TF binding profiles between two cell types or experimental conditions to identify differentially bound regions (DBRs).
Assessing how the function of a TF differs between two conditions.
Integrating with RNA-seq to correlate TF binding alterations with gene expression changes.
Investigating co-factor dependencies or chromatin remodeling events linked to TF occupancy.

Inputs & Outputs

Inputs (choose one)

If starting from BAM files and BED peak files → Generate consensus peaks and count matrix.
If starting from existing count matrix → Go directly to DiffBind analysis.
If there are multiple conditions or batches → Include batch/condition in the design.

Outputs

${sample}_TF_DB/ # ${proj_dir} returned in Step 0
    DBs/
      DB_results.csv # DESeq2 results (log2FC, p-values)
      DB_up_${treated_condition}.bed
      DB_down_${treated_condition}.bed
    plots/ # visualization outputs
      PCA.pdf
      volcano.pdf
      heatmap.pdf
    logs/ # analysis logs 
    temp/ # other temp files
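
The sub-directories listed above can be created from R once the project path is known. The following is a minimal base-R sketch; `init_output_dirs` is a hypothetical helper (not part of DiffBind or the project-init tool), and `proj_dir` is assumed to be the path returned in Step 0. A temporary directory stands in for it here.

```r
# Hypothetical helper: create the output sub-directories listed above.
# `proj_dir` is assumed to be the path returned by the Step 0 tool.
init_output_dirs <- function(proj_dir,
                             subdirs = c("DBs", "plots", "logs", "temp")) {
  paths <- file.path(proj_dir, subdirs)
  for (p in paths) {
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  setNames(paths, subdirs)  # named vector: out_dirs[["plots"]], ...
}

# Demo in a temporary location so nothing is written to the real project.
out_dirs <- init_output_dirs(file.path(tempdir(), "c1_vs_c2_TF_DB"))
```

The returned named vector makes later paths easy to build, for example `file.path(out_dirs[["DBs"]], "DB_results.csv")`.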

Decision Tree

Step 0: Initialize Project

Create the directory for this project:

Call:

mcp__project-init-tools__project_init

with:

sample: sample name (e.g. c1_vs_c2)
task: TF_DB

The tool will:

Create ${sample}_TF_DB directory.
Return the full path of the ${sample}_TF_DB directory, which will be used as ${proj_dir}.

Step 1: Prepare Input Data

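Create a CSV sample sheet (samplesheet.csv) with the following columns (shown here as a table; store the file comma-separated, with bamReads and Peaks holding file paths):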

SampleID	Tissue	Factor	Condition	bamReads	Peaks	PeakCaller
TF_A_1	A	TF	Control	Control1.bam	Control1_peaks.narrowPeak	narrow
TF_A_2	A	TF	Control	Control2.bam	Control2_peaks.narrowPeak	narrow
TF_B_1	A	TF	Treated	Treated1.bam	Treated1_peaks.narrowPeak	narrow
TF_B_2	A	TF	Treated	Treated2.bam	Treated2_peaks.narrowPeak	narrow
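
Before handing the sheet to DiffBind, it can help to confirm that the expected columns exist and that every referenced file is present, so the user can be prompted early. The following is a minimal base-R sketch; `check_sample_sheet` is a hypothetical helper, not a DiffBind function, and its required-column list simply mirrors the table above.

```r
# Hypothetical helper (not a DiffBind function): sanity-check a sample sheet.
# Assumes the column layout shown in the table above.
check_sample_sheet <- function(sheet, check_files = TRUE) {
  required <- c("SampleID", "Tissue", "Factor", "Condition",
                "bamReads", "Peaks", "PeakCaller")
  problems <- character(0)

  missing_cols <- setdiff(required, colnames(sheet))
  if (length(missing_cols) > 0) {
    problems <- c(problems,
                  paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }
  if ("SampleID" %in% colnames(sheet) && anyDuplicated(sheet$SampleID) > 0) {
    problems <- c(problems, "duplicated SampleID values")
  }
  if (check_files) {
    for (col in intersect(c("bamReads", "Peaks"), colnames(sheet))) {
      paths <- as.character(sheet[[col]])
      absent <- paths[!file.exists(paths)]
      if (length(absent) > 0) {
        problems <- c(problems,
                      paste0("files not found (", col, "): ",
                             paste(absent, collapse = ", ")))
      }
    }
  }
  problems  # character(0) means no problems were detected
}

# Toy sheet matching the table above; file checks are skipped because
# these paths are placeholders.
toy_sheet <- data.frame(
  SampleID   = c("TF_A_1", "TF_A_2", "TF_B_1", "TF_B_2"),
  Tissue     = "A",
  Factor     = "TF",
  Condition  = c("Control", "Control", "Treated", "Treated"),
  bamReads   = c("Control1.bam", "Control2.bam", "Treated1.bam", "Treated2.bam"),
  Peaks      = c("Control1_peaks.narrowPeak", "Control2_peaks.narrowPeak",
                 "Treated1_peaks.narrowPeak", "Treated2_peaks.narrowPeak"),
  PeakCaller = "narrow",
  stringsAsFactors = FALSE
)
sheet_problems <- check_sample_sheet(toy_sheet, check_files = FALSE)
```

With real data, calling `check_sample_sheet(read.csv("samplesheet.csv"))` would also report BAM or peak files that cannot be found.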

Step 2: Load Data and Build the DiffBind Object

library(DiffBind)
samples <- read.csv("samplesheet.csv")
dbObj <- dba(sampleSheet=samples)

Key parameters:

sampleSheet: a data frame (as here) or the path to a CSV file with BAM and peak information
Supports both narrowPeak and broadPeak formats

Step 3: Read Counting and Consensus Peak Generation

Count reads overlapping consensus peaks across samples:

# Generate a consensus peakset
dbObj <- dba.count(dbObj, summits=250)

Notes:

summits: re-centers peaks ±250 bp around summits for consistency.
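The resulting binding matrix holds read counts for every sample over the consensus peaks. In DiffBind 3.x, normalization is configured separately with dba.normalize(); if that step is skipped, dba.analyze() applies the default normalization.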
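
To make the effect of `summits=250` concrete, the interval arithmetic can be written out in base R. This is an illustration only: `recenter_on_summit` is a hypothetical helper, not a DiffBind function, and it assumes 1-based inclusive coordinates, so each re-centered peak spans 2 × 250 + 1 = 501 bp.

```r
# Illustration only: the interval arithmetic implied by summits = 250.
# `recenter_on_summit` is a hypothetical helper, not part of DiffBind.
recenter_on_summit <- function(summit, summits = 250) {
  data.frame(start = summit - summits,   # summits bp upstream
             end   = summit + summits,   # summits bp downstream
             width = 2 * summits + 1)    # summit base itself is included
}

# Two invented summit positions
recentered <- recenter_on_summit(c(10500, 84210))
```

Because every consensus peak ends up the same width, read counts are compared over equally sized windows rather than over peaks of varying length.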

Step 4: Contrast Definition

Define conditions for comparison:

# Define experimental contrasts (e.g., Treated vs Control)
dbObj <- dba.contrast(dbObj, categories=DBA_CONDITION, minMembers=2)

Alternatives:

For multifactor experiments: use DBA_TISSUE, DBA_TREATMENT, or custom metadata.
Check contrasts:
```
dba.show(dbObj, bContrasts=TRUE)
```
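
Since `minMembers=2` requires at least two samples per group, a quick replicate count on the sample sheet shows in advance which conditions can take part in a contrast. The following base-R sketch uses a hypothetical helper, `enough_replicates`, and assumes the `Condition` column from the sample sheet in Step 1.

```r
# Hypothetical pre-check (base R): which conditions have enough replicates
# to satisfy minMembers? The column name `Condition` follows the sample sheet.
enough_replicates <- function(sheet, min_members = 2) {
  counts <- table(sheet$Condition)
  setNames(as.vector(counts) >= min_members, names(counts))
}

# Invented example: Mutant has a single sample and would not form a contrast
toy <- data.frame(Condition = c("Control", "Control", "Treated", "Treated", "Mutant"))
replicate_ok <- enough_replicates(toy, min_members = 2)
```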

Step 5: Differential Binding Analysis

# Perform analysis
dbObj <- dba.analyze(dbObj, method=DBA_DESEQ2)

Parameters:

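method: choose DBA_DESEQ2 (default) or DBA_EDGER
The following thresholds are applied when results are retrieved with dba.report, not by dba.analyze itself:
th: FDR threshold (default 0.05)
fold: minimum absolute log2 fold change (default 0)
bUsePval=TRUE: apply th to p-values instead of FDR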

Step 6: Visualization and Quality Control

Correlation Heatmap

# Sample-to-sample correlation heatmap (row scaling is not applied to correlations)
dba.plotHeatmap(dbObj, correlations=TRUE)

PCA Plot

dba.plotPCA(dbObj, attributes=DBA_CONDITION, label=DBA_ID)

Volcano Plot

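# Volcano plot: retrieve all sites (th=1 disables the FDR filter)
allResults <- dba.report(dbObj, method=DBA_DESEQ2, th=1)
resDF <- as.data.frame(allResults)  # GRanges -> data frame for plotting
with(resDF, plot(Fold, -log10(FDR),
     col=ifelse(FDR < 0.05 & abs(Fold) > 1, "red", "grey"),
     pch=16, xlab="log2 fold change", ylab="-log10(FDR)",
     main="Volcano Plot"))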

Output: heatmap.pdf, volcano.pdf, PCA.pdf (written to ${proj_dir}/plots/ when each plotting call is wrapped in pdf(...) and dev.off())
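
The plotting calls in this step draw to the active graphics device, so writing the PDF files requires opening and closing a PDF device around each call. The following is a minimal sketch; `save_pdf` is a hypothetical helper, and `proj_dir` in the comment is assumed to be an R variable holding the Step 0 path. A base-R plot stands in for the DiffBind calls so the sketch runs on its own.

```r
# Hypothetical helper: run a plotting function with a PDF device open and
# make sure the device is closed even if plotting fails.
save_pdf <- function(path, plot_fn, width = 7, height = 7) {
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  pdf(path, width = width, height = height)
  on.exit(dev.off(), add = TRUE)
  plot_fn()
  invisible(path)
}

# Stand-in plot so the sketch runs without DiffBind. In the pipeline the
# function body would be one of the calls from this step, e.g.
#   save_pdf(file.path(proj_dir, "plots", "heatmap.pdf"),
#            function() dba.plotHeatmap(dbObj, correlations = TRUE))
demo_pdf <- save_pdf(file.path(tempdir(), "plots", "demo.pdf"),
                     function() plot(1:10, main = "demo"))
```

Using `on.exit` keeps a failed plot from leaving a device open, which would otherwise redirect later plots into the wrong file.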

Step 7: Result Extraction

Export significant differential peaks:

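library(rtracklayer)  # provides export() for BED output

# Full results table (all sites, using allResults from Step 6)
write.csv(as.data.frame(allResults), "DB_results.csv", row.names = FALSE)

# Extract results with FDR < 0.05 and |log2FC| > 1
sigSites <- dba.report(dbObj, method=DBA_DESEQ2, th=0.05, fold=1)
print("Differential binding results summary:")
print(summary(sigSites))

# Split peaks that go up or down in the treated condition.
# Fold is group 1 relative to group 2, so confirm with
# dba.show(dbObj, bContrasts=TRUE) that the treated condition is group 1.
treated_condition <- "Treated"  # replace with the treated condition label
diff_up <- sigSites[sigSites$Fold > 0]
diff_down <- sigSites[sigSites$Fold < 0]
export(diff_up, paste0("DB_up_", treated_condition, ".bed"))
export(diff_down, paste0("DB_down_", treated_condition, ".bed"))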

Output: DB_results.csv DB_up_${treated_condition}.bed DB_down_${treated_condition}.bed
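
BED files use 0-based, half-open coordinates, whereas the ranges in the results table are 1-based and inclusive; `export()` from rtracklayer handles that shift. If BED files ever need to be produced from the CSV instead, the conversion could look like the base-R sketch below. `to_bed` is a hypothetical helper, and the `seqnames`/`start`/`end` column names are assumed to match what `as.data.frame()` yields for a GRanges object.

```r
# Hypothetical helper: convert a results data frame to 4-column BED.
# Assumes columns seqnames/start/end as produced by as.data.frame() on a
# GRanges object; the site names are generated for illustration.
to_bed <- function(df) {
  data.frame(chrom = as.character(df$seqnames),
             start = as.integer(df$start) - 1L,  # 1-based -> 0-based start
             end   = as.integer(df$end),         # end unchanged (half-open)
             name  = paste0("site_", seq_len(nrow(df))),
             stringsAsFactors = FALSE)
}

# Invented example sites, each 501 bp wide
toy_sites <- data.frame(seqnames = c("chr1", "chr2"),
                        start = c(1001, 5001),
                        end   = c(1501, 5501))
bed <- to_bed(toy_sites)

bed_file <- file.path(tempdir(), "toy_sites.bed")
write.table(bed, bed_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```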

Interpretation and Biological Insights

Significance Criteria

FDR < 0.05 → statistically significant
|log2FC| > 1 → commonly used cutoff for a biologically meaningful difference (at least a 2-fold change)
Consistent replicates → at least two replicates per condition recommended
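
The two thresholds above combine into a simple per-site label. The following base-R sketch applies them to an invented results table; `classify_sites` is a hypothetical helper, and the `Fold` and `FDR` column names follow the dba.report output used in earlier steps.

```r
# Toy results table; column names `Fold` and `FDR` follow dba.report output,
# the values are invented for illustration.
toy_results <- data.frame(
  Fold = c(2.3, -1.8, 0.4, 1.5, -2.1),
  FDR  = c(0.001, 0.02, 0.0005, 0.20, 0.049)
)

# Hypothetical helper: label each site using both thresholds.
classify_sites <- function(res, fdr_cutoff = 0.05, lfc_cutoff = 1) {
  significant <- res$FDR < fdr_cutoff & abs(res$Fold) > lfc_cutoff
  ifelse(!significant, "ns", ifelse(res$Fold > 0, "up", "down"))
}

toy_results$status <- classify_sites(toy_results)
status_counts <- table(toy_results$status)
```

Note that the third site has a very small FDR but a fold change below the cutoff, and the fourth has a large fold change but a high FDR; both are labeled "ns" because a site must pass both criteria.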

Typical Biological Interpretations

Increased binding in treated condition → potential activation or recruitment of TFs
Decreased binding → loss of TF affinity or chromatin closing
Combine with RNA-seq to correlate with target gene expression.

Troubleshooting

Problem	Possible Cause	Solution
No differential peaks found	Insufficient replicates or low coverage	Add replicates, increase sequencing depth, or relax (raise) the FDR threshold
Errors in sample sheet	Column names incorrect or missing	Use standard DiffBind column format
Inconsistent genome build	Mixed genome assemblies	Ensure all BAM and peak files use the same genome reference
Over-normalization	Strong batch effects	Include batch term in design or run `dba.contrast(..., block=...)`