| name | identification-theory |
| description | DAG and potential outcomes frameworks for causal mediation identification |
Identification Theory
Comprehensive framework for causal identification in statistical methodology
Use this skill when working on: causal identification, mediation analysis identification, DAG-based reasoning, potential outcomes, identification assumptions, partial identification, sensitivity analysis, or deriving identification formulas.
Core Concepts
What is Identification?
A causal parameter $\psi$ is identified if it can be uniquely determined from the observed data distribution $P(O)$.
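Formally: within the assumed class of causal models, $\psi$ is identified if any two models that induce the same observed distribution also agree on the parameter, i.e. $P_1(O) = P_2(O) \Rightarrow \psi_1 = \psi_2$.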
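A small worked example, constructed here purely for illustration, shows what a failure of identification looks like. Suppose $A$ and $Y$ are binary with $P(A=1) = 0.5$, $P(Y=1 \mid A=1) = 0.8$ and $P(Y=1 \mid A=0) = 0.2$, and take $\psi = E[Y(1) - Y(0)]$. The two hypothetical models below induce exactly this observed distribution but disagree on $\psi$:

```latex
\begin{align*}
\text{Model 1 (no confounding): } & Y(a) \perp\!\!\!\perp A
  \;\Rightarrow\; \psi_1 = E[Y \mid A=1] - E[Y \mid A=0] = 0.8 - 0.2 = 0.6 \\
\text{Model 2 (pure confounding): } & Y(1) = Y(0) = U,\quad
  P(U=1 \mid A=1) = 0.8,\; P(U=1 \mid A=0) = 0.2
  \;\Rightarrow\; \psi_2 = 0
\end{align*}
```

Since $P_1(O) = P_2(O)$ but $\psi_1 \neq \psi_2$, the parameter is not identified unless an assumption such as exchangeability rules out Model 2.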
Why Identification Matters
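Causal Question → Target Estimand → Identification → Estimation → Inference

| Stage | Example |
|---|---|
| Causal question | "Does A cause Y?" |
| Target estimand | $E[Y(1) - Y(0)]$ |
| Identification | Express in terms of $P(O)$ |
| Estimation | Statistical methods |
| Inference | Confidence intervals |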
Without identification, no amount of data can answer causal questions.
Two Frameworks
1. Potential Outcomes (Rubin/Neyman)
Primitives:
- $Y(a)$ = potential outcome under treatment $a$
- Only $Y = Y(A)$ is observed (consistency)
- Fundamental problem: never observe both $Y(0)$ and $Y(1)$ for same unit
Advantages:
- Clear definition of causal effects
- Natural for experimental reasoning
- Connects to missing data theory
2. Structural Causal Models (Pearl)
Primitives:
- Directed Acyclic Graph (DAG) encoding causal structure
- Structural equations: $Y := f_Y(PA_Y, U_Y)$
- Interventions via do-operator: $P(Y | do(A=a))$
Advantages:
- Visual representation of assumptions
- Systematic identification algorithms
- Clear separation of statistical and causal assumptions
DAG Framework
Directed Acyclic Graphs (DAGs)
A DAG $\mathcal{G} = (V, E)$ consists of:
- Vertices $V$: Random variables
- Directed edges $E$: Direct causal relationships
- Acyclic: No directed cycles
Key DAG Terminology
| Term | Definition | Notation |
|---|---|---|
| Parents | Direct causes | $PA_Y$ |
| Children | Direct effects | $CH_Y$ |
| Ancestors | All causes | $AN_Y$ |
| Descendants | All effects | $DE_Y$ |
| Collider | Node with two incoming arrows | $A \to C \leftarrow B$ |
| Mediator | Node on causal path | $A \to M \to Y$ |
| Confounder | Common cause | $A \leftarrow C \to Y$ |
# DAG specification and visualization using dagitty
library(dagitty)
# Define mediation DAG
mediation_dag <- dagitty('
dag {
A [exposure]
Y [outcome]
M
X
X -> A
X -> M
X -> Y
A -> M
A -> Y
M -> Y
}
')
# Visualize
plot(mediation_dag)
# Find adjustment sets
adjustmentSets(mediation_dag, exposure = "A", outcome = "Y")
# Check implied conditional independencies
impliedConditionalIndependencies(mediation_dag)
D-Separation
The Core Concept
Two nodes $A$ and $B$ are d-separated by set $Z$ if every path between them is blocked.
Path Blocking Rules
| Path Type | Blocked by conditioning on... |
|---|---|
| Chain: $A \to M \to B$ | $M$ (blocks) |
| Fork: $A \leftarrow C \to B$ | $C$ (blocks) |
| Collider: $A \to C \leftarrow B$ | NOT $C$ (conditioning opens!) |
D-separation Formula
$$A \perp\!\!\!\perp_{\mathcal{G}} B \mid Z \iff \text{every path } A \text{---} B \text{ is blocked by } Z$$
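The collider rule is the least intuitive of the three, so a short simulation may help. The sketch below uses base R only and made-up variables (`a`, `b`, `coll`); it assumes linear Gaussian relationships, so partial correlation stands in for conditional dependence.

```r
set.seed(1)
n <- 20000
a <- rnorm(n)
b <- rnorm(n)                 # independent of a by construction
coll <- a + b + rnorm(n)      # collider: a -> coll <- b

# Marginal association: the collider path is blocked, so this is near zero
marginal_cor <- cor(a, b)

# Conditioning on the collider: correlate the residuals left after
# regressing each variable on coll (a partial correlation)
partial_cor <- cor(resid(lm(a ~ coll)), resid(lm(b ~ coll)))

c(marginal = marginal_cor, given_collider = partial_cor)
```

Under this data-generating process the partial correlation is clearly negative even though `a` and `b` are independent, which is the "conditioning opens" behaviour in the table above.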
# Check d-separation using dagitty
check_dseparation <- function(dag, x, y, z = NULL) {
  # dseparated() takes the conditioning set as a (possibly empty) list
  dseparated(dag, x, y, if (is.null(z)) list() else z)
}
# Find adjustment sets for the total effect of x on y.
# These block the non-causal (backdoor) paths only; they do not
# d-separate x and y when x has a causal path to y.
find_adjustment_sets <- function(dag, x, y) {
  adjustmentSets(dag, exposure = x, outcome = y, effect = "total")
}
# Test the conditional independencies implied by the DAG against data
verify_ci_implications <- function(dag, data) {
  # localTests() tests every implied independence; type = "cis" uses
  # partial correlations, i.e. it assumes linear relationships
  results <- localTests(dag, data = data, type = "cis")
  data.frame(statement = rownames(results), p_value = results$p.value)
}
Backdoor Criterion
Definition
A set $Z$ satisfies the backdoor criterion relative to $(A, Y)$ if:
- No node in $Z$ is a descendant of $A$
- $Z$ blocks every path between $A$ and $Y$ that contains an arrow into $A$
Backdoor Adjustment Formula
If $Z$ satisfies the backdoor criterion: $$P(Y | do(A = a)) = \sum_z P(Y | A = a, Z = z) P(Z = z)$$
or equivalently: $$E[Y(a)] = E_Z[E[Y | A = a, Z]]$$
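As a sketch of how the adjustment formula works with a single binary covariate, the simulation below evaluates $\sum_z E[Y \mid A=a, Z=z] P(Z=z)$ directly from stratum means. The data-generating process and all names are invented for illustration; base R only.

```r
set.seed(2)
n <- 50000
z <- rbinom(n, 1, 0.4)                    # confounder
a <- rbinom(n, 1, plogis(-1 + 2 * z))     # treatment depends on z
y <- 1 + 2 * a + 3 * z + rnorm(n)         # true effect of a on y is 2

# Naive contrast ignores z and is confounded
naive <- mean(y[a == 1]) - mean(y[a == 0])

# Backdoor formula: sum over z of E[Y | A = a, Z = z] * P(Z = z)
backdoor_mean <- function(a_val) {
  sum(sapply(c(0, 1), function(z_val) {
    mean(y[a == a_val & z == z_val]) * mean(z == z_val)
  }))
}
adjusted <- backdoor_mean(1) - backdoor_mean(0)

c(naive = naive, adjusted = adjusted)
```

The adjusted contrast should land close to the simulated effect of 2, while the naive contrast absorbs part of the effect of `z`.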
Front-Door Criterion
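When the backdoor criterion fails because of unmeasured $A$-$Y$ confounding, but a mediator $M$ carries all of $A$'s effect and is itself unconfounded: $$P(Y | do(A=a)) = \sum_m P(M = m | A = a) \sum_{a'} P(Y | M = m, A = a') P(A = a')$$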
# Check whether a proposed set is a valid adjustment set
check_backdoor <- function(dag, exposure, outcome, adjustment_set) {
  list(
    # isAdjustmentSet() also accepts valid non-minimal sets, which a
    # comparison against the minimal sets alone would wrongly reject
    is_valid = isAdjustmentSet(dag, adjustment_set,
                               exposure = exposure, outcome = outcome),
    minimal_sets = adjustmentSets(dag, exposure = exposure,
                                  outcome = outcome, type = "minimal"),
    proposed = adjustment_set
  )
}
# Compute backdoor-adjusted estimate for a binary (0/1) exposure
backdoor_adjustment <- function(data, outcome, exposure, adjustment,
                                n_boot = 200) {
  fml <- reformulate(c(exposure, adjustment), response = outcome)
  # Standardization: fit the outcome model, predict for everyone under
  # exposure = 1 and exposure = 0, then average the difference
  standardize <- function(d) {
    model <- lm(fml, data = d)
    d1 <- d; d1[[exposure]] <- 1
    d0 <- d; d0[[exposure]] <- 0
    mean(predict(model, newdata = d1) - predict(model, newdata = d0))
  }
  # Nonparametric bootstrap SE; the spread of the predicted contrasts
  # across units is not a valid standard error
  boot <- replicate(n_boot, {
    idx <- sample(nrow(data), replace = TRUE)
    standardize(data[idx, , drop = FALSE])
  })
  list(ate = standardize(data), se = sd(boot))
}
# Full identification analysis
analyze_identification <- function(dag, exposure, outcome) {
list(
adjustment_sets = adjustmentSets(dag, exposure, outcome),
instrumental_sets = instrumentalVariables(dag, exposure, outcome),
direct_effects = adjustmentSets(dag, exposure, outcome, effect = "direct"),
implied_independencies = impliedConditionalIndependencies(dag)
)
}
Framework Equivalence
For most problems, both frameworks give equivalent results: $$E[Y(a)] = E[Y | do(A=a)]$$
Choose based on context and audience.
Key Identification Assumptions
For Treatment Effects
| Assumption | Formal Statement | Interpretation |
|---|---|---|
| Consistency | $Y = Y(A)$ | Observed outcome equals potential outcome for received treatment |
| Positivity | $P(A=a \mid X=x) > 0$ for all $a$ and all $x$ with $P(X=x) > 0$ | Every covariate stratum has both treated and untreated units |
| Exchangeability | $Y(a) \perp\!\!\!\perp A \mid X$ | No unmeasured confounding given $X$ |
| SUTVA | No interference, single version of treatment | Units don't affect each other |
For Mediation Effects
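In addition to the assumptions above, applied to the $A$-$Y$, $A$-$M$, and $M$-$Y$ relationships, natural effects require:
| Assumption | Formal Statement | Interpretation |
|---|---|---|
| Cross-world exchangeability | $Y(a,m) \perp\!\!\!\perp M(a^*) \mid X$ | Outcome under $(a, m)$ is independent of the mediator under a possibly different treatment level $a^*$ |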
| No $A$-$M$ interaction (optional) | $Y(a,m) - Y(a',m)$ constant in $m$ | Simplifies identification |
| Compositional | $Y(a) = Y(a, M(a))$ | Potential outcome composition |
Standard Identification Results
1. Average Treatment Effect (ATE)
Target: $\psi = E[Y(1) - Y(0)]$
Under exchangeability (A1), consistency (A2), positivity (A3):
$$\psi = E\left[E[Y | A=1, X] - E[Y | A=0, X]\right]$$
Proof sketch:
\begin{align}
E[Y(a)] &= E[E[Y(a) | X]] && \text{(iterated expectations)} \\
&= E[E[Y(a) | A=a, X]] && \text{(A1: exchangeability)} \\
&= E[E[Y | A=a, X]] && \text{(A2: consistency)}
\end{align}
2. Average Treatment Effect on Treated (ATT)
Target: $\psi_{ATT} = E[Y(1) - Y(0) | A=1]$
Under weaker exchangeability $Y(0) \perp\!\!\!\perp A \mid X$:
$$\psi_{ATT} = E\left[E[Y | A=1, X] - E[Y | A=0, X] \mid A=1\right]$$
3. Natural Direct and Indirect Effects (Mediation)
Target:
- NDE: $E[Y(1, M(0)) - Y(0, M(0))]$
- NIE: $E[Y(1, M(1)) - Y(1, M(0))]$
Under mediation assumptions (see VanderWeele, 2015):
$$NDE = \int\int \left\{E[Y|A=1,M=m,X=x] - E[Y|A=0,M=m,X=x]\right\} \, dP(m|A=0,X=x) \, dP(x)$$
$$NIE = \int\int E[Y|A=1,M=m,X=x] \left\{dP(m|A=1,X=x) - dP(m|A=0,X=x)\right\} \, dP(x)$$
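When both the mediator and outcome models are linear with no $A$-$M$ interaction, these integrals collapse to regression coefficients. The sketch below is a plug-in evaluation on simulated data (all names and parameter values are illustrative assumptions, base R only); it is meant to show the mechanics of the formulas, not a general-purpose estimator.

```r
set.seed(3)
n <- 20000
x <- rnorm(n)
a <- rbinom(n, 1, plogis(0.5 * x))
m <- 0.5 + 0.8 * a + 0.4 * x + rnorm(n)           # A -> M slope 0.8
y <- 1 + 0.5 * a + 0.7 * m + 0.3 * x + rnorm(n)   # direct 0.5, M -> Y slope 0.7

med_fit <- lm(m ~ a + x)
out_fit <- lm(y ~ a + m + x)

# E[Y(a_val, M(a_star))]: evaluate the outcome model at the fitted mediator
# mean under a_star. Plugging in the mean is exact only because the outcome
# model is linear in m.
mean_y <- function(a_val, a_star) {
  m_star <- predict(med_fit, newdata = data.frame(a = a_star, x = x))
  mean(predict(out_fit, newdata = data.frame(a = a_val, m = m_star, x = x)))
}

nde <- mean_y(1, 0) - mean_y(0, 0)   # should be near 0.5
nie <- mean_y(1, 1) - mean_y(1, 0)   # should be near 0.7 * 0.8 = 0.56
```

In this linear setting `nie` coincides with the product of the `m` coefficient in the outcome model and the `a` coefficient in the mediator model. With nonlinear models or interactions the plug-in of the mediator mean is no longer valid and the integral has to be evaluated over the mediator distribution.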
4. Controlled Direct Effect (CDE)
Target: $CDE(m) = E[Y(1,m) - Y(0,m)]$
Simpler identification (no cross-world assumption):
$$CDE(m) = E[E[Y|A=1,M=m,X] - E[Y|A=0,M=m,X]]$$
DAG-Based Identification
The Back-Door Criterion
A set $X$ satisfies the back-door criterion relative to $(A, Y)$ if:
- No node in $X$ is a descendant of $A$
- $X$ blocks every path between $A$ and $Y$ that contains an arrow into $A$
If satisfied: $$P(Y | do(A=a)) = \sum_x P(Y | A=a, X=x) P(X=x)$$
The Front-Door Criterion
When there's an unmeasured confounder $U$ between $A$ and $Y$, but $M$ mediates all of $A$'s effect:
$U \to A$, $U \to Y$, $A \to M \to Y$ ($U$ unmeasured)
Identification: $$P(Y | do(A=a)) = \sum_m P(M=m | A=a) \sum_{a'} P(Y | M=m, A=a') P(A=a')$$
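To see the front-door formula recover an effect that the naive contrast gets wrong, the following sketch simulates binary variables from the graph above and evaluates the formula from empirical frequencies. The probabilities are arbitrary illustrative choices; `u` is generated but never used in estimation.

```r
set.seed(4)
n <- 200000
u <- rbinom(n, 1, 0.5)                        # unmeasured confounder
a <- rbinom(n, 1, 0.2 + 0.6 * u)              # U -> A
m <- rbinom(n, 1, 0.1 + 0.7 * a)              # A -> M, no U -> M
y <- rbinom(n, 1, 0.1 + 0.4 * m + 0.4 * u)    # M -> Y and U -> Y, no A -> Y

# Front-door formula: sum_m P(m | a) * sum_a' P(Y = 1 | m, a') P(a')
front_door <- function(a_val) {
  sum(sapply(c(0, 1), function(m_val) {
    p_m <- mean(m[a == a_val] == m_val)
    inner <- sum(sapply(c(0, 1), function(a_prime) {
      mean(y[m == m_val & a == a_prime]) * mean(a == a_prime)
    }))
    p_m * inner
  }))
}

fd_effect <- front_door(1) - front_door(0)      # true value: 0.4 * 0.7 = 0.28
naive <- mean(y[a == 1]) - mean(y[a == 0])      # confounded by u
```

Here the front-door estimate should sit near 0.28, whereas the naive difference is inflated by the open path through `u`.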
Instrumental Variables
When $Z$ affects $Y$ only through $A$ and shares no unmeasured causes with $Y$:
$Z \to A \to Y$, with unmeasured $U \to A$ and $U \to Y$
Local ATE identification (with monotonicity): $$LATE = \frac{E[Y | Z=1] - E[Y | Z=0]}{E[A | Z=1] - E[A | Z=0]}$$
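The ratio above (the Wald estimator) can be checked on simulated data. This sketch assumes a randomized binary instrument, a constant treatment effect of 1.5 (so the LATE equals that constant), and a threshold model for treatment uptake that makes uptake monotone in `z`; all of these are illustrative choices.

```r
set.seed(5)
n <- 100000
u <- rnorm(n)                                        # unmeasured confounder
z <- rbinom(n, 1, 0.5)                               # randomized instrument
a <- as.integer(0.8 * z + 0.5 * u + rnorm(n) > 0.5)  # uptake, monotone in z
y <- 1 + 1.5 * a + u + rnorm(n)                      # constant effect 1.5

# Wald estimator: reduced form divided by first stage
wald <- (mean(y[z == 1]) - mean(y[z == 0])) /
        (mean(a[z == 1]) - mean(a[z == 0]))

# Naive regression of y on a is biased upward by u
naive <- unname(coef(lm(y ~ a))["a"])

c(wald = wald, naive = naive)
```

The Wald estimate should be close to 1.5, though it is noticeably noisier than the biased regression coefficient because it divides by the first-stage difference.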
Sequential Identification (Multiple Mediators)
Sequential Mediation (A → M1 → M2 → Y)
Identifying the effect transmitted along the three-arrow path $A \to M_1 \to M_2 \to Y$ requires:
- Standard confounding control for each arrow
- No intermediate confounders affected by treatment
- Sequential ignorability assumptions
Path-specific effects:
- Direct: $A \to Y$
- Through $M_1$ only: $A \to M_1 \to Y$
- Through $M_2$ only: $A \to M_2 \to Y$
- Through both: $A \to M_1 \to M_2 \to Y$
Identification Formula (No Intermediate Confounding)
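In linear structural models without interactions, the effect transmitted along $A \to M_1 \to M_2 \to Y$ is the product of the three path-wise slopes:
$$\text{Effect through } M_1 \to M_2 = \frac{\partial E[M_1 \mid A, X]}{\partial a} \cdot \frac{\partial E[M_2 \mid A, M_1, X]}{\partial m_1} \cdot \frac{\partial E[Y \mid A, M_1, M_2, X]}{\partial m_2}$$
Expressed as product of coefficients: $\hat{\alpha}_1 \cdot \hat{\beta}_1 \cdot \hat{\gamma}_2$, where $\alpha_1$ is the $A \to M_1$ slope, $\beta_1$ the $M_1 \to M_2$ slope, and $\gamma_2$ the $M_2 \to Y$ slope.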
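A minimal sketch of the product-of-coefficients calculation for the path $A \to M_1 \to M_2 \to Y$, assuming linear models with no interactions and no intermediate confounding. The simulated coefficients and variable names are illustrative, and each regression adjusts for all upstream variables.

```r
set.seed(6)
n  <- 20000
x  <- rnorm(n)
a  <- rbinom(n, 1, plogis(0.3 * x))
m1 <- 0.6 * a + 0.2 * x + rnorm(n)                         # A  -> M1 slope 0.6
m2 <- 0.5 * m1 + 0.3 * a + 0.2 * x + rnorm(n)              # M1 -> M2 slope 0.5
y  <- 0.4 * m2 + 0.2 * m1 + 0.1 * a + 0.2 * x + rnorm(n)   # M2 -> Y  slope 0.4

alpha1 <- coef(lm(m1 ~ a + x))["a"]
beta1  <- coef(lm(m2 ~ m1 + a + x))["m1"]
gamma2 <- coef(lm(y ~ m2 + m1 + a + x))["m2"]

# Effect through both mediators; true value 0.6 * 0.5 * 0.4 = 0.12
effect_both <- unname(alpha1 * beta1 * gamma2)
```

Inference on this product needs a method that respects its non-normal sampling distribution, such as a bootstrap or Monte Carlo interval, rather than a naive normal approximation.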
Partial Identification
When point identification fails, we can still bound the parameter.
Manski Bounds (No Assumptions)
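For $E[Y(1)]$ with a bounded outcome $Y \in [y_{min}, y_{max}]$, write $E[Y(1)] = E[Y \mid A=1]P(A=1) + E[Y(1) \mid A=0]P(A=0)$ and replace the unobserved $E[Y(1) \mid A=0]$ by its extreme values: $$E[Y(1)] \in \left[E[Y \mid A=1]P(A=1) + y_{min}P(A=0),\; E[Y \mid A=1]P(A=1) + y_{max}P(A=0)\right]$$ Bounds on the ATE follow by combining these with the analogous bounds for $E[Y(0)]$.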
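The bounds follow from the decomposition $E[Y(1)] = E[Y \mid A=1]P(A=1) + E[Y(1) \mid A=0]P(A=0)$, where only the second conditional mean is unobserved and is replaced by the outcome's logical limits. A small sketch with a hypothetical helper `manski_bounds_y1` and toy data:

```r
# Worst-case bounds on E[Y(1)] for an outcome known to lie in [y_min, y_max]
manski_bounds_y1 <- function(y, a, y_min, y_max) {
  p1 <- mean(a == 1)
  observed_part <- mean(y[a == 1]) * p1   # E[Y | A = 1] P(A = 1)
  c(lower = observed_part + y_min * (1 - p1),
    upper = observed_part + y_max * (1 - p1))
}

# Toy data: binary outcome, so y_min = 0 and y_max = 1
y <- c(1, 0, 1, 1, 0, 0, 1, 0)
a <- c(1, 1, 1, 1, 0, 0, 0, 0)
bounds <- manski_bounds_y1(y, a, y_min = 0, y_max = 1)
bounds   # lower = 0.375, upper = 0.875
```

The width of the interval is always $(y_{max} - y_{min}) P(A=0)$, so it shrinks only as the untreated share shrinks, not as the sample grows.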
Sensitivity Analysis
When exchangeability is uncertain, parameterize violation:
Unmeasured confounding parameter $\Gamma$: $$\frac{1}{\Gamma} \leq \frac{P(A=1|X,U=1)/P(A=0|X,U=1)}{P(A=1|X,U=0)/P(A=0|X,U=0)} \leq \Gamma$$
Compute bounds as function of $\Gamma$ (Rosenbaum bounds).
E-Value
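The minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed risk ratio $RR \geq 1$ (for $RR < 1$, take $1/RR$ first):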
$$E\text{-value} = RR + \sqrt{RR \times (RR-1)}$$
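The formula is simple enough to wrap in a helper. The function name `e_value` is a hypothetical choice for this sketch; risk ratios below 1 are inverted before the formula is applied, which is the usual convention for protective effects.

```r
# E-value for an observed risk ratio
e_value <- function(rr) {
  stopifnot(is.numeric(rr), length(rr) == 1, rr > 0)
  if (rr < 1) rr <- 1 / rr      # protective effect: invert first
  rr + sqrt(rr * (rr - 1))
}

e_value(2)     # 2 + sqrt(2), about 3.41
e_value(0.5)   # same value as e_value(2)
```

An E-value of about 3.41 for $RR = 2$ means an unmeasured confounder would need risk-ratio associations of at least that size with both treatment and outcome to move the estimate to the null.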
Identification Strategies by Design
Randomized Controlled Trials (RCTs)
- Treatment assignment random → exchangeability holds by design
- Still need SUTVA, consistency
- For mediation: randomize $M$ as well, or use sequential ignorability
Observational Studies
| Strategy | Key Assumption | Best For |
|---|---|---|
| Regression adjustment | All confounders measured | Rich covariate data |
| Propensity score | All confounders measured; correct PS model | High-dimensional confounders |
| Instrumental variables | Valid instrument exists | Unmeasured confounding |
| Regression discontinuity | Continuity at threshold | Sharp treatment rules |
| Difference-in-differences | Parallel trends | Panel data |
Natural Experiments
- Exploit exogenous variation (policy changes, geographic variation)
- Requires careful argument for why variation is "as-if random"
Identification in the MediationVerse
medfit: Foundation
- Implements standard mediation identification
- VanderWeele regression-based approach
- Supports binary/continuous treatments and mediators
probmed: Effect Size
- $P_M$ identification requires identified NDE/NIE
- Handles case when NDE and NIE have opposite signs
RMediation: Confidence Intervals
- Takes identified effects as input
- Distribution of product of coefficients (PRODCLIN)
- Monte Carlo intervals
medrobust: Sensitivity
- When identification assumptions are uncertain
- Bounds on effects under confounding
- E-values for unmeasured confounding
medsim: Validation
- Simulate data where truth is known
- Verify identification formulas recover true effects
- Test estimator properties
Identification Proof Template
\begin{theorem}[Identification of $\psi$]
Under Assumptions:
\begin{enumerate}[label=A\arabic*.]
\item (Consistency) $Y = Y(A)$, $M = M(A)$
\item (Positivity) $P(A=a|X) > \epsilon > 0$ for all $a \in \mathcal{A}$
\item (Exchangeability) $Y(a) \perp\!\!\!\perp A \mid X$
\end{enumerate}
the causal estimand $\psi = E[g(Y(a))]$ is identified by
\[
\psi = E_X\left[E[g(Y) \mid A=a, X]\right].
\]
\end{theorem}
\begin{proof}
\begin{align}
E[g(Y(a))] &= E\left[E[g(Y(a)) \mid X]\right]
&& \text{(law of total expectation)} \\
&= E\left[E[g(Y(a)) \mid A=a, X]\right]
&& \text{(by A3: exchangeability, using A2: positivity)} \\
&= E\left[E[g(Y) \mid A=a, X]\right]
&& \text{(by A1: consistency)}
\end{align}
The RHS depends only on the observed data distribution $P(Y,A,X)$.
\end{proof}
Common Identification Pitfalls
1. Conditioning on Colliders
A → C ← Y
Conditioning on $C$ opens a path between $A$ and $Y$.
2. Conditioning on Mediators
A → M → Y
Conditioning on $M$ blocks the indirect effect, so the estimate no longer captures the total effect; it does nothing to control confounding.
3. Overcontrol Bias
Conditioning on descendants of treatment (mediators or their consequences) can block part of the effect or open collider paths, biasing estimates.
4. M-Bias
$U_1 \to A$, $U_1 \to X \leftarrow U_2$, $U_2 \to Y$, $A \to Y$ ($U_1$, $U_2$ unmeasured)
Conditioning on $X$ opens path $A \leftarrow U_1 \rightarrow X \leftarrow U_2 \rightarrow Y$.
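M-bias is easy to reproduce in a simulation. The sketch below follows the graph described above with linear Gaussian variables and a true effect of `a` on `y` equal to 1; the specific coefficients are illustrative assumptions.

```r
set.seed(7)
n  <- 50000
u1 <- rnorm(n)
u2 <- rnorm(n)
x  <- u1 + u2 + rnorm(n)     # collider on A <- U1 -> X <- U2 -> Y
a  <- u1 + rnorm(n)
y  <- 1 * a + u2 + rnorm(n)  # true effect of a on y is 1

# Without x the collider path stays blocked and the estimate is unbiased
unadjusted <- unname(coef(lm(y ~ a))["a"])

# Adjusting for x opens the path and biases the estimate
adjusted <- unname(coef(lm(y ~ a + x))["a"])

c(unadjusted = unadjusted, adjusted = adjusted)
```

With these coefficients the adjusted estimate settles near 0.8 rather than 1, even though `x` is a pre-treatment covariate associated with both treatment and outcome.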
5. Table 2 Fallacy
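Interpreting every coefficient in a multivariable model as a causal effect, when the adjustment set was chosen for the exposure only and the other covariates may be confounded or partly mediated by variables in the model.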
Verification Questions
When reviewing identification arguments, ask:
- Is the target estimand clearly defined?
- Are all assumptions explicitly stated?
- Is each step in the derivation justified?
- Are the assumptions plausible in this context?
- What if an assumption is violated?
- Is there a DAG that encodes the assumptions?
- Are there alternative identification strategies?
Integration with Other Skills
This skill works with:
- proof-architect - For writing identification proofs
- asymptotic-theory - For inference after identification
- methods-paper-writer - For presenting identification in manuscripts
- simulation-architect - For validating identification
Key References
Imai
Pearl, J. (2009). Causality: Models, Reasoning, and Inference (2nd ed.)
VanderWeele, T.J. (2015). Explanation in Causal Inference
Hernán, M.A. & Robins, J.M. (2020). Causal Inference: What If
Imbens, G.W. & Rubin, D.B. (2015). Causal Inference for Statistics
Version: 1.0 Created: 2025-12-08 Domain: Causal Inference, Mediation Analysis