Simulation Architect
You are an expert in designing Monte Carlo simulation studies for statistical methodology research.
Morris et al. Guidelines
The ADEMP Framework (Morris et al., 2019, Statistics in Medicine)
This framework, the definitive guide to simulation study design, specifies five components:
| Component | Question | Documentation Required |
|---|---|---|
| Aims | What are we trying to learn? | Clear research questions |
| Data-generating mechanisms | How do we create data? | Full DGP specification |
| Estimands | What are we estimating? | Mathematical definition |
| Methods | What estimators do we compare? | Complete algorithm description |
| Performance measures | How do we evaluate? | Bias, variance, coverage |
Morris et al. Reporting Checklist
□ Aims stated clearly
□ DGP fully specified (all parameters, distributions)
□ Estimand(s) defined mathematically
□ All methods described with sufficient detail for replication
□ Performance measures defined
□ Number of replications justified
□ Monte Carlo standard errors reported
□ Random seed documented for reproducibility
□ Software and version documented
□ Computational time reported
Replication Counts
How Many Replications Are Needed?
Monte Carlo Standard Error (MCSE) formula:
$$\text{MCSE}(\hat{\theta}) = \frac{\hat{\sigma}}{\sqrt{B}}$$
where $B$ is the number of replications and $\hat{\sigma}$ is the estimated standard deviation.
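For example, with an estimated standard deviation of $\hat{\sigma} = 0.5$ and $B = 2500$ replications, $\text{MCSE} = 0.5/\sqrt{2500} = 0.01$.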
Recommended Replications by Purpose
| Purpose | Minimum B | Recommended B | MCSE for 95% coverage (at minimum B) |
|---|---|---|---|
| Exploratory | 500 | 1,000 | ~1.0% |
| Publication | 1,000 | 2,000 | ~0.7% |
| Definitive | 5,000 | 10,000 | ~0.3% |
| Precision | 10,000+ | 50,000 | ~0.2% |
MCSE Calculation
# Calculate Monte Carlo standard errors
calculate_mcse <- function(estimates, coverage_indicators = NULL) {
  B <- length(estimates)
  list(
    # MCSE for mean (bias)
    mcse_mean = sd(estimates) / sqrt(B),
    # MCSE for standard deviation
    mcse_sd = sd(estimates) / sqrt(2 * (B - 1)),
    # MCSE for coverage (proportion)
    mcse_coverage = if (!is.null(coverage_indicators)) {
      p <- mean(coverage_indicators)
      sqrt(p * (1 - p) / B)
    } else NA
  )
}

# Rule of thumb: B needed for desired MCSE
replications_needed <- function(desired_mcse, estimated_sd) {
  ceiling((estimated_sd / desired_mcse)^2)
}
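A quick usage sketch: the vectors below are simulated stand-ins for the estimates and coverage indicators collected from a finished run (a real study would supply its own).
# Illustrative usage with stand-in inputs
est <- rnorm(2000, mean = 0.3, sd = 0.5)   # pretend point estimates from 2000 replications
cov <- rbinom(2000, 1, 0.95)               # pretend CI coverage indicators
calculate_mcse(est, cov)
# Replications needed for a bias MCSE of 0.005 when the estimator SD is about 0.5
replications_needed(desired_mcse = 0.005, estimated_sd = 0.5)   # 10000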
R Code Templates
Complete Simulation Template
# Full simulation study template following Morris et al. guidelines
# Requires: foreach, doParallel (and doRNG for reproducible parallel streams)
run_simulation_study <- function(
  n_sims = 2000,
  n_vec = c(200, 500, 1000),
  seed = 42,
  parallel = TRUE,
  n_cores = parallel::detectCores() - 1
) {
  library(foreach)
  set.seed(seed)

  # Define parameter grid
  params <- expand.grid(
    n = n_vec,
    effect_size = c(0, 0.14, 0.39),
    model_spec = c("correct", "misspecified")
  )

  # Set up parallel processing
  if (parallel) {
    cl <- parallel::makeCluster(n_cores)
    doParallel::registerDoParallel(cl)
    on.exit(parallel::stopCluster(cl))
    # Note: set.seed() alone does not control worker RNG streams;
    # use doRNG::registerDoRNG(seed) for fully reproducible parallel runs
  }

  # Run simulations
  results <- foreach(
    i = 1:nrow(params),
    .combine = rbind,
    .packages = c("tidyverse", "mediation")
  ) %dopar% {
    scenario <- params[i, ]
    sim_results <- replicate(n_sims, {
      data <- generate_dgp(scenario)
      estimates <- apply_methods(data)
      evaluate_performance(estimates, truth = scenario$effect_size)
    }, simplify = FALSE)
    summarize_scenario(sim_results, scenario)
  }

  # Add MCSE
  results <- add_monte_carlo_errors(results, n_sims)
  results
}
# Summarize with MCSE (assumes dplyr is loaded for %>% and mutate())
add_monte_carlo_errors <- function(results, B) {
  results %>%
    mutate(
      mcse_bias = empirical_se / sqrt(B),
      mcse_coverage = sqrt(coverage * (1 - coverage) / B),
      mcse_rmse = rmse / sqrt(2 * B)   # approximation
    )
}
Parallel Simulation Template
# Memory-efficient parallel simulation
run_parallel_simulation <- function(scenario, n_sims, n_cores = 4) {
  library(future)
  library(future.apply)
  plan(multisession, workers = n_cores)

  results <- future_replicate(n_sims, {
    data <- generate_dgp(scenario$n, scenario$params)
    est <- estimate_effect(data)
    list(
      estimate = est$point,
      se = est$se,
      covered = abs(est$point - scenario$truth) < 1.96 * est$se
    )
  }, simplify = FALSE, future.seed = TRUE)  # parallel-safe RNG streams
  plan(sequential)  # Reset

  # Aggregate
  estimates <- sapply(results, `[[`, "estimate")
  ses <- sapply(results, `[[`, "se")
  covered <- sapply(results, `[[`, "covered")
  list(
    bias = mean(estimates) - scenario$truth,
    empirical_se = sd(estimates),
    mean_se = mean(ses),
    coverage = mean(covered),
    mcse_bias = sd(estimates) / sqrt(n_sims),
    mcse_coverage = sqrt(mean(covered) * (1 - mean(covered)) / n_sims)
  )
}
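A hypothetical call, assuming the scenario list carries n, params, and truth as the function expects, and that generate_dgp() and estimate_effect() are defined elsewhere:
# Hypothetical usage; generate_dgp() and estimate_effect() must exist in the session
scenario <- list(n = 500, params = list(effect = 0.39), truth = 0.39)
res <- run_parallel_simulation(scenario, n_sims = 2000, n_cores = 4)
res$bias
res$coverage
res$mcse_coverage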
Core Principles (Morris et al., 2019)
ADEMP Framework
- Aims: What question does the simulation answer?
- Data-generating mechanisms: How is data simulated?
- Estimands: What is being estimated?
- Methods: What estimators are compared?
- Performance measures: How is performance assessed?
Data-Generating Process Design
Standard Mediation DGP
generate_mediation_data <- function(n, params) {
  # Confounders
  X <- rnorm(n)
  # Treatment (binary)
  ps <- plogis(params$gamma0 + params$gamma1 * X)
  A <- rbinom(n, 1, ps)
  # Mediator
  M <- params$alpha0 + params$alpha1 * A + params$alpha2 * X +
    rnorm(n, sd = params$sigma_m)
  # Outcome
  Y <- params$beta0 + params$beta1 * A + params$beta2 * M +
    params$beta3 * X + params$beta4 * A * M +
    rnorm(n, sd = params$sigma_y)
  data.frame(Y = Y, A = A, M = M, X = X)
}
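A usage sketch with illustrative parameter values (not calibrated to any real study) and a quick sanity check on the generated data:
# Illustrative parameter values for one scenario
params <- list(
  gamma0 = 0, gamma1 = 0.5,
  alpha0 = 0, alpha1 = 0.39, alpha2 = 0.3, sigma_m = 1,
  beta0 = 0, beta1 = 0.2, beta2 = 0.39, beta3 = 0.3, beta4 = 0, sigma_y = 1
)
set.seed(42)
dat <- generate_mediation_data(n = 500, params = params)
summary(dat)    # check variable ranges
mean(dat$A)     # treatment prevalence; near 0.5 with gamma0 = 0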
DGP Variations to Consider
- Linearity: Linear vs nonlinear relationships
- Model specification: Correct vs misspecified
- Error structure: Homoscedastic vs heteroscedastic
- Interaction: No interaction vs A×M interaction
- Confounding: Measured vs unmeasured
- Treatment: Binary vs continuous
- Mediator: Continuous vs binary vs count
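As a concrete example of two variations from the list above, here is a minimal sketch (reusing the params structure from the standard DGP) with a binary mediator and heteroscedastic outcome errors; the 0.5 scale factor is illustrative.
# Sketch: binary mediator + heteroscedastic outcome errors (illustrative)
generate_mediation_data_variant <- function(n, params) {
  X <- rnorm(n)
  A <- rbinom(n, 1, plogis(params$gamma0 + params$gamma1 * X))
  # Binary mediator generated from a logistic model
  M <- rbinom(n, 1, plogis(params$alpha0 + params$alpha1 * A + params$alpha2 * X))
  # Outcome error SD increases with |X| (heteroscedastic)
  Y <- params$beta0 + params$beta1 * A + params$beta2 * M + params$beta3 * X +
    rnorm(n, sd = params$sigma_y * (1 + 0.5 * abs(X)))
  data.frame(Y = Y, A = A, M = M, X = X)
}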
Parameter Grid Design
Sample Size Selection
| Size | Label | Purpose |
|---|---|---|
| 100-200 | Small | Stress test |
| 500 | Medium | Typical study |
| 1000-2000 | Large | Asymptotic behavior |
| 5000+ | Very large | Efficiency comparison |
Effect Size Selection
| Effect | Interpretation |
|---|---|
| 0 | Null (Type I error) |
| 0.1 | Small |
| 0.3 | Medium |
| 0.5 | Large |
Recommended Grid Structure
params <- expand.grid(
  n = c(200, 500, 1000, 2000),
  effect = c(0, 0.14, 0.39, 0.59),  # Null plus small/medium/large per Cohen
  confounding = c(0, 0.3, 0.6),
  misspecification = c(FALSE, TRUE)
)
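It is worth checking the workload this grid implies before launching; a quick count:
# Workload implied by the grid above
nrow(params)            # 4 x 4 x 3 x 2 = 96 scenarios
nrow(params) * 2000     # 192,000 simulated datasets at 2,000 replications each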
Performance Metrics
Primary Metrics
| Metric | Formula | Target | MCSE Formula |
|---|---|---|---|
| Bias | $\bar{\hat\psi} - \psi_0$ | ≈ 0 | $\sqrt{\text{Var}(\hat\psi)/n_{sim}}$ |
| Empirical SE | $\text{SD}(\hat\psi)$ | — | $\widehat{\text{EmpSE}}/\sqrt{2(n_{sim}-1)}$ |
| Average SE | $\bar{\widehat{SE}}$ | ≈ Emp SE | $\text{SD}(\widehat{SE})/\sqrt{n_{sim}}$ |
| Coverage | $\frac{1}{n_{sim}}\sum I(\psi_0 \in CI)$ | ≈ 0.95 | $\sqrt{p(1-p)/n_{sim}}$ |
| MSE | $\text{Bias}^2 + \text{Var}$ | Minimize | — |
| Power | % rejecting $H_0$ | Context-dependent | $\sqrt{p(1-p)/n_{sim}}$ |
Relative Metrics (for method comparison)
- Relative bias: $\text{Bias}/\psi_0$ (when $\psi_0 \neq 0$)
- Relative efficiency: $\text{Var}(\hat\psi_1)/\text{Var}(\hat\psi_2)$
- Relative MSE: $\text{MSE}_1/\text{MSE}_2$
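A minimal sketch of these relative comparisons, assuming est1 and est2 are vectors of estimates for two methods applied to the same simulated datasets and psi0 is the true value (names are illustrative):
# Relative comparisons between two methods on the same replications
relative_metrics <- function(est1, est2, psi0) {
  mse <- function(est) mean((est - psi0)^2)
  list(
    relative_bias_1 = if (psi0 != 0) (mean(est1) - psi0) / psi0 else NA,
    relative_bias_2 = if (psi0 != 0) (mean(est2) - psi0) / psi0 else NA,
    relative_efficiency = var(est1) / var(est2),
    relative_mse = mse(est1) / mse(est2)
  )
}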
Replication Guidelines
Minimum Replications
| Metric | Minimum | Recommended |
|---|---|---|
| Bias | 1000 | 2000 |
| Coverage | 2000 | 5000 |
| Power | 1000 | 2000 |
Monte Carlo Standard Error
Always report MCSE for key metrics:
- Coverage MCSE at 95%: $\sqrt{0.95 \times 0.05 / n_{sim}} \approx 0.007$ for $n_{sim}=1000$
- For MCSE < 0.005 at 95% coverage, need $n_{sim} > 0.95 \times 0.05 / 0.005^2 = 1900$ replications (see the helper below)
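The same inversion as a small helper (the function name is illustrative):
# Replications needed to keep coverage MCSE below a target, at nominal coverage p
coverage_reps_needed <- function(target_mcse, p = 0.95) {
  ceiling(p * (1 - p) / target_mcse^2)
}
coverage_reps_needed(0.005)    # 1900
coverage_reps_needed(0.0025)   # 7600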
R Implementation Template
#' Run simulation study
#' @param scenario Parameter list for this scenario
#' @param n_rep Number of replications
#' @param seed Random seed
# Requires: furrr (future_map, furrr_options), dplyr, tibble
run_simulation <- function(scenario, n_rep = 2000, seed = 42) {
  set.seed(seed)

  results <- future_map(1:n_rep, function(i) {
    # Generate data
    data <- generate_data(scenario$n, scenario$params)
    # Fit methods
    fit1 <- method1(data)
    fit2 <- method2(data)
    # Extract estimates
    tibble(
      rep = i,
      method = c("method1", "method2"),
      estimate = c(fit1$est, fit2$est),
      se = c(fit1$se, fit2$se),
      ci_lower = estimate - 1.96 * se,
      ci_upper = estimate + 1.96 * se
    )
  }, .options = furrr_options(seed = TRUE)) %>%
    bind_rows()

  # Summarize
  results %>%
    group_by(method) %>%
    summarize(
      bias = mean(estimate) - scenario$true_value,
      emp_se = sd(estimate),
      avg_se = mean(se),
      coverage = mean(ci_lower <= scenario$true_value &
                        ci_upper >= scenario$true_value),
      mse = bias^2 + emp_se^2,
      .groups = "drop"
    )
}
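A hypothetical call, assuming generate_data(), method1(), and method2() are defined in the session and a future plan is set for future_map():
# Hypothetical usage; generate_data(), method1(), method2() must exist in the session
library(future)
library(furrr)
plan(multisession, workers = 4)
scenario <- list(n = 500, params = list(effect = 0.39), true_value = 0.39)
summary_tbl <- run_simulation(scenario, n_rep = 2000, seed = 42)
plan(sequential)
summary_tbl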
Results Presentation
Standard Table Format
Table X: Simulation Results (n_rep = 2000)

              Method 1                   Method 2
   n    Bias    SE   Cov    MSE    Bias    SE   Cov    MSE
-----------------------------------------------------------
 200    0.02  0.15  0.94  0.023    0.01  0.12  0.95  0.014
 500    0.01  0.09  0.95  0.008    0.00  0.08  0.95  0.006
1000    0.00  0.06  0.95  0.004    0.00  0.05  0.95  0.003

Note: Cov = 95% CI coverage. MCSE for coverage ≈ 0.005.
Visualization Guidelines
- Use faceted plots for multiple scenarios
- Show confidence bands for metrics
- Compare methods side-by-side
- Log scale for MSE if range is large
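A minimal ggplot2 sketch following these guidelines, assuming a summary data frame with columns n, effect, method, bias, and mcse_bias (column names are illustrative):
# Faceted bias plot with Monte Carlo error bars (column names are illustrative)
library(ggplot2)
ggplot(results, aes(x = factor(n), y = bias, colour = method, group = method)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_pointrange(aes(ymin = bias - 1.96 * mcse_bias,
                      ymax = bias + 1.96 * mcse_bias),
                  position = position_dodge(width = 0.4)) +
  facet_wrap(~ effect, labeller = label_both) +
  labs(x = "Sample size", y = "Bias", colour = "Method")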
Checkpoints and Reproducibility
Checkpointing Strategy
# Save results incrementally
if (i %% 100 == 0) {
  saveRDS(results_so_far,
          file = sprintf("checkpoint_%s_rep%d.rds", scenario_id, i))
}
Reproducibility Requirements
- Set seed explicitly
- Record package versions (sessionInfo())
- Use furrr_options(seed = TRUE) for parallel
- Save full results, not just summaries
- Document any manual interventions
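One way to capture the session details alongside the full results (file paths are illustrative):
# Save session details and full results next to the outputs
writeLines(capture.output(sessionInfo()), "output/sessionInfo.txt")
saveRDS(results, "output/full_results.rds")    # full results, not just summaries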
Common Pitfalls
Design Pitfalls
- Too few replications for coverage assessment
- Unrealistic parameter combinations
- Missing null scenario (effect = 0)
- No misspecification scenarios
Implementation Pitfalls
- Not setting seeds properly in parallel
- Ignoring convergence failures
- Not checking for numerical issues
- Insufficient burn-in for MCMC methods
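A small sketch of guarding against silent convergence failures, assuming fit_method() is the per-dataset estimation call (the name is hypothetical); failures are recorded so their rate can be reported:
# Wrap each fit so failures are recorded rather than silently dropped
safe_fit <- function(data) {
  tryCatch({
    fit <- fit_method(data)    # hypothetical per-dataset estimation call
    list(estimate = fit$est, se = fit$se, failed = FALSE)
  },
  error = function(e) list(estimate = NA_real_, se = NA_real_, failed = TRUE),
  warning = function(w) list(estimate = NA_real_, se = NA_real_, failed = TRUE))
}
# Report the failure rate alongside performance, e.g. mean(sapply(fits, `[[`, "failed"))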
Reporting Pitfalls
- Missing MCSE
- Not reporting convergence failures
- Cherry-picking scenarios
- Inadequate description of DGP
Key References
- Morris TP, White IR, Crowther MJ (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine, 38(11), 2074-2102.
- Burton A, Altman DG, Royston P, Holder RL (2006). The design of simulation studies in medical statistics. Statistics in Medicine, 25(24), 4279-4292.
- White IR (2010). simsum: Analyses of simulation studies including Monte Carlo error. The Stata Journal, 10(3), 369-385.